Welcome to my Data Engineering Zoomcamp repository, where I'm documenting my adventure through the course conducted by DataTalksClub. Here, you'll find my solutions to the assignments and projects from various weeks of the Data Engineering Zoomcamp 2024 Cohort.
- Running Postgres Locally with Docker
- Ingesting Data to Postgres
- Putting the Ingestion Script into Docker
- Running Postgres and pgAdmin with Docker-Compose
- Setting Up Infrastructure on GCP with Terraform
- Configuring Mage
- ETL: API to Postgres
- ETL: API to GCS
- ETL: GCS to BigQuery
- Parameterized Execution
- Deployment with Terraform and Google Cloud
- Data ingestion from APIs to warehouse using dltHub
- Creating and querying External Tables in BigQuery
- Digging into table partitioning and clustering in BigQuery
- Understanding why identical queries read different volumes of data on BigQuery internal and external tables
- Running ML models in BigQuery
- Getting started with dbt labs
- Understanding data modeling
- Connecting dbt with BigQuery
- Testing and explaining dbt models
- Deploying with BigQuery + dbt Cloud
- Scheduling tasks (dbt labs)
- Managing data sources (dbt labs)
- Hosting documentation (dbt labs)
- Spark DataFrames
- SQL with Spark
- Anatomy of a Spark Cluster
- GroupBy in Spark
- Joins in Spark
- Creating a Local Spark Cluster
- Setting up a Dataproc Cluster
- Connecting Spark to BigQuery
- Stream processing
- Kafka producer and consumer
- Kafka configuration
- Kafka stream join
- Kafka stream testing
- Kafka stream windowing
- Kafka ksqlDB & Connect
- Kafka Schema Registry
- Stateless computation (filters, projections)
- Stateful computation (aggregations, joins)
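
To make the last two topics concrete, here is a minimal sketch in plain Python (no streaming framework) of the difference between stateless and stateful computation. The event fields (`vendor_id`, `distance`, `fare`) and the function names are hypothetical and chosen only for illustration; they are not taken from the course material or from my solutions.

```python
from collections import defaultdict

# Hypothetical ride events; the field names are assumptions for illustration.
events = [
    {"vendor_id": 1, "distance": 2.5, "fare": 10.0},
    {"vendor_id": 2, "distance": 0.0, "fare": 3.0},
    {"vendor_id": 1, "distance": 4.0, "fare": 15.5},
    {"vendor_id": 2, "distance": 1.2, "fare": 6.0},
]


def stateless(stream):
    """Filter and project: each event is handled on its own, no memory needed."""
    for event in stream:
        if event["distance"] > 0:  # filter: drop zero-distance rides
            # projection: keep only the fields downstream steps need
            yield {"vendor_id": event["vendor_id"], "fare": event["fare"]}


def stateful(stream):
    """Aggregate: keep a running fare total per vendor across events."""
    totals = defaultdict(float)  # state that survives between events
    for event in stream:
        totals[event["vendor_id"]] += event["fare"]
        yield event["vendor_id"], totals[event["vendor_id"]]


filtered = list(stateless(events))
running_totals = list(stateful(filtered))
print(filtered)
print(running_totals)
```

The stateless step can be applied to any event in isolation, while the stateful step depends on everything it has seen so far, which is why real stream processors need to persist and recover that state.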
I completed the final project and had it validated, which demonstrated the practical application of the data engineering principles and tools covered in the course. The validation involved a thorough review and assessment by the Zoomcamp instructors to confirm that every aspect of the project met the course's standards.