
d4mp3/DamDeZoomcamp


🚀 Data Engineering Zoomcamp Journey

Welcome to my Data Engineering Zoomcamp repository, where I document my journey through the course run by DataTalksClub. Here you'll find my solutions to the assignments and projects from each module of the Data Engineering Zoomcamp 2024 cohort.

📚 Topics Explored:

📘 Module 1:

Containerization and Infrastructure as Code

  • Running Postgres Locally with Docker
  • Ingesting Data to Postgres
  • Putting the Ingestion Script into Docker
  • Running Postgres and pgAdmin with Docker-Compose
  • Setting Up Infrastructure on GCP with Terraform

Homework
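One common way to approach "Ingesting Data to Postgres" is to read a large CSV in fixed-size chunks and append each chunk to a table, so the whole file never has to sit in memory. The sketch below is a minimal, stdlib-only illustration of that pattern under stated assumptions: an in-memory SQLite database stands in for Postgres, a tiny inline CSV with made-up values (`SAMPLE_CSV`) replaces the real trip-data download, and `ingest_in_chunks` and the table name `yellow_taxi_trips` are hypothetical.

```python
import csv
import io
import sqlite3
from itertools import islice

# Hypothetical stand-in for a downloaded trip-data CSV (values are made up).
SAMPLE_CSV = """pickup_datetime,passenger_count,fare_amount
2021-01-01 00:15:56,1,8.0
2021-01-01 00:31:49,2,12.5
2021-01-01 00:42:11,1,5.5
2021-01-01 01:05:02,3,20.0
2021-01-01 01:20:37,1,7.0
"""


def ingest_in_chunks(conn, csv_file, table, chunk_size):
    """Hypothetical helper: load a CSV into `table`, chunk_size rows per batch.

    SQLite stands in for Postgres here; table and column names are trusted input.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    # Create the table from the CSV header; every column is TEXT to keep the sketch short.
    column_defs = ", ".join(f"{name} TEXT" for name in header)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_defs})")
    insert_sql = (
        f"INSERT INTO {table} ({', '.join(header)}) "
        f"VALUES ({', '.join('?' for _ in header)})"
    )
    total = 0
    while True:
        chunk = list(islice(reader, chunk_size))  # read at most chunk_size rows
        if not chunk:
            break
        conn.executemany(insert_sql, chunk)
        conn.commit()  # commit per chunk, so a later failure keeps earlier chunks
        total += len(chunk)
    return total


conn = sqlite3.connect(":memory:")
print(ingest_in_chunks(conn, io.StringIO(SAMPLE_CSV), "yellow_taxi_trips", chunk_size=2))  # → 5
```

With pandas available, the same loop is usually written with `pd.read_csv(..., chunksize=...)` and `DataFrame.to_sql`.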

📙 Module 2:

Workflow Orchestration

  • Configuring Mage
  • ETL: API to Postgres
  • ETL: API to GCS
  • ETL: GCS to BigQuery
  • Parameterized Execution
  • Deployment with Terraform and Google Cloud

Homework

📕 Module 3:

Data Warehouse and BigQuery / dlt Workshop

  • Ingesting data from APIs into a data warehouse using dlt (dltHub)
  • Creating and querying external tables in BigQuery
  • Digging into table partitioning and clustering in BigQuery
  • Comparing the amount of data processed by identical queries on BigQuery internal and external tables
  • Running ML models in BigQuery

Homework data warehouse

Homework dlt
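To make the partitioning topic concrete: BigQuery bills on-demand queries by bytes processed, and when a table is partitioned on the column used in a `WHERE` filter, partitions that cannot match are skipped (partition pruning). The sketch below is a toy model in plain Python, not BigQuery's API: row counts stand in for bytes, and the rows, `build_partitions`, and the two scan helpers are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (pickup_date, fare_amount).
rows = [
    (date(2022, 1, 1), 10.0),
    (date(2022, 1, 1), 7.5),
    (date(2022, 1, 2), 12.0),
    (date(2022, 1, 3), 9.0),
    (date(2022, 1, 3), 4.0),
    (date(2022, 1, 3), 6.5),
]


def build_partitions(rows):
    """Group rows by their date, mimicking a table partitioned on pickup_date."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    return dict(partitions)


def scan_unpartitioned(rows, start, end):
    """Without partitioning, every row is read and then the filter is applied."""
    matched = [r for r in rows if start <= r[0] <= end]
    return matched, len(rows)


def scan_partitioned(partitions, start, end):
    """With partitioning, only partitions whose key passes the filter are read."""
    matched, scanned = [], 0
    for key, part in partitions.items():
        if start <= key <= end:
            scanned += len(part)
            matched.extend(part)
    return matched, scanned


day = date(2022, 1, 3)
_, full_scan = scan_unpartitioned(rows, day, day)
_, pruned_scan = scan_partitioned(build_partitions(rows), day, day)
print(full_scan, pruned_scan)  # → 6 3
```

Clustering applies a finer-grained version of the same idea: data is sorted by the clustering columns (within each partition, if the table is also partitioned), so filters on those columns can skip storage blocks.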

📗 Module 4:

Analytics Engineering

  • Getting started with dbt
  • Understanding data modeling
  • Connecting dbt with BigQuery
  • Testing and documenting dbt models
  • Deploying with BigQuery + dbt Cloud
  • Scheduling tasks (dbt Labs)
  • Managing data sources (dbt Labs)
  • Hosting documentation (dbt Labs)

Homework
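dbt's built-in generic tests (`not_null`, `unique`, `accepted_values`) compile to `SELECT` queries that return the failing records, and a test passes when that query returns zero rows. The sketch below mimics that convention in plain Python over a list of dicts; it illustrates the idea rather than how dbt actually executes tests, and the `trips` rows and their values are hypothetical.

```python
from collections import Counter

# Hypothetical model output: one dict per row.
trips = [
    {"tripid": "a1", "payment_type": 1},
    {"tripid": "a2", "payment_type": 2},
    {"tripid": "a2", "payment_type": 7},
    {"tripid": None, "payment_type": 1},
]


def not_null(rows, column):
    """Failing records: rows where the column is null."""
    return [r for r in rows if r.get(column) is None]


def unique(rows, column):
    """Failing records: non-null values that appear more than once."""
    counts = Counter(r[column] for r in rows if r[column] is not None)
    return [value for value, n in counts.items() if n > 1]


def accepted_values(rows, column, values):
    """Failing records: distinct non-null values outside the accepted list."""
    allowed = set(values)
    return sorted({r[column] for r in rows if r[column] is not None and r[column] not in allowed})


results = {
    "not_null(tripid)": not_null(trips, "tripid"),
    "unique(tripid)": unique(trips, "tripid"),
    "accepted_values(payment_type)": accepted_values(trips, "payment_type", [1, 2, 3, 4, 5, 6]),
}
for name, failures in results.items():
    # dbt's convention: zero failing records means the test passes.
    print(f"{name}: {'PASS' if not failures else 'FAIL'} ({len(failures)} failing)")
```

In a real dbt project these tests are declared on model columns in a YAML file and run with `dbt test`.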

📔 Module 5:

Batch processing with Spark

  • Spark DataFrames
  • SQL with Spark
  • Anatomy of a Spark Cluster
  • GroupBy in Spark
  • Joins in Spark
  • Creating a Local Spark Cluster
  • Setting up a Dataproc Cluster
  • Connecting Spark to BigQuery

Homework
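A `groupBy(...).sum(...)` in Spark is typically executed in two stages: each task pre-aggregates its own partition, the partial results are shuffled so that all values for a key land on the same executor, and a final aggregation merges them. The sketch below simulates that flow in plain Python without PySpark; Python's built-in `hash` stands in for Spark's hash partitioner, and the zone/revenue records and helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical input already split into partitions; each record is (zone, revenue).
partitions = [
    [("Zone A", 10.0), ("Zone B", 5.0), ("Zone A", 2.5)],
    [("Zone B", 7.0), ("Zone C", 1.0)],
    [("Zone A", 4.0), ("Zone C", 3.0)],
]


def partial_aggregate(partition):
    """Stage 1: each task sums revenue per key within its own partition."""
    acc = defaultdict(float)
    for key, value in partition:
        acc[key] += value
    return dict(acc)


def shuffle(partials, num_reducers):
    """Route each (key, partial_sum) to a reducer chosen by hashing the key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for partial in partials:
        for key, value in partial.items():
            buckets[hash(key) % num_reducers][key].append(value)
    return buckets


def final_aggregate(bucket):
    """Stage 2: each reducer merges the partial sums for the keys it owns."""
    return {key: sum(values) for key, values in bucket.items()}


def group_by_sum(partitions, num_reducers=2):
    partials = [partial_aggregate(p) for p in partitions]
    result = {}
    for bucket in shuffle(partials, num_reducers):
        result.update(final_aggregate(bucket))
    return result


print(sorted(group_by_sum(partitions).items()))
# → [('Zone A', 16.5), ('Zone B', 12.0), ('Zone C', 4.0)]
```

Pre-aggregating before the shuffle means each partition sends at most one value per key over the network instead of every raw record.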

📒 Module 6:

Stream Processing / RisingWave Workshop

  • Stream processing
  • Kafka producers and consumers
  • Kafka configuration
  • Kafka stream joins
  • Kafka stream testing
  • Kafka stream windowing
  • Kafka ksqlDB & Connect
  • Kafka Schema Registry
  • Stateless computation (filters, projections)
  • Stateful computation (aggregations, joins)

Homework

Workshop
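The difference between stateless and stateful stream computation can be shown without a broker: a filter looks at one event at a time, while a windowed aggregation must keep state (running counts) until its window closes. The sketch below is plain Python, not the Kafka Streams or RisingWave API. It assumes events arrive in event-time order and uses 60-second tumbling windows; real engines handle late, out-of-order events with watermarks or grace periods. The event tuples and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical events: (event_time_seconds, pickup_zone, passenger_count).
events = [
    (0, "Zone A", 1),
    (15, "Zone B", 0),
    (42, "Zone A", 2),
    (61, "Zone A", 1),
    (75, "Zone B", 3),
    (130, "Zone B", 1),
]


def only_with_passengers(events):
    """Stateless: each event is kept or dropped on its own (a filter)."""
    for event in events:
        if event[2] > 0:
            yield event


def tumbling_window_counts(events, window_seconds):
    """Stateful: count events per zone in fixed, non-overlapping windows.

    Assumes event-time order, so a window is emitted as soon as the first
    event of a later window arrives.
    """
    current_window = None
    counts = defaultdict(int)  # state carried across events
    for ts, zone, _ in events:
        window_start = ts - ts % window_seconds
        if current_window is not None and window_start != current_window:
            yield current_window, dict(counts)  # window closed: emit and reset state
            counts = defaultdict(int)
        current_window = window_start
        counts[zone] += 1
    if current_window is not None:
        yield current_window, dict(counts)  # flush the last open window


for window_start, counts in tumbling_window_counts(only_with_passengers(events), 60):
    print(window_start, counts)
# → 0 {'Zone A': 2}
# → 60 {'Zone A': 1, 'Zone B': 1}
# → 120 {'Zone B': 1}
```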

🏆 The Project:

Final project

Completing the final project and having it validated demonstrated the practical application of the data engineering principles and tools covered in the course. The project was reviewed and assessed by the Zoomcamp instructors to confirm that it met the course's evaluation criteria.

Final Project Repository
