
A data pipeline that extracts, transforms and loads the Airbnb Amsterdam data from the Airbnb data site into BigQuery using Airflow.


0xhaisenberg/airbnb-amsterdam


Airbnb Data Engineering Project

A batch data pipeline that extracts, transforms and loads the Airbnb Amsterdam data into a data warehouse on Google Cloud Platform (GCP) and models the data with data build tool (dbt).

Project Description

Inside Airbnb is a mission-driven project that provides data and advocacy about Airbnb's impact on residential communities.

We work towards a vision where data and information empower communities to understand, decide and control the role of renting residential homes to tourists.

This project has the goal of providing clarity in the following areas:

  1. Apartment availability over time.

  2. The range of booking prices.

  3. Type of structure available for booking.

Project architecture

How the data pipeline works

  • Airflow:

    1. Data Ingestion: fetches data from the Airbnb data site (Extract), processes and arranges the columns (Transform), and loads the data into Google Cloud Storage and then into BigQuery (Load).
  • dbt:

    1. Data Modelling: dbt models the data, applies the transformations, and handles model deployment.
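The Transform step described above ("process and arrange the columns") can be sketched in plain Python. This is only an illustration, not the repository's actual code: the column list `KEEP_COLUMNS`, the helper names `parse_price` and `transform`, and the sample rows are all assumptions made for the example. The Extract and Load steps are left out because they need network access and GCP credentials.

```python
import csv
import io

# Hypothetical column subset and order; the real pipeline's column list
# is not given in this README.
KEEP_COLUMNS = ["id", "neighbourhood", "room_type", "price", "availability_365"]


def parse_price(raw):
    """Turn a price string such as "$1,250.00" into a float; None if empty."""
    cleaned = (raw or "").replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def transform(csv_text):
    """Keep only KEEP_COLUMNS, in that order, and normalise the types."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = {col: record.get(col) or "" for col in KEEP_COLUMNS}
        row["id"] = int(row["id"])
        row["price"] = parse_price(row["price"])
        # Treat a missing availability value as 0 days (an assumption).
        row["availability_365"] = int(row["availability_365"] or 0)
        rows.append(row)
    return rows


# Made-up sample data standing in for a downloaded listings file.
sample = (
    "id,name,neighbourhood,room_type,price,availability_365\n"
    '1,Canal view,Centrum-West,Entire home/apt,"$1,250.00",120\n'
    "2,Cosy room,De Pijp,Private room,$80.00,\n"
)
rows = transform(sample)
```

In an Airflow DAG, a function like `transform` would typically run as one task between a download task and an upload task, so that each stage can be retried on its own.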

Technologies

  • Terraform for managing and provisioning infrastructure (GCS bucket, Data Warehouse) in GCP.

  • Docker for encapsulating the dataflows and their dependencies into containers, making it easier to deploy them.

  • Airflow for dataflow implementation and workflow orchestration.

  • Data build tool (dbt) for transforming, partitioning and clustering the dataset in the data warehouse.

  • Google Looker Studio for creating a dashboard to visualize the dataset.

Results

The dashboard is publicly available at this link.

Reproduce the project

Follow the detailed guide here.
