A batch data pipeline that extracts, transforms, and loads the Airbnb Amsterdam dataset into a data warehouse on Google Cloud Platform (GCP) and models the data with dbt (data build tool).
Inside Airbnb is a mission-driven project that provides data and advocacy about Airbnb's impact on residential communities. It works towards a vision where data and information empower communities to understand, decide, and control the role of renting residential homes to tourists.
This project aims to provide clarity in the following areas:
- Apartment availability over time.
- The range of booking prices.
- The type of structure available for booking.
- Airflow:
  - Data Ingestion: fetches the data from the Inside Airbnb data site (Extract), processes and arranges the columns (Transform), loads the data into Google Cloud Storage (Load), and from there loads it into BigQuery. A minimal sketch of this DAG appears after this list.
- dbt:
  - Data Modelling: dbt is used to model and transform the data in the warehouse, and to deploy the models.
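A minimal sketch of what such an ingestion DAG can look like, assuming plain `PythonOperator` tasks and the google-cloud client libraries; the snapshot URL, bucket, table, and column names below are placeholders rather than this project's actual configuration:

```python
# Sketch of the Extract -> Transform -> Load (GCS) -> Load (BigQuery) flow.
# All names and URLs below are hypothetical placeholders.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator
from google.cloud import bigquery, storage

# Placeholder: substitute the current Inside Airbnb snapshot URL for Amsterdam.
LISTINGS_URL = "https://data.insideairbnb.com/the-netherlands/north-holland/amsterdam/2024-03-11/data/listings.csv.gz"
BUCKET = "airbnb-amsterdam-raw"          # hypothetical GCS bucket
BQ_TABLE = "my-project.airbnb.listings"  # hypothetical BigQuery table
LOCAL_CSV = "/tmp/listings.csv"
GCS_OBJECT = "raw/listings.csv"


def extract_and_transform():
    """Fetch the raw listings file and keep/clean an illustrative set of columns."""
    df = pd.read_csv(LISTINGS_URL, compression="gzip")
    df = df[["id", "last_scraped", "property_type", "price", "availability_365"]]
    df["price"] = df["price"].str.replace(r"[$,]", "", regex=True).astype(float)
    df.to_csv(LOCAL_CSV, index=False)


def upload_to_gcs():
    """Upload the transformed file to the data-lake bucket."""
    storage.Client().bucket(BUCKET).blob(GCS_OBJECT).upload_from_filename(LOCAL_CSV)


def load_to_bigquery():
    """Load the GCS object into the BigQuery warehouse table."""
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition="WRITE_TRUNCATE",
    )
    bigquery.Client().load_table_from_uri(
        f"gs://{BUCKET}/{GCS_OBJECT}", BQ_TABLE, job_config=job_config
    ).result()


with DAG(
    dag_id="airbnb_amsterdam_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule="@monthly",  # Airflow 2.4+ parameter; illustrative schedule
    catchup=False,
) as dag:
    extract_transform = PythonOperator(
        task_id="extract_and_transform", python_callable=extract_and_transform
    )
    to_gcs = PythonOperator(task_id="upload_to_gcs", python_callable=upload_to_gcs)
    to_bq = PythonOperator(task_id="load_to_bigquery", python_callable=load_to_bigquery)

    extract_transform >> to_gcs >> to_bq
```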
- Terraform for managing and provisioning the infrastructure (GCS bucket, data warehouse) in GCP.
- Docker for encapsulating the dataflows and their dependencies in containers, making them easier to deploy.
- Airflow for dataflow implementation and workflow orchestration.
- dbt (data build tool) for transforming, partitioning, and clustering the dataset in the data warehouse; a sketch of the resulting BigQuery table layout appears after this list.
- Google Looker Studio for creating a dashboard to visualize the dataset.
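In this project the partitioning and clustering are configured in the dbt models; purely to illustrate what that amounts to in BigQuery, the sketch below builds an equivalent partitioned and clustered table with the google-cloud-bigquery client (project, dataset, table, and column names are hypothetical):

```python
# Illustration only: the project applies partitioning/clustering via dbt model
# configs; this shows an equivalent table definition through the BigQuery client.
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "my-project.airbnb.fact_listings",  # hypothetical table id
    schema=[
        bigquery.SchemaField("listing_id", "INTEGER"),
        bigquery.SchemaField("scrape_date", "DATE"),
        bigquery.SchemaField("property_type", "STRING"),
        bigquery.SchemaField("price", "FLOAT"),
    ],
)
# Partition by scrape date so availability/price-over-time queries scan only
# the relevant date partitions...
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.MONTH, field="scrape_date"
)
# ...and cluster by property type, in line with the "type of structure" question above.
table.clustering_fields = ["property_type"]

client.create_table(table, exists_ok=True)
```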