Event planning organizations face challenges when planning outdoor events due to unpredictable weather conditions. By leveraging weather data, these organizations can make informed decisions, optimize event schedules, and deliver better customer experiences. This project aims to create an ETL (Extract, Transform, Load) pipeline that collects, processes, and visualizes weather data to help event planners make data-driven decisions.
- Collect weather data from an external API.
- Transform and clean the data for analysis.
- Load the processed data into a PostgreSQL database.
- Create visualizations to provide insights into weather patterns.
- Schedule and automate the entire ETL process using Apache Airflow.
This project focuses on building a robust, automated ETL pipeline with Python, Docker, PostgreSQL, and Apache Airflow, and on visualizing the processed data with Matplotlib.
Ensure you have the following installed:
- Docker (including Docker Compose)
- Python 3.7+
- Apache Airflow (included in this project's setup)
git clone https://github.com/yourusername/weather-etl-pipeline.git
cd weather-etl-pipeline
pip install -r requirements.txt
Create a .env file in the root directory and add the necessary environment variables:
API_KEY=your_api_key
SMS_API_KEY=your_sms_api_key
SMS_RECIPIENT=recipient_number
POSTGRES_USER=your_postgres_user
POSTGRES_PASSWORD=your_postgres_password
POSTGRES_DB=your_database_name
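Inside the pipeline code, these variables can be read with python-dotenv. Below is a minimal sketch, assuming the python-dotenv package is installed and the .env file sits in the working directory; the config.py filename is illustrative:

```python
# config.py -- illustrative helper for loading credentials from .env
import os

from dotenv import load_dotenv  # assumes the python-dotenv package is installed

load_dotenv()  # read key=value pairs from .env into the process environment

API_KEY = os.getenv("API_KEY")
POSTGRES_USER = os.getenv("POSTGRES_USER")
POSTGRES_PASSWORD = os.getenv("POSTGRES_PASSWORD")
POSTGRES_DB = os.getenv("POSTGRES_DB")

# Fail fast if a required credential is missing
if not API_KEY:
    raise RuntimeError("API_KEY is not set -- check your .env file")
```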
docker-compose up -d
airflow db init
airflow users create \
--username admin \
--firstname FIRST_NAME \
--lastname LAST_NAME \
--role Admin \
--email [email protected]
(You will be prompted to set a password for the user.)
In separate terminals, run the following commands:
airflow scheduler
airflow webserver -p 8080
- Copy your DAG script (weather_data_pipeline_dag.py) to the Airflow DAGs folder (~/airflow/dags); a skeleton of such a script is sketched after this list.
- Access the Airflow web interface at http://localhost:8080.
- Toggle the switch to activate your DAG.
- Trigger the DAG manually or wait for the scheduled run.
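For orientation, here is a minimal skeleton of what weather_data_pipeline_dag.py could look like. The task names, the daily schedule, and the etl_tasks module are assumptions for illustration, not the project's exact definitions:

```python
# weather_data_pipeline_dag.py -- illustrative skeleton; the task names,
# schedule, and etl_tasks module are assumptions, not the project's exact code.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical module holding the three stage callables (sketched further below)
from etl_tasks import ingest_weather_data, transform_weather_data, load_weather_data

with DAG(
    dag_id="weather_data_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # assumed cadence
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest", python_callable=ingest_weather_data)
    transform = PythonOperator(task_id="transform", python_callable=transform_weather_data)
    load = PythonOperator(task_id="load", python_callable=load_weather_data)

    ingest >> transform >> load  # run the stages in order
```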
- Data Ingestion: Collects weather data from an external API and stores it temporarily (a combined sketch of the ingestion, transformation, and loading steps follows this list).
- Data Transformation: Cleans and processes the raw data for analysis.
- Data Loading: Loads the transformed data into a PostgreSQL database.
- Visualization: Uses Matplotlib to create visualizations of weather patterns.
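To make the first three stages concrete, here is a condensed sketch of the hypothetical etl_tasks module referenced in the DAG skeleton above. The API URL, query parameters, staging paths, and table name are placeholders, not the project's actual values:

```python
# etl_tasks.py -- condensed sketch of the three pipeline stages.
# The API URL, query parameters, staging paths, and table name are placeholders.
import os

import pandas as pd
import requests
from sqlalchemy import create_engine

RAW_PATH = "/tmp/raw_weather.json"     # temporary staging for the raw payload
CLEAN_PATH = "/tmp/clean_weather.csv"  # staging for the cleaned table

def ingest_weather_data():
    """Extract: fetch weather data from the external API and stage it to disk."""
    resp = requests.get(
        "https://api.example.com/weather",  # placeholder URL -- substitute the real endpoint
        params={"q": "London", "appid": os.getenv("API_KEY")},
        timeout=30,
    )
    resp.raise_for_status()
    with open(RAW_PATH, "w") as f:
        f.write(resp.text)

def transform_weather_data():
    """Transform: clean the raw payload into a tidy table."""
    df = pd.read_json(RAW_PATH)                           # assumes a record-oriented payload
    df = df.dropna()                                      # drop incomplete records
    df.columns = [c.strip().lower() for c in df.columns]  # normalize column names
    df.to_csv(CLEAN_PATH, index=False)

def load_weather_data():
    """Load: append the cleaned rows into PostgreSQL."""
    engine = create_engine(
        f"postgresql://{os.getenv('POSTGRES_USER')}:{os.getenv('POSTGRES_PASSWORD')}"
        f"@localhost:5432/{os.getenv('POSTGRES_DB')}"
    )
    pd.read_csv(CLEAN_PATH).to_sql(
        "weather_observations", engine, if_exists="append", index=False
    )
```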
The resulting visualizations present the processed weather data, providing insights into weather patterns that can help event planners make informed decisions.
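As an example of producing such a chart, here is a minimal sketch, assuming the cleaned table is named weather_observations and includes timestamp and temperature columns (both names are placeholders):

```python
# visualize.py -- example chart; assumes a weather_observations table with
# 'timestamp' and 'temperature' columns (names are placeholders).
import os

import matplotlib.pyplot as plt
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    f"postgresql://{os.getenv('POSTGRES_USER')}:{os.getenv('POSTGRES_PASSWORD')}"
    f"@localhost:5432/{os.getenv('POSTGRES_DB')}"
)
df = pd.read_sql("SELECT timestamp, temperature FROM weather_observations", engine)

plt.figure(figsize=(10, 4))
plt.plot(pd.to_datetime(df["timestamp"]), df["temperature"])
plt.xlabel("Time")
plt.ylabel("Temperature")
plt.title("Temperature Trend")
plt.tight_layout()
plt.savefig("temperature_trend.png")  # write the chart to disk
```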
- Deploy the pipeline on a cloud platform for better scalability and availability.
- Integrate real-time data processing capabilities.
- Enhance visualizations using advanced BI tools like Power BI or Tableau.
This project demonstrates how to build an automated ETL pipeline using modern data engineering tools and technologies. The pipeline efficiently collects, processes, and visualizes weather data to aid event planning organizations in making informed decisions.