This repository contains two folders: AmazonDataExtraction and SpotifyETL.
In the AmazonDataExtraction project, the goal was to extract data from the Amazon website using the BeautifulSoup library (a minimal extraction sketch follows the steps below).
- The rough exploratory work is shown in:
- After consolidating the logic from that rough work, I defined all the functions and extracted the data from the website in:
- The extra file "amazon_etl.py" collects all of these functions in a single Python script, so they can later form a pipeline using Airflow.
- Loading the extracted data into S3 storage using Airflow (see the DAG sketch after this list).
- Then, modeling the data into a star schema and finally loading it into Redshift for further analytical work (a Redshift loading sketch also follows).
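
For illustration, here is a minimal sketch of the kind of BeautifulSoup extraction this project performs. The headers and CSS selectors below are my placeholders, not the exact ones from the project files:

```python
# Minimal sketch of BeautifulSoup-based extraction (illustrative only).
# The headers and selectors are assumptions; real Amazon markup changes often.
import requests
from bs4 import BeautifulSoup

def extract_products(url: str) -> list[dict]:
    # Amazon tends to block default clients, so a browser-like User-Agent helps.
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    response = requests.get(url, headers=headers, timeout=30)
    soup = BeautifulSoup(response.text, "html.parser")

    products = []
    # Hypothetical selectors for a search-results page.
    for item in soup.select("div.s-result-item"):
        title = item.select_one("h2 a span")
        price = item.select_one("span.a-price span.a-offscreen")
        if title and price:
            products.append({"title": title.get_text(strip=True),
                             "price": price.get_text(strip=True)})
    return products
```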
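Next, a minimal sketch of an Airflow DAG that loads the extracted CSV into S3. The DAG id, schedule, bucket, and file paths are hypothetical, and it assumes an `aws_default` connection configured in Airflow:

```python
# Minimal Airflow DAG sketch for loading the extracted data into S3.
# DAG id, schedule, bucket, and paths are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def upload_to_s3():
    # Assumes an "aws_default" connection set up in the Airflow UI.
    hook = S3Hook(aws_conn_id="aws_default")
    hook.load_file(filename="/tmp/amazon_products.csv",
                   key="raw/amazon_products.csv",
                   bucket_name="my-etl-bucket",
                   replace=True)

with DAG(dag_id="amazon_etl",
         start_date=datetime(2023, 1, 1),
         schedule_interval="@daily",
         catchup=False) as dag:
    PythonOperator(task_id="upload_to_s3", python_callable=upload_to_s3)
```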
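And for the final step, a rough sketch of copying the S3 file into a Redshift star-schema fact table. The cluster endpoint, credentials, table name, bucket, and IAM role are all placeholders:

```python
# Rough sketch of loading the S3 file into a Redshift fact table with COPY.
# Endpoint, credentials, table, bucket, and IAM role are placeholders.
import psycopg2

COPY_SQL = """
    COPY fact_products (title, price)
    FROM 's3://my-etl-bucket/raw/amazon_products.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    CSV IGNOREHEADER 1;
"""

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="...")
with conn, conn.cursor() as cur:
    cur.execute(COPY_SQL)  # Redshift pulls the file directly from S3
```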
In the SpotifyETL project, the goal was to extract data from a Spotify playlist using the Spotify API via the Spotipy library (a Spotipy sketch follows the steps below).
- Build a basic file to explore and extract the data:
- Build a Python file that connects to the Spotify API:
- Build a Python file that defines the proper extraction functions, with the help of the rough work file above:
- Now, create an EC2 instance in the AWS Console.
- After connecting to the instance, install all the dependencies the server needs (for example, Python, pip, Apache Airflow, pandas, and Spotipy):
- Finally, create a DAG file to be used in Airflow (see the DAG sketch after this list):
- Then, run the DAG through Airflow, and the data will be loaded into S3 storage.
- Modeling the data into a star schema and finally loading it into Redshift for further analytical work, as sketched in the AmazonDataExtraction section above.
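
For illustration, a minimal Spotipy sketch that connects to the Spotify API and extracts playlist tracks. The client credentials are placeholders, and the playlist ID is just an example:

```python
# Sketch of connecting to the Spotify API with Spotipy and extracting
# playlist tracks. Credentials are placeholders; the playlist ID is an example.
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

auth = SpotifyClientCredentials(client_id="YOUR_CLIENT_ID",
                                client_secret="YOUR_CLIENT_SECRET")
sp = spotipy.Spotify(auth_manager=auth)

def extract_tracks(playlist_id: str) -> list[dict]:
    results = sp.playlist_tracks(playlist_id)
    tracks = []
    for item in results["items"]:
        track = item["track"]
        tracks.append({"name": track["name"],
                       "artist": track["artists"][0]["name"],
                       "album": track["album"]["name"],
                       "popularity": track["popularity"]})
    return tracks

print(extract_tracks("37i9dQZEVXbMDoHDwVN2tF"))  # e.g., a Top 50 playlist
```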
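And a rough sketch of what the Spotify DAG can look like, with one task extracting the playlist and another loading the CSV into S3. It assumes Airflow, pandas, and Spotipy are already installed on the EC2 instance; the module name `spotify_extract` and the other resource names are hypothetical:

```python
# Sketch of the Spotify DAG: one task extracts playlist data, the next
# loads it to S3. Module, function, and resource names are hypothetical.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def extract():
    # Hypothetical import of the extraction module described above.
    from spotify_extract import extract_tracks
    df = pd.DataFrame(extract_tracks("YOUR_PLAYLIST_ID"))
    df.to_csv("/tmp/spotify_tracks.csv", index=False)

def load():
    S3Hook(aws_conn_id="aws_default").load_file(
        filename="/tmp/spotify_tracks.csv",
        key="raw/spotify_tracks.csv",
        bucket_name="my-etl-bucket",
        replace=True)

with DAG(dag_id="spotify_etl",
         start_date=datetime(2023, 1, 1),
         schedule_interval="@daily",
         catchup=False) as dag:
    PythonOperator(task_id="extract", python_callable=extract) >> \
        PythonOperator(task_id="load", python_callable=load)
```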
My name is WAREPAM RICHARD SINGH. In this project, I have learned:
- Data Extraction
- Data Modelling (draw.io / Lucid)
- Data Transformation
- Data Loading
- AWS services (S3, Redshift) and Apache Airflow for orchestration
For more project updates, you can find me on: