This repository contains two folders: AmazonDataExtraction and SpotifyETL.
In the AmazonDataExtraction project, the goal was to extract data from the Amazon website using the BeautifulSoup library (a minimal extraction sketch follows the steps below).
- The rough exploratory work is shown in:
- After consolidating the logic from that rough work, I defined all the functions and extracted the data from the website in:
- The extra file "amazon_etl.py" collects all of these functions in a single Python script, so they can later form a pipeline using Airflow.
- Loading the extracted data into S3 storage using Airflow (see the DAG sketch after this list).
- Then, modeling the data into a star schema and finally loading it into Redshift for further analytical work (a Redshift loading sketch also follows).
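
For illustration, here is a minimal sketch of the kind of BeautifulSoup extraction this project performs. The headers and CSS selectors below are my placeholders, not the exact ones from the project files:

```python
# Minimal sketch of BeautifulSoup-based extraction (illustrative only).
# The headers and selectors are assumptions; real Amazon markup changes often.
import requests
from bs4 import BeautifulSoup

def extract_products(url: str) -> list[dict]:
    # Amazon tends to block default clients, so a browser-like User-Agent helps.
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    response = requests.get(url, headers=headers, timeout=30)
    soup = BeautifulSoup(response.text, "html.parser")

    products = []
    # Hypothetical selectors for a search-results page.
    for item in soup.select("div.s-result-item"):
        title = item.select_one("h2 a span")
        price = item.select_one("span.a-price span.a-offscreen")
        if title and price:
            products.append({"title": title.get_text(strip=True),
                             "price": price.get_text(strip=True)})
    return products
```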
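Next, a minimal sketch of an Airflow DAG that loads the extracted CSV into S3. The DAG id, schedule, bucket, and file paths are hypothetical, and it assumes an `aws_default` connection configured in Airflow:

```python
# Minimal Airflow DAG sketch for loading the extracted data into S3.
# DAG id, schedule, bucket, and paths are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def upload_to_s3():
    # Assumes an "aws_default" connection set up in the Airflow UI.
    hook = S3Hook(aws_conn_id="aws_default")
    hook.load_file(filename="/tmp/amazon_products.csv",
                   key="raw/amazon_products.csv",
                   bucket_name="my-etl-bucket",
                   replace=True)

with DAG(dag_id="amazon_etl",
         start_date=datetime(2023, 1, 1),
         schedule_interval="@daily",
         catchup=False) as dag:
    PythonOperator(task_id="upload_to_s3", python_callable=upload_to_s3)
```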
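And for the final step, a rough sketch of copying the S3 file into a Redshift star-schema fact table. The cluster endpoint, credentials, table name, bucket, and IAM role are all placeholders:

```python
# Rough sketch of loading the S3 file into a Redshift fact table with COPY.
# Endpoint, credentials, table, bucket, and IAM role are placeholders.
import psycopg2

COPY_SQL = """
    COPY fact_products (title, price)
    FROM 's3://my-etl-bucket/raw/amazon_products.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    CSV IGNOREHEADER 1;
"""

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="...")
with conn, conn.cursor() as cur:
    cur.execute(COPY_SQL)  # Redshift pulls the file directly from S3
```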
In the SpotifyETL project, the goal was to extract data from a Spotify playlist using the Spotify API via the Spotipy library (a Spotipy sketch follows the steps below).
- Build a basic file to explore and extract the data:
- Build a Python file that connects to the Spotify API:
- Build a Python file that defines the proper extraction functions, with the help of the rough work file above:
- Now, create an EC2 instance in the AWS Console.
- After connecting to the instance, install all the dependencies the server needs (for example, Python, pip, Apache Airflow, pandas, and Spotipy):
- Finally, create a DAG file to be used in Airflow (see the DAG sketch after this list):
- Then, run the DAG through Airflow, and the data will be loaded into S3 storage.
- Modeling the data into a star schema and finally loading it into Redshift for further analytical work, as sketched in the AmazonDataExtraction section above.
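
For illustration, a minimal Spotipy sketch that connects to the Spotify API and extracts playlist tracks. The client credentials are placeholders, and the playlist ID is just an example:

```python
# Sketch of connecting to the Spotify API with Spotipy and extracting
# playlist tracks. Credentials are placeholders; the playlist ID is an example.
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

auth = SpotifyClientCredentials(client_id="YOUR_CLIENT_ID",
                                client_secret="YOUR_CLIENT_SECRET")
sp = spotipy.Spotify(auth_manager=auth)

def extract_tracks(playlist_id: str) -> list[dict]:
    results = sp.playlist_tracks(playlist_id)
    tracks = []
    for item in results["items"]:
        track = item["track"]
        tracks.append({"name": track["name"],
                       "artist": track["artists"][0]["name"],
                       "album": track["album"]["name"],
                       "popularity": track["popularity"]})
    return tracks

print(extract_tracks("37i9dQZEVXbMDoHDwVN2tF"))  # e.g., a Top 50 playlist
```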
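And a rough sketch of what the Spotify DAG can look like, with one task extracting the playlist and another loading the CSV into S3. It assumes Airflow, pandas, and Spotipy are already installed on the EC2 instance; the module name `spotify_extract` and the other resource names are hypothetical:

```python
# Sketch of the Spotify DAG: one task extracts playlist data, the next
# loads it to S3. Module, function, and resource names are hypothetical.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def extract():
    # Hypothetical import of the extraction module described above.
    from spotify_extract import extract_tracks
    df = pd.DataFrame(extract_tracks("YOUR_PLAYLIST_ID"))
    df.to_csv("/tmp/spotify_tracks.csv", index=False)

def load():
    S3Hook(aws_conn_id="aws_default").load_file(
        filename="/tmp/spotify_tracks.csv",
        key="raw/spotify_tracks.csv",
        bucket_name="my-etl-bucket",
        replace=True)

with DAG(dag_id="spotify_etl",
         start_date=datetime(2023, 1, 1),
         schedule_interval="@daily",
         catchup=False) as dag:
    PythonOperator(task_id="extract", python_callable=extract) >> \
        PythonOperator(task_id="load", python_callable=load)
```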
My name is WAREPAM RICHARD SINGH. In this project, I have learned:
- Data Extraction
- Data Modelling (draw.io / Lucid)
- Data Transformation
- Data Loading
- AWS services (S3, Redshift) and Apache Airflow for orchestration
For more project updates, you can find me on: