Developed using Python 3.
Working AWS credentials must be available on the system, e.g. on macOS in ~/.aws/credentials. The AWS account needs permission to read from and write to S3 and to start Athena queries (athena:StartQueryExecution).
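As a minimal, optional sanity check (not part of the script), the snippet below confirms that boto3 can find the credentials and can see the raw data prefix used later in this README:

```python
# optional sanity check: confirm boto3 picks up credentials via the default
# credential chain (e.g. ~/.aws/credentials) and can see the raw data prefix
import boto3

sts = boto3.client("sts")
print(sts.get_caller_identity()["Arn"])  # fails fast if credentials are missing/invalid

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="sid-coding-test-data", Prefix="rawdata/", MaxKeys=1)
print(resp.get("KeyCount", 0), "object(s) visible under rawdata/")
```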
Install the required Python modules with
pip3 install -r requirements.txt
- helper_modules - Modules to help process the data
- process_sensor_data - Main module to start the script
- tests - Unit tests
- requirements.txt - List of required Python modules
The two data files have been manually downloaded and added to S3 at s3://sid-coding-test-data/rawdata/ for easier access. Given more time, this step could be automated as well.
# What the script does
1. Download the two data files from S3
2. Load the data files into pandas DataFrames
3. Process the DataFrames to extract the Top 10 sensor locations by pedestrian count, by day and by month, and write them to the local files topn_by_day_loc.csv and topn_by_month_loc.csv (a sketch of steps 1-3 follows this list)
4. Write the original data to S3 in Parquet format for future querying. The original downloaded data was ~370 MB, while the Parquet files written to S3 are about 70 MB.
5. Create external tables in AWS Athena to enable querying of the data: ped_loc_data and sensor_locations. The Athena queries take about 17 minutes to complete, after which the data can be queried through Athena. These external tables in Athena refer to the data in S3 written in step 4 (a sketch of steps 4-5 also follows).
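The sketch below illustrates steps 1 to 3 with boto3 and pandas. The S3 object keys, the join key and the column names (sensor_id, sensor_name, hourly_count, date, month) are assumptions for illustration only; the real names are handled inside helper_modules.

```python
# minimal sketch of steps 1-3; file keys and column names are illustrative
import boto3
import pandas as pd

BUCKET = "sid-coding-test-data"

s3 = boto3.client("s3")
s3.download_file(BUCKET, "rawdata/ped_counts.csv", "ped_counts.csv")              # assumed key
s3.download_file(BUCKET, "rawdata/sensor_locations.csv", "sensor_locations.csv")  # assumed key

counts = pd.read_csv("ped_counts.csv")
locations = pd.read_csv("sensor_locations.csv")

# join counts to their sensor locations (assumed join key: sensor_id)
df = counts.merge(locations, on="sensor_id", how="left")

def top_n(frame, group_cols, n=10):
    """Top n sensor locations by total pedestrian count within each group."""
    totals = frame.groupby(group_cols + ["sensor_name"], as_index=False)["hourly_count"].sum()
    ordered = totals.sort_values(group_cols + ["hourly_count"],
                                 ascending=[True] * len(group_cols) + [False])
    return ordered.groupby(group_cols).head(n)

top_n(df, ["date"]).to_csv("topn_by_day_loc.csv", index=False)
top_n(df, ["month"]).to_csv("topn_by_month_loc.csv", index=False)
```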
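And a sketch of steps 4 and 5: writing the data back to S3 as Parquet and registering an Athena external table via StartQueryExecution. The output prefixes, Athena database name and column schema below are illustrative assumptions, not the exact ones used by the script.

```python
# minimal sketch of steps 4-5; S3 prefixes, database name and schema are assumed
import boto3
import pandas as pd

BUCKET = "sid-coding-test-data"

df = pd.read_csv("ped_counts.csv")  # local copy downloaded in the earlier step

# step 4: write to S3 as Parquet (needs pyarrow plus s3fs for the s3:// path)
df.to_parquet(f"s3://{BUCKET}/parquet/ped_loc_data/data.parquet", index=False)

# step 5: register an external table in Athena that points at the Parquet prefix
athena = boto3.client("athena")
ddl = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS ped_loc_data (
    sensor_id int,
    sensor_name string,
    hourly_count int
)
STORED AS PARQUET
LOCATION 's3://{BUCKET}/parquet/ped_loc_data/'
"""
athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "default"},                             # assumed database
    ResultConfiguration={"OutputLocation": f"s3://{BUCKET}/athena-results/"},  # assumed prefix
)
```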
# Running
python3 process_sensor_data.py
# Testing
pytest tests.py
# Dashboard
A very basic Tableau dashboard, connected to this data in AWS Athena, has been published at:
https://public.tableau.com/app/profile/siddharth.bose/viz/TopNpedSensorsMelb/TopNSensorsbypedestriancount