A tool to compare medical procedure costs across geographic locations
Medical procedures are expensive and deserve careful attention when making financial decisions about non-emergency care. Users can decide based on prices in their own city or in other states. Beyond individual users, government officials, hospital staff, and insurance companies can also compare rates to inform their financial decisions.
The data is built by combining multiple sources, mostly in .csv format. The initial files are preprocessed to remove missing values and inconsistencies, and the resulting data is stored in Amazon S3 buckets.
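The preprocessing step can be pictured with a minimal sketch using only the standard `csv` module. It assumes hypothetical column names (`procedure_code`, `city`, `state`, `average_cost`) that may not match the real source files, drops rows with a missing required field, and normalizes a few common inconsistencies (stray whitespace, state-code case, city-name case, `$` and thousands separators in costs).

```python
import csv
import io

# Hypothetical column names, for illustration only; the real files may differ.
REQUIRED_COLUMNS = ("procedure_code", "city", "state", "average_cost")


def clean_rows(reader):
    """Yield rows with no missing required fields, lightly normalized."""
    for row in reader:
        # Strip whitespace from every named field; short rows yield None values.
        row = {k: (v or "").strip() for k, v in row.items() if k is not None}
        # Drop rows where any required field is empty (a "missing value").
        if any(not row.get(col) for col in REQUIRED_COLUMNS):
            continue
        # Normalize inconsistent formatting across sources.
        row["state"] = row["state"].upper()
        row["city"] = row["city"].title()
        try:
            # Accept values such as "$1,234.50".
            row["average_cost"] = float(row["average_cost"].replace(",", "").lstrip("$"))
        except ValueError:
            continue  # skip rows whose cost cannot be parsed
        yield row


def clean_csv(text):
    """Return the cleaned rows of CSV text as a list of dicts."""
    return list(clean_rows(csv.DictReader(io.StringIO(text))))
```

Filtering row by row keeps memory use flat, so the same generator also works on large files opened with `open(path, newline="")`.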
Amazon Redshift, a data warehouse designed to answer complex ad hoc queries in a distributed environment, is used to query the data. High performance is achieved by choosing appropriate distribution keys, sort keys, and column compression encodings.
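To make the three tuning knobs concrete, here is a minimal sketch that builds a Redshift `CREATE TABLE` statement with a distribution key, a compound sort key, and per-column `ENCODE` settings. The table name, columns, and key choices are hypothetical and are not taken from the repository's `table_creation.sql`. The leading sort key column is left `RAW` (uncompressed), which AWS generally recommends for range-restricted scans.

```python
# Hypothetical columns: (name, SQL type, compression encoding).
COLUMNS = [
    ("state", "CHAR(2)", "RAW"),  # leading sort key column left uncompressed
    ("city", "VARCHAR(64)", "ZSTD"),
    ("procedure_code", "VARCHAR(16)", "ZSTD"),
    ("average_cost", "DECIMAL(12,2)", "AZ64"),
]


def create_table_ddl(table, columns, distkey, sortkeys):
    """Build a Redshift CREATE TABLE with DISTKEY, COMPOUND SORTKEY and ENCODE."""
    names = {name for name, _, _ in columns}
    if distkey not in names or not set(sortkeys) <= names:
        raise ValueError("distkey and sortkeys must be existing columns")
    # One line per column, each with its compression encoding.
    cols = ",\n".join(
        f"    {name} {sqltype} ENCODE {encoding}" for name, sqltype, encoding in columns
    )
    return (
        f"CREATE TABLE {table} (\n{cols}\n)\n"
        "DISTSTYLE KEY\n"
        f"DISTKEY ({distkey})\n"
        f"COMPOUND SORTKEY ({', '.join(sortkeys)});"
    )
```

In this sketch, distributing on `procedure_code` co-locates all rows for one procedure on the same slice, while sorting on `(state, city)` lets Redshift skip blocks when a query filters by location. Whether those are the right keys depends on the actual query patterns.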
The directory structure of the repo looks like this:
├── README.md
├── flask
│   ├── app
│   │   ├── static
│   │   │   ├── css
│   │   │   ├── js
│   │   │   └── fonts
│   │   ├── templates
│   │   │   └── index.html
│   │   └── views.py
│   ├── run.py
│   └── tornadoapp.py
├── generate_dta
│   └── procedure_code.py
└── redshift_queries
    ├── copy_s3_redshift.sql
    ├── procedure_city.txt
    └── table_creation.sql
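The `redshift_queries/copy_s3_redshift.sql` file loads the data from S3 into Redshift. Its contents are not reproduced here, but a load of this kind typically uses Redshift's `COPY` command. The helper below is a hypothetical sketch that assembles such a statement for CSV files with a header row. The table name, bucket, and IAM role ARN in any call to it are placeholders.

```python
def copy_from_s3(table, s3_prefix, iam_role_arn, region=None):
    """Build a Redshift COPY command that loads CSV files under an S3 prefix."""
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must be an s3:// URI")
    parts = [
        f"COPY {table}",
        f"FROM '{s3_prefix}'",         # every object under this prefix is loaded
        f"IAM_ROLE '{iam_role_arn}'",  # role the cluster assumes to read S3
        "CSV",
        "IGNOREHEADER 1",              # skip the header line of each file
    ]
    if region:
        # Only needed when the bucket is in a different region than the cluster.
        parts.append(f"REGION '{region}'")
    return "\n".join(parts) + ";"
```

Because `COPY` reads every file under the prefix in parallel across the cluster's slices, splitting the cleaned data into several files usually loads faster than a single large file.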
Requirements:
An AWS account
Boto3: used to upload files to Amazon S3 for storage.
An Amazon Redshift cluster: this project used 4 dc1.large nodes.
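The Boto3 upload step might look like the minimal sketch below. The function name, local directory, bucket, and key prefix are hypothetical. The client is passed in so the function works with `boto3.client("s3")` in practice; the only Boto3 call it relies on is the S3 client's `upload_file(Filename, Bucket, Key)`.

```python
from pathlib import Path


def upload_csv_files(s3_client, local_dir, bucket, prefix="cleaned/"):
    """Upload every .csv file in local_dir to s3://<bucket>/<prefix><file name>.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    Returns the list of object keys that were uploaded, in sorted order.
    """
    keys = []
    for path in sorted(Path(local_dir).glob("*.csv")):
        key = f"{prefix}{path.name}"
        # Boto3 handles multipart uploads automatically for large files.
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys
```

With Boto3 installed and AWS credentials configured, a call such as `upload_csv_files(boto3.client("s3"), "data/cleaned", "example-bucket")` would place the files where a `COPY ... FROM 's3://example-bucket/cleaned/'` command can read them.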