Home
Tom Schenk Jr edited this page on Dec 8, 2015 (20 revisions).
The project's kanban board is hosted on Waffle.io. The project consists of the following high-level tasks:
- Combine the raw data files of the lab tests for 2008 and beyond.
- Create a variable indicating lab results above the acceptable threshold.
- Clean up the advisories from DrekBeach and remove advisories that were not caused by high predicted E. coli values.
- Merge the cleaned advisories with the lab results to determine whether each advisory was correct.
- Determine the baseline performance of the current model.
- Create alternative models.
- Use a train-test framework to compare the performance of the new models.
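The labeling and evaluation tasks above can be sketched in plain Python. This is a minimal sketch, not the project's code: the record fields (`beach`, `date`, `ecoli`), the representation of advisories as a set of `(beach, date)` pairs, and the helper names are all hypothetical. The default threshold of 235 CFU/100 mL is an assumption taken from the EPA's 1986 single-sample maximum for designated bathing beaches; substitute whatever threshold the project adopts.

```python
from dataclasses import dataclass

# Assumed exceedance threshold in CFU/100 mL (hypothetical default).
ECOLI_THRESHOLD = 235.0


@dataclass(frozen=True)
class LabReading:
    """One lab result; field names are hypothetical."""
    beach: str
    date: str     # ISO date, e.g. "2015-07-04"
    ecoli: float  # E. coli count in CFU/100 mL


def above_threshold(reading, threshold=ECOLI_THRESHOLD):
    """Flag a reading as an exceedance if it is strictly above the threshold."""
    return reading.ecoli > threshold


def evaluate_advisories(readings, advisories, threshold=ECOLI_THRESHOLD):
    """Compare advisories against lab exceedances.

    `advisories` is a set of (beach, date) pairs on which an advisory was
    issued for predicted high E. coli. Returns confusion-matrix counts.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for r in readings:
        issued = (r.beach, r.date) in advisories
        exceeded = above_threshold(r, threshold)
        if issued and exceeded:
            counts["tp"] += 1  # advisory issued, water was over the threshold
        elif issued:
            counts["fp"] += 1  # advisory issued, water was under the threshold
        elif exceeded:
            counts["fn"] += 1  # no advisory, water was over the threshold
        else:
            counts["tn"] += 1  # no advisory, water was under the threshold
    return counts
```

Passing the advisories actually issued gives the current model's baseline counts; passing the advisories an alternative model would have issued on held-out test dates scores that model on the same footing.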
Split the raw data from the Excel workbooks into individual CSV files (the path contains spaces, so it must be quoted):
python "e-coli-beach-predictions/data/ChicagoParkDistrict/raw/Standard 18 hr Testing/split_sheets.py"
Stack the sheets for each year into a single CSV file:
csvstack 2006\ *.csv > 2006.csv
csvstack 2007\ *.csv > 2007.csv
csvstack 2008\ *.csv > 2008.csv
csvstack 2009\ *.csv > 2009.csv
csvstack 2010\ *.csv > 2010.csv
csvstack 2011\ *.csv > 2011.csv
csvstack 2012\ *.csv > 2012.csv
csvstack 2013\ *.csv > 2013.csv
csvstack 2014\ *.csv > 2014.csv
csvstack 2015\ *.csv > 2015.csv
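If csvkit is unavailable, the per-year stacking can be approximated with Python's standard library. This is a minimal sketch, assuming every sheet CSV for a year shares the same header row, is UTF-8 encoded, and follows the `YEAR <sheet>.csv` naming implied by the `2006\ *.csv` glob; `stack_csvs` and `stack_year` are hypothetical helpers, and this version simply raises an error when headers differ.

```python
import csv
import glob
import os


def stack_csvs(paths, out_path):
    """Concatenate CSV files that share a header, writing the header once.

    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path}: header differs from the first file")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


def stack_year(year, directory="."):
    """Stack every '<year> *.csv' sheet in `directory` into '<year>.csv'."""
    # The space in the pattern mirrors the shell's escaped space in `2006\ *.csv`,
    # so the output file '<year>.csv' itself is never matched.
    pattern = os.path.join(glob.escape(directory), f"{year} *.csv")
    paths = sorted(glob.glob(pattern))
    if not paths:
        return 0  # nothing to stack; no output file is written
    return stack_csvs(paths, os.path.join(directory, f"{year}.csv"))
```

For example, `for year in range(2006, 2016): stack_year(year)` run in the directory holding the split sheets mirrors the ten commands above.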
Then, combine the annual files into a single file:
csvstack 2006.csv 2007.csv 2008.csv 2009.csv 2010.csv 2011.csv 2012.csv 2013.csv 2014.csv 2015.csv > beach_lab_readings.csv
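A quick sanity check after combining is that `beach_lab_readings.csv` holds exactly as many data rows as the annual files together. This is a minimal sketch, assuming each file has a single header row and is UTF-8 encoded; `count_data_rows` and `check_combined` are hypothetical helpers.

```python
import csv


def count_data_rows(path):
    """Count CSV records in a file, excluding the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


def check_combined(annual_paths, combined_path):
    """Raise ValueError if the combined row count differs from the annual total."""
    expected = sum(count_data_rows(p) for p in annual_paths)
    actual = count_data_rows(combined_path)
    if actual != expected:
        raise ValueError(f"expected {expected} data rows, found {actual}")
    return actual
```

Usage: `check_combined([f"{y}.csv" for y in range(2006, 2016)], "beach_lab_readings.csv")`. Counting records with the `csv` module, rather than raw lines, keeps quoted fields that contain line breaks from inflating the totals.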
This work is licensed under a Creative Commons Attribution 4.0 International License.