Tom Schenk Jr edited this page Feb 24, 2016 · 20 revisions

Project Workflow

A kanban board is located at Waffle.io. The project contains the following high-level tasks:

  • Combine the raw data files of the lab tests for 2008 and beyond.
  • Create a variable indicating lab results above the acceptable threshold.
  • Clean up advisories from DrekBeach and remove advisories not caused by high predicted values of E. coli.
  • Merge (cleaned) advisories from above with lab results to determine if the advisory was correct.
  • Determine the baseline performance of the current model.
  • Add other data (e.g., weather and other predictors).
  • Create alternative models.
  • Use a test-train framework to compare the performance of the new models.
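
As a rough illustration of the baseline step in the list above, the sketch below (Python, not taken from the project code) tallies how often a prediction-driven advisory agreed with the lab result. It assumes an advisory corresponds to a predicted value of at least 235, the same cutoff used for elevated_levels_actual_calculated; the names Confusion and baseline_confusion are hypothetical.

```python
from dataclasses import dataclass

# Cutoff from the Variables table (elevated_levels_actual_calculated is >= 235).
# Applying the same cutoff to the predicted values is an assumption of this sketch.
THRESHOLD = 235.0


@dataclass
class Confusion:
    true_pos: int = 0   # advisory issued, lab result elevated
    false_pos: int = 0  # advisory issued, lab result not elevated
    true_neg: int = 0   # no advisory, lab result not elevated
    false_neg: int = 0  # no advisory, lab result elevated

    def precision(self):
        issued = self.true_pos + self.false_pos
        return self.true_pos / issued if issued else None

    def recall(self):
        elevated = self.true_pos + self.false_neg
        return self.true_pos / elevated if elevated else None


def baseline_confusion(pairs, threshold=THRESHOLD):
    """Count agreement between (actual, predicted) pairs.

    Each pair could be, for example,
    (e_coli_geomean_actual_calculated, Drek_Prediction) for one beach-day.
    Pairs with a missing value are skipped.
    """
    counts = Confusion()
    for actual, predicted in pairs:
        if actual is None or predicted is None:
            continue
        elevated = actual >= threshold
        advised = predicted >= threshold
        if advised and elevated:
            counts.true_pos += 1
        elif advised:
            counts.false_pos += 1
        elif elevated:
            counts.false_neg += 1
        else:
            counts.true_neg += 1
    return counts
```

Running the same tally on the predictions of an alternative model, over the same held-out beach-days, would give a like-for-like comparison with the current model.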

Variables

Variable name Description
Client.ID Beach name
Full_date POSIX date of laboratory reading and corresponding prediction
Year Year of laboratory reading and corresponding prediction
Date Month_date of laboratory reading and corresponding prediction
Laboratory.ID Unique identifier for the laboratory testing
Reading.1 First laboratory testing results
Reading.2 Second laboratory testing results
Escherichia.coli Calculated geometric mean of Reading.1 and Reading.2, provided by the lab (do not use)
Units Units for Reading.1, Reading.2, and Escherichia.coli. Always "MPN/100 ML"
Sample.Collection.Time Time the laboratory sample was collected
Weekday Day of week for laboratory reading and corresponding prediction
Month Month for laboratory reading and corresponding prediction
Day Day of month for laboratory reading and corresponding prediction
Drek_Reading Actual reading values as scraped from Chicago Park District website via DrekBeach
Drek_Prediction Predicted values as scraped from Chicago Park District website via DrekBeach
Drek_Worst_Swim_Status "Worst" swim status collected throughout the course of a day
e_coli_geomean_actual_calculated Geometric mean of Reading.1 and Reading.2
elevated_levels_actual_calculated Binary variable indicating whether e_coli_geomean_actual_calculated is >= 235
summary A human-readable text summary of this data point.
icon A machine-readable text summary of this data point, suitable for selecting an icon for display. If defined, this property will have one of the following values: clear-day, clear-night, rain, snow, sleet, wind, fog, cloudy, partly-cloudy-day, or partly-cloudy-night. (Developers should ensure that a sensible default is defined, as additional values, such as hail, thunderstorm, or tornado, may be defined in the future.)
sunriseTime The UNIX time (that is, seconds since midnight GMT on 1 Jan 1970) of the last sunrise before the solar noon closest to local noon on the given day. (Note: near the poles, this may occur on a different day entirely!)
sunsetTime The UNIX time (that is, seconds since midnight GMT on 1 Jan 1970) of the first sunset after the solar noon closest to local noon on the given day. (Note: near the poles, this may occur on a different day entirely!)
moonPhase A number representing the fractional part of the lunation number of the given day. This can be thought of as the “percentage complete” of the current lunar month: a value of 0 represents a new moon, a value of 0.25 represents a first quarter moon, a value of 0.5 represents a full moon, and a value of 0.75 represents a last quarter moon. (The ranges in between these represent waxing crescent, waxing gibbous, waning gibbous, and waning crescent moons, respectively.)
precipIntensity A numerical value representing the average expected intensity (in inches of liquid water per hour) of precipitation occurring at the given time conditional on probability (that is, assuming any precipitation occurs at all). A very rough guide is that a value of 0 in./hr. corresponds to no precipitation, 0.002 in./hr. corresponds to very light precipitation, 0.017 in./hr. corresponds to light precipitation, 0.1 in./hr. corresponds to moderate precipitation, and 0.4 in./hr. corresponds to heavy precipitation.
precipIntensityMax A numerical value representing the maximum expected intensity of precipitation on the given day, in inches of liquid water per hour.
precipProbability A numerical value between 0 and 1 (inclusive) representing the probability of precipitation occurring at the given time.
temperatureMin A numerical value representing the minimum temperature on the given day, in degrees Fahrenheit.
temperatureMinTime The UNIX time at which temperatureMin occurs on the given day.
temperatureMax A numerical value representing the maximum temperature on the given day, in degrees Fahrenheit.
temperatureMaxTime The UNIX time at which temperatureMax occurs on the given day.
apparentTemperatureMin A numerical value representing the minimum apparent temperature on the given day, in degrees Fahrenheit.
apparentTemperatureMinTime The UNIX time at which apparentTemperatureMin occurs on the given day.
apparentTemperatureMax A numerical value representing the maximum apparent temperature on the given day, in degrees Fahrenheit.
apparentTemperatureMaxTime The UNIX time at which apparentTemperatureMax occurs on the given day.
dewPoint A numerical value representing the dew point at the given time in degrees Fahrenheit.
humidity A numerical value between 0 and 1 (inclusive) representing the relative humidity.
windSpeed A numerical value representing the wind speed in miles per hour.
windBearing A numerical value representing the direction that the wind is coming from in degrees, with true north at 0° and progressing clockwise. (If windSpeed is zero, then this value will not be defined.)
visibility A numerical value representing the average visibility in miles, capped at 10 miles.
cloudCover A numerical value between 0 and 1 (inclusive) representing the percentage of sky occluded by clouds. A value of 0 corresponds to clear sky, 0.4 to scattered clouds, 0.75 to broken cloud cover, and 1 to completely overcast skies.
pressure A numerical value representing the sea-level air pressure in millibars.
precipIntensityMaxTime The UNIX time at which precipIntensityMax occurs on the given day.
precipType A string representing the type of precipitation occurring at the given time. If defined, this property will have one of the following values: rain, snow, sleet (which applies to each of freezing rain, ice pellets, and “wintery mix”), or hail. (If precipIntensity is zero, then this property will not be defined.)
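
The two calculated variables in the table can be reproduced from the raw readings. The following is a minimal sketch in Python, assuming Reading.1 and Reading.2 have already been parsed as non-negative numbers; the function names are illustrative and do not come from the project code. It also shows one way to convert a UNIX-time field such as sunriseTime into a UTC timestamp.

```python
import math
from datetime import datetime, timezone

# Cutoff from the table: elevated_levels_actual_calculated is >= 235.
THRESHOLD = 235.0


def geomean(reading_1, reading_2):
    """Geometric mean of two readings (e_coli_geomean_actual_calculated)."""
    return math.sqrt(reading_1 * reading_2)


def elevated(reading_1, reading_2, threshold=THRESHOLD):
    """Return 1 if the geometric mean is at or above the threshold, else 0
    (elevated_levels_actual_calculated)."""
    return int(geomean(reading_1, reading_2) >= threshold)


def unix_to_utc(seconds):
    """Convert a UNIX time (e.g. sunriseTime) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

For example, readings of 100 and 400 give a geometric mean of 200, which is below the cutoff, so the flag is 0.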

Replication

[intentionally left blank]
