The West Nile Virus (WNV) has been a serious problem for the United States since 1999. The CDC has acknowledged it as the leading cause of mosquito-borne disease in the continental United States. There are no vaccines to prevent, or medications to treat, WNV in people. According to the CDC, about 1 in 5 people who are infected develop a fever and other symptoms, while about 1 in 150 infected people develop a serious, sometimes fatal, illness.
In Illinois, the West Nile Virus was first identified in September 2001, when laboratory tests confirmed its presence in two dead crows found in the Chicago area. The following year, the state's first human cases and deaths from West Nile disease were recorded, and all but two of the state's 102 counties eventually reported a positive human, bird, mosquito or horse case. By the end of 2002, Illinois had counted more human cases (884) and deaths (64) than any other state in the United States.
Since then, Illinois, and more specifically Chicago, has continued to suffer from multiple outbreaks of the West Nile Virus. From 2005 to 2016, a total of 1,371 human WNV cases were reported within Illinois. Of these, 906 cases (66%) were from the Chicago region (Cook and DuPage Counties).
With this in mind, our project is aimed at predicting outbreaks of the West Nile Virus. This will help the City of Chicago and Chicago Department of Public Health (CDPH) more efficiently and effectively allocate resources towards preventing transmission of this potentially deadly virus. Specifically, our model will use a combination of weather, time and location features to predict the presence of WNV within mosquito traps set up throughout Chicago.
Our top-performing model was a Logistic Regression model (Lasso Regularization with an [...] `Tmax` and `WinterDepart`. Time-based features like `Week` and `Month` were also crucial in helping our model identify the presence of the West Nile Virus. We also saw other weather-related variables like `humidlag4` play a role in our model. Location features like `Longitude` and trap location also played a minor role here.
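The name `humidlag4` suggests a humidity reading shifted back by four time steps, although the lag unit (days or weeks) is not stated here. As a minimal sketch of how such a lagged feature could be built, the hypothetical helper `lag` below shifts a series so that each row sees the value from `steps` rows earlier. The humidity values are made up for illustration.

```python
from typing import List, Optional


def lag(values: List[float], steps: int) -> List[Optional[float]]:
    """Return a series where position i holds values[i - steps].

    The first `steps` positions have no earlier observation and are None.
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return [None if i < steps else values[i - steps] for i in range(len(values))]


# Hypothetical humidity readings, one per time step (not project data).
humidity = [60.0, 62.0, 71.0, 68.0, 75.0, 80.0]
humid_lag4 = lag(humidity, 4)
print(humid_lag4)
```

A lagged weather feature lets the model relate today's trap result to conditions from earlier in the mosquito life cycle rather than only to same-day weather.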
Model | Train AUC | Test AUC | Precision | Specificity | Recall | F-score |
---|---|---|---|---|---|---|
Logistic Regression | 0.858627 | 0.823617 | 0.140553 | 0.750251 | 0.739394 | 0.236205 |
Ada Boosting | 0.961605 | 0.808654 | 0.157986 | 0.837630 | 0.551515 | 0.245614 |
Extra Trees | 0.974448 | 0.807757 | 0.148900 | 0.831604 | 0.533333 | 0.232804 |
Gradient Boosting | 0.997995 | 0.801529 | 0.232673 | 0.948108 | 0.284848 | 0.256131 |
Random Forest | 0.992065 | 0.797336 | 0.177616 | 0.886843 | 0.442424 | 0.253472 |
Support Vector Machine | 0.938088 | 0.778879 | 0.140950 | 0.806160 | 0.575758 | 0.226460 |
Decision Tree | 0.989197 | 0.724557 | 0.157360 | 0.888852 | 0.375758 | 0.221825 |
In addition to its top test AUC, we chose the Logistic Regression model for its high recall score. Given that the West Nile Virus can lead to human death, it's imperative for false negatives to be minimized and for true positives to be maximized. Our Logistic Regression model has by far the best recall score of all the models (0.74). Although its precision (0.14) and specificity (0.75) are weak, we believe that this is a fair trade-off: incorrectly predicting the absence of WNV can increase the chances of an outbreak, leading to potential snowball effects on hospitalization rates and the economy.
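To make the trade-off concrete, the sketch below shows how precision, recall, specificity and the F-score are derived from a confusion matrix. The counts are an assumption: the confusion matrix is not given in this write-up, so they were back-calculated to be consistent with the Logistic Regression row of the table, and the F-score is assumed to be the usual F1.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the table's metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)      # share of predicted positives that are real
    recall = tp / (tp + fn)         # share of real positives that are caught
    specificity = tn / (tn + fp)    # share of real negatives correctly cleared
    f_score = 2 * precision * recall / (precision + recall)  # F1
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_score": f_score,
    }


# Assumed counts, back-calculated from the reported rates (not from the source).
metrics = classification_metrics(tp=122, fp=746, fn=43, tn=2241)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

With counts like these, a model misses relatively few WNV-positive samples (low `fn`) at the cost of many false alarms (high `fp`), which is the trade-off argued for above.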
Our model has shown that certain areas are particularly 'dense' in terms of WNV-positive pools and pool proximity. In addition, our model predicted several traps that have an 80% probability or greater of a WNV outbreak. We believe that the neighborhoods in which these traps are located should be an immediate focus for mosquito control efforts. These areas have been highlighted with a red circle below.
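The 80% cut-off described above amounts to a simple filter over per-trap predicted probabilities. The following is only a sketch: the function name `high_risk_traps`, the trap IDs and the probabilities are all hypothetical and are not model output.

```python
from typing import Dict, List


def high_risk_traps(probabilities: Dict[str, float], threshold: float = 0.80) -> List[str]:
    """Return trap IDs at or above the threshold, highest probability first."""
    flagged = [trap for trap, p in probabilities.items() if p >= threshold]
    return sorted(flagged, key=lambda trap: probabilities[trap], reverse=True)


# Hypothetical predicted probabilities of WNV presence per trap.
predicted = {"T001": 0.91, "T002": 0.35, "T003": 0.80, "T004": 0.79}
flagged = high_risk_traps(predicted)
print(flagged)
```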
We've extrapolated that these are the neighborhoods that have a high risk of WNV:
- Elk Grove Village (7,500 acres)
- Des Plaines (9,000 acres)
- Norridge (1,100 acres)
- Lincolnwood (1,700 acres)
- Stickney (1,200 acres)
- Forest View (900 acres)
- Morton Grove (3,100 acres)
In total, around 24,500 acres in the Chicago area were identified as high risk, housing an approximate population of 148,500 people.
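The total follows directly from the per-neighborhood figures listed above. As a quick check, with the population figure taken from the same paragraph:

```python
# Acreage per high-risk neighborhood, as listed above.
high_risk_acres = {
    "Elk Grove Village": 7_500,
    "Des Plaines": 9_000,
    "Norridge": 1_100,
    "Lincolnwood": 1_700,
    "Stickney": 1_200,
    "Forest View": 900,
    "Morton Grove": 3_100,
}

total_acres = sum(high_risk_acres.values())
population = 148_500  # approximate population quoted above
people_per_acre = population / total_acres

print(f"Total high-risk area: {total_acres:,} acres")
print(f"Approximate density: {people_per_acre:.1f} people per acre")
```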
We recommend using the following methods to deal with areas with a high WNV risk:
Automation with drones
Drones can serve multiple purposes when it comes to combatting WNV:
- They can collect aerial images that can be analyzed and used to identify and map breeding sites, such as cisterns, pots, buckets, old tires and flower pots. These images can be aggregated into accurate maps to support targeted application of larvicides (insecticides that specifically target the larval stage of an insect) at these potential breeding sites.
- Once larval habitats are identified, drones can be equipped to carry and apply larvicides and/or adulticides to small targeted areas. These drones can also be fitted with a global positioning system (GPS) that can track flight patterns in conjunction with insecticide application. An operator can remotely pilot the drone or, in some cases, autopilot programs may be available for pre-programmed flights. Drones can be useful to target specific areas with larvicides or adulticides, as an alternative to truck-mounted applications that may require a high degree of drift of droplets to reach a target area in remote locations. These drones can spray potentially up to 80 acres in a day's work.
In summary, drones could be more environmentally friendly than doing the same spraying procedure on foot, and are likely to be a lot more accurate due to the ability to spray from a fully-vertical angle.
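For a rough sense of scale, the coverage figure above can be combined with the roughly 24,500 high-risk acres identified earlier. This is a back-of-the-envelope sketch only: 80 acres per day is quoted as an upper bound, so the results are lower bounds on the time needed, and the fleet size of 10 drones is a hypothetical value chosen for illustration.

```python
import math

HIGH_RISK_ACRES = 24_500     # total high-risk area identified earlier
ACRES_PER_DRONE_DAY = 80     # "up to 80 acres" per drone per day (upper bound)


def days_to_cover(acres: float, drones: int,
                  acres_per_drone_day: float = ACRES_PER_DRONE_DAY) -> int:
    """Whole days needed for `drones` drones to spray `acres` once."""
    if drones < 1:
        raise ValueError("at least one drone is required")
    return math.ceil(acres / (drones * acres_per_drone_day))


single_drone_days = days_to_cover(HIGH_RISK_ACRES, drones=1)
fleet_days = days_to_cover(HIGH_RISK_ACRES, drones=10)  # hypothetical fleet size
print(single_drone_days, fleet_days)
```

Even under the best-case coverage rate, a single pass over every high-risk neighborhood takes a fleet rather than a single drone, which is one reason to narrow the target areas first.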
Adoption of best practices
The city of Chicago can aim to reduce spraying target areas by adopting guidelines from tropical countries like Singapore that have proven to be successful in combatting mosquito-borne viruses. For example, this could involve active on-the-ground checks of homes and premises for mosquito habitats, where public officers advise residents on steps they can take to prevent mosquito breeding and impose penalties if mosquito breeding is found on the premises.
Through automation and more efficient mosquito vector control processes, we can reduce the cost of spraying, enabling the city of Chicago to save human lives and prevent further outbreaks of the West Nile Virus.
Name | Dataset | Type | Description |
---|---|---|---|
Id | train/test | int | ID number of the record |
Date | train/test | datetime | Date the WNV test was performed |
Address | train/test | str | Approximate trap address retrieved from GeoCoder |
Species | train/test | str | Mosquito species in trap |
Block | train/test | str | Block Number of address |
Street | train/test | str | Street of address |
Trap | train/test | str | ID number of the trap |
AddressNumberAndStreet | train/test | str | Approximate address retrieved from GeoCoder |
Latitude | train/test | float | Latitude retrieved from GeoCoder |
Longitude | train/test | float | Longitude retrieved from GeoCoder |
AddressAccuracy | train/test | int | Accuracy of information returned from GeoCoder |
NumMosquitos | train | int | Number of mosquitoes in a sample |
WnvPresent | train | int | Whether or not WNV is present in a sample (1 = present; 0 = absent) |
Station | weather | str | Station 1 or 2 |
Date | weather | datetime | Date of measurement (YY/MM/DD) |
Tmax | weather | float | The highest temperature for the day in degrees Fahrenheit (F). |
Tmin | weather | float | The lowest temperature for the day in degrees Fahrenheit (F). |
Tavg | weather | float | The average temperature for the day in degrees Fahrenheit (F). |
Depart | weather | float | Departure from normal temperature: the difference between the average temperature (Tavg) and the 30-year normal temperature for this date. A minus (-) indicates the number of degrees below normal. A zero (0) indicates that the average for that day was the normal temperature. (F) |
DewPoint | weather | float | Average Dew Point temperature (F) |
WetBulb | weather | float | Average Wet Bulb temperature (F) |
Heat | weather | float | Heating Degree Days base 65F, season begins with July. |
Cool | weather | float | Cooling Degree Days base 65F, season begins with January. |
Sunrise | weather | float | Time of sunrise |
Sunset | weather | float | Time of sunset |
CodeSum | weather | str | Significant weather phenomena |
Depth | weather | float | Snow depth on the ground to the nearest inch |
Water1 | weather | float | Water Equivalent in inches |
SnowFall | weather | float | Total snowfall for the day to the nearest tenth of an inch. |
PrecipTotal | weather | float | Total precipitation for the day to the nearest hundredth of an inch. This includes all forms of precipitation, both liquid and water equivalent of any snow or ice that occurred |
StnPressure | weather | float | Average station pressure in inches of mercury (inHg) |
SeaLevel | weather | float | Average sea level pressure in inches of mercury (inHg) |
ResultSpeed | weather | float | Resultant Wind Speed - Vector sum of wind speeds divided by number of observations (MPH) |
ResultDir | weather | float | Resultant Wind Direction - Vector sum of wind divided by number of observations (in tens of degrees) |
AvgSpeed | weather | float | Average wind speed (MPH) |
Date | spray | datetime | The date the pesticide was sprayed (YY/MM/DD) |
Time | spray | datetime | Time of spray |
Latitude | spray | float | The latitude of the location sprayed. |
Longitude | spray | float | The longitude of the location sprayed. |
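The trap and weather datasets share a `Date` column, so one plausible way to attach weather readings to trap records is a join on that column. The sketch below uses only the standard library. The sample rows are made up, and the choice to keep only Station 1 readings is an assumption for illustration, not necessarily how the project combined the two stations.

```python
import csv
import io

# Made-up sample rows using column names from the data dictionary above.
TRAIN_CSV = """Date,Trap,WnvPresent
2007-08-01,T001,1
2007-08-01,T002,0
2007-08-02,T001,0
"""

WEATHER_CSV = """Station,Date,Tmax,Tmin
1,2007-08-01,88,69
2,2007-08-01,87,72
1,2007-08-02,84,66
"""

train_rows = list(csv.DictReader(io.StringIO(TRAIN_CSV)))
weather_rows = list(csv.DictReader(io.StringIO(WEATHER_CSV)))

# Index Station 1 weather by date (assumption: one station is enough here).
weather_by_date = {row["Date"]: row for row in weather_rows if row["Station"] == "1"}

merged = []
for trap_row in train_rows:
    weather = weather_by_date.get(trap_row["Date"])
    if weather is None:
        continue  # no weather reading for this date; skip the record
    extra = {k: v for k, v in weather.items() if k not in ("Station", "Date")}
    merged.append({**trap_row, **extra})

for row in merged:
    print(row)
```

Note that `csv.DictReader` yields every value as a string, so numeric columns such as `Tmax` would still need converting to the types listed in the table.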