As data analysts, we want to use our skills to help our local and global communities, so for this project we chose to explore an issue related to climate change. As Oregonians, wildfires are of recurring relevance to our lives. After decades of increasingly frequent wildfires, Oregon saw one of its worst fire seasons in 2021. With average temperatures continuing to rise locally and around the globe, the trend shows no signs of slowing. We found a well-suited dataset for exploring this topic: it comes from Kaggle and includes an SQLite file and various CSV files containing data on 1.88 million US wildfires over 23 years.
Questions we hope to answer in the coming weeks:
• Have wildfires in Oregon increased in frequency?
• Have Oregon wildfires increased in size over time?
• Has Oregon wildfire season changed over time?
• Using machine learning, can we predict how wildfires may change over the next 5 years?
• Can we determine which areas of the state are at higher risk?
We will communicate primarily through Slack over the next few weeks, with check-ins on Saturdays and Sundays as well as during class time. For the first segment, Nina explored the raw data and cleaned it up for use, Alex explored possible machine learning options, and Jovan created the repository and documented the process.
After importing libraries and the SQLite file into a Jupyter Notebook and creating an engine to talk to the database, we performed the following steps (see the sketch after this list):
- Inspected the names of the tables and printed the columns to see what data we have to work with.
- Filtered the data to show only fire data from Oregon. This left us with 61,088 rows of data.
- Determined the data types of the columns.
- Created a dataframe of Oregon fires with only the columns relevant to our inquiry.
- Checked the counts of the columns to see if any have missing data, then dropped those rows. This left us with 60,751 rows of data.
- Explored the value counts of the column containing causes of fires in Oregon.
- Explored the value counts of the column containing counties where fires are reported.
- Converted the Julian discovery dates and containment dates into a readable format, and dropped the Julian versions from the dataframe.
- Calculated the number of days each fire burned.
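A minimal sketch of these steps with pandas and SQLAlchemy, assuming the Kaggle file FPA_FOD_20170508.sqlite and the column names from its published Fires table (adjust if your copy differs):

```python
import pandas as pd
from sqlalchemy import create_engine, inspect

# Connect to the SQLite database (file name assumes the Kaggle download).
engine = create_engine("sqlite:///FPA_FOD_20170508.sqlite")

# Inspect the table names to see what data we have to work with.
inspector = inspect(engine)
print(inspector.get_table_names())

# Keep only Oregon fires and the columns relevant to our questions.
query = """
    SELECT FIRE_YEAR, STAT_CAUSE_DESCR, FIRE_SIZE, FIRE_SIZE_CLASS,
           DISCOVERY_DATE, CONT_DATE, FIPS_NAME
    FROM Fires
    WHERE STATE = 'OR'
"""
or_fires = pd.read_sql(query, engine).dropna()

# Convert the Julian dates to a readable format and drop the Julian versions.
or_fires["DISCOVERY"] = pd.to_datetime(or_fires["DISCOVERY_DATE"], unit="D", origin="julian")
or_fires["CONTAINMENT"] = pd.to_datetime(or_fires["CONT_DATE"], unit="D", origin="julian")
or_fires = or_fires.drop(columns=["DISCOVERY_DATE", "CONT_DATE"])

# Number of days each fire burned.
or_fires["BURN_DAYS"] = (or_fires["CONTAINMENT"] - or_fires["DISCOVERY"]).dt.days
```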
Once the data has been cleaned, machine learning models can be run on some of the numerical data.
The fires dataframe was merged with precipitation and temperature dataframes grouped by year and county. The final dataframe for machine learning included county code, year, average fire size, average discovery month, average containment month, average fire duration, average precipitation, and average temperature (832 rows × 6 columns).
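A minimal, self-contained sketch of the merge and group-by pattern, using toy stand-ins for the fire and weather tables (the real column names and values differ):

```python
import pandas as pd

# Toy stand-ins for the per-fire table and the county/year weather table.
fires = pd.DataFrame({
    "county": ["Lane", "Lane", "Deschutes"],
    "year": [2014, 2014, 2015],
    "fire_size": [10.0, 250.0, 3.5],
    "duration_days": [1, 14, 2],
})
weather = pd.DataFrame({
    "county": ["Lane", "Deschutes"],
    "year": [2014, 2015],
    "avg_precip": [46.1, 11.7],
    "avg_temp": [52.3, 48.9],
})

# Average the fire metrics per county and year, then merge in the weather averages.
fires_grouped = (fires.groupby(["county", "year"])
                      .agg(avg_fire_size=("fire_size", "mean"),
                           avg_duration=("duration_days", "mean"))
                      .reset_index())
ml_df = fires_grouped.merge(weather, on=["county", "year"])
print(ml_df)
```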
We tested a simple multiple linear regression and several tuned random forest regression models. We also attempted a time series VAR model, but it did not pass Granger's causality test.
Multiple linear regression
Method:
- Split into train and test sets.
- Run LinearRegression()
- Fit and test the model.
Accuracy: R² = -0.235
Conclusion: Not a good model; a negative R² means it performs worse than simply predicting the mean.
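A minimal sketch of that workflow with scikit-learn, run here on synthetic data standing in for the 832-row county/year dataframe:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in features (precipitation, temperature, months, duration, ...)
# and target (average fire size); values are random, for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(size=(832, 6))
y = rng.normal(size=832)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print("R2:", r2_score(y_test, model.predict(X_test)))
```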
Random forest regression
Method:
- Split into train and test sets.
- Scale the data (because we have very large values)
- Run RandomForestRegressor()
- Fit and test the model.
- Tune with GridSearchCV. Best parameters: {'criterion': 'mse', 'max_depth': 7, 'max_features': 'sqrt', 'n_estimators': 500}
- Tune multiple times with RandomizedSearchCV. Final parameters: {'n_estimators': 415, 'min_samples_split': 2, 'min_samples_leaf': 2, 'max_features': 'sqrt', 'max_depth': None}
Accuracy:
- No tuning: 21.86%, R² = 0.193
- GridSearchCV: 38.6%, R² = 0.215
- RandomizedSearchCV: 47.41%, R² = 0.258
Conclusion: Not a good model; even after tuning, the accuracy and R² scores are too low.
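A minimal sketch of the random forest workflow with scaling and a RandomizedSearchCV tune, again on synthetic stand-in data and with an illustrative parameter space (not the exact grid we searched):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 832-row county/year dataframe (toy values only).
rng = np.random.default_rng(42)
X = rng.normal(size=(832, 6))
y = rng.normal(size=832)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scale features (the real data had very large values such as acres burned).
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Randomized search over an illustrative hyperparameter space.
param_dist = {
    "n_estimators": [100, 200, 300, 415, 500],
    "max_depth": [None, 3, 5, 7],
    "max_features": ["sqrt", "log2"],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}
search = RandomizedSearchCV(RandomForestRegressor(random_state=42),
                            param_dist, n_iter=20, cv=3, random_state=42)
search.fit(X_train_s, y_train)

print("Best params:", search.best_params_)
print("Test R2:", search.best_estimator_.score(X_test_s, y_test))
```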
VAR time series
Method:
- Test causation using Granger’s causality test
- Perform cointegration test
- Split the series into training and testing data
- Check for stationarity and make the time series stationary
- Select the order (p) of VAR model
- Train the VAR model of selected order (p)
- Check for serial correlation of residuals (errors) using Durbin Watson statistic
- Forecast with the trained VAR model
- Invert the transformation to get the real forecast
- Plot of forecast vs actuals
- Evaluate the forecast
Accuracy: Did not pass Granger's causality test.
Conclusion: We could not use this model because the data did not pass Granger's causality test.
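A minimal sketch of the causality check that ruled this model out, using statsmodels on synthetic series in place of the real county/year data:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

# Synthetic stand-ins for two yearly series (e.g. average fire size and
# average temperature); the real test used the county/year dataframe.
rng = np.random.default_rng(42)
data = pd.DataFrame({
    "avg_fire_size": rng.normal(size=40),
    "avg_temp": rng.normal(size=40),
})

# Tests whether avg_temp Granger-causes avg_fire_size at lags 1..3.
# If the p-values stay above 0.05 at every lag, the null hypothesis of
# "no causality" is not rejected, which is why we abandoned the VAR approach.
results = grangercausalitytests(data[["avg_fire_size", "avg_temp"]], maxlag=3)
```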
To improve the random forest regression analysis, we should have encoded the year and the county code (dummy or simple encoding) to improve accuracy. Even so, these were not the best models or data to use. What we were really after was a forecasting model that would help us predict fire outcomes into the future using a form of time series analysis, and the best data for that is geographic data (MODIS) paired with weather and vegetation data via Google Earth Engine. MODIS includes a product, the Global Daily Fire Location Product, that would be well suited to this analysis. We thought VAR would be the best model for this purpose, but the Autoregressive Integrated Moving Average (ARIMA) model would actually be a better fit: it combines an autoregressive (AR) component with a moving average (MA) component, a weighted linear combination of past values and past forecast errors, to produce its predictions. Global Fire Season Severity Analysis and Forecasting by Ferreira et al. (2019) outlines methods for building a fire prediction model, and future model building for Oregon could use similar methods, including ARIMA.
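As a pointer for that future work, a minimal ARIMA sketch with statsmodels on a synthetic yearly series; the order (1, 1, 1) is a placeholder, not a tuned choice:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic yearly series standing in for a fire-severity metric such as
# total acres burned per year over the 1992-2015 span of the dataset.
rng = np.random.default_rng(42)
years = pd.period_range("1992", "2015", freq="Y")
series = pd.Series(rng.normal(loc=100, scale=20, size=len(years)).cumsum(), index=years)

# Fit ARIMA(p, d, q); the order would normally be chosen from ACF/PACF plots
# or an information criterion such as AIC.
model = ARIMA(series, order=(1, 1, 1)).fit()

# Forecast five years ahead (e.g. toward our 2027 prediction question).
print(model.forecast(steps=5))
```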
A future project could include a categorical analysis using neural networks to predict final fire size class (A, B, C, D, E, F, or G) from user-entered parameters, including current temperature, humidity, county, month of the year, acres burned so far, and duration so far. MODIS data could also be incorporated. Unfortunately, we did not have the time to complete this model.
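A rough sketch of what such a classifier could look like with scikit-learn's MLPClassifier, using synthetic inputs and labels (purely illustrative; the project did not build this model):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins: six numeric inputs (temperature, humidity, county code,
# month, acres burned so far, duration so far) and a fire size class A-G.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 6))
y = rng.choice(list("ABCDEFG"), size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
scaler = StandardScaler().fit(X_train)

# Small feed-forward network as a starting point; layer sizes are arbitrary.
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=42)
clf.fit(scaler.transform(X_train), y_train)
print("Test accuracy:", clf.score(scaler.transform(X_test), y_test))
```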
We will be using PostgreSQL running in AWS RDS.
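Connecting to that database from the notebook could look like the following SQLAlchemy sketch; the hostname, credentials, and database name are placeholders, not the project's actual values:

```python
from sqlalchemy import create_engine

# All connection details below are placeholders.
db_url = (
    "postgresql://db_user:db_password"
    "@oregon-fires.abc123xyz.us-west-2.rds.amazonaws.com:5432/oregon_fires"
)
engine = create_engine(db_url)

# Example: push the cleaned Oregon fires dataframe into the database.
# or_fires.to_sql("fires", engine, if_exists="replace", index=False)
```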
Here is the ERD containing the graphical representation of table relationships:
Our presentation will likely contain the following:
• Plot of fire frequency
• Plot of fire size
• Plot of fire season
• Plots of 2027 predictions
• Plot of high-risk areas
• The results of our machine learning tests
Link to presentation: Google Slides
Link to Heroku app: https://oregonfires.herokuapp.com/
Team: Alex Dallman, Jovan Humphrey, Nina Q
Sources: https://www.kaggle.com/datasets/rtatman/188-million-us-wildfires
https://www.ncdc.noaa.gov/cag/county/mapping/35/pcp/200506/1/value
https://data.oregon.gov/Natural-Resources/Oregon-counties-map/djry-8qn8