CMPE 257-Machine_Learning - Team 8

Team Members and GitHub Usernames

Nhat Trinh [011227645] - nhattrinh
Suhas Byrapuneni [016118596] - suhas-byrapuneni
Venkata Sai Sri Batchu [016118557] - chaitanya1818
Rutik Sanjay Sangle [016007589] - rutiksangle3436

Data

Our group will build the project using the COVID-19 dataset provided by WHO, named “Daily cases and deaths by date reported to WHO”. The dataset is provided as a CSV file and covers almost every country in the world that has reported COVID-19 cases or deaths since the start of 2020. Each row contains the report date, country code, country name, assigned WHO region, new deaths, new cases, cumulative deaths, and cumulative cases.
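A minimal sketch of how rows like these might be loaded and sanity-checked with pandas. The column names (`Date_reported`, `Country_code`, `Country`, `WHO_region`, `New_cases`, `Cumulative_cases`, `New_deaths`, `Cumulative_deaths`) are assumed to match the WHO CSV header, the helper names are hypothetical, and the inline sample is made up for illustration; it is not real WHO data.

```python
import io

import pandas as pd

# Hypothetical two-country sample in the assumed WHO column layout.
# The numbers are made up for illustration only.
SAMPLE_CSV = """Date_reported,Country_code,Country,WHO_region,New_cases,Cumulative_cases,New_deaths,Cumulative_deaths
2020-01-03,AA,Country A,EURO,0,0,0,0
2020-01-04,AA,Country A,EURO,5,5,1,1
2020-01-05,AA,Country A,EURO,3,8,0,1
2020-01-03,BB,Country B,AFRO,2,2,0,0
2020-01-04,BB,Country B,AFRO,4,6,1,1
"""


def load_who_daily(source) -> pd.DataFrame:
    """Read the daily table, parse the report date, and sort each country's rows by date."""
    df = pd.read_csv(source, parse_dates=["Date_reported"])
    return df.sort_values(["Country_code", "Date_reported"]).reset_index(drop=True)


def cumulative_is_consistent(df: pd.DataFrame, new_col: str, cum_col: str) -> bool:
    """Check that each country's cumulative column equals the running sum of its daily column."""
    running = df.groupby("Country_code")[new_col].cumsum()
    return bool((running == df[cum_col]).all())


df = load_who_daily(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
print(cumulative_is_consistent(df, "New_cases", "Cumulative_cases"))
print(cumulative_is_consistent(df, "New_deaths", "Cumulative_deaths"))
```

On the full file, the missing `Country_code` values noted under Preprocessing would need to be filled first, because `groupby` skips rows whose group key is missing.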

Problem

The problem our group will try to solve is predicting where and when the next COVID-19 outbreak will happen by learning from historical daily data collected since the beginning of the pandemic. We characterize an outbreak as an abnormal upward change in the slope of the daily cases/deaths curve. Although COVID-19 transmission is multi-faceted (touch, proximity, city planning, region, etc.), our group believes that upticks in cases and/or deaths can be largely tied to seasonal and temperature changes.
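One way to make "an abnormal upward change in slope" concrete is sketched below, under our own assumptions: the slope is taken as the day-over-day change of a 7-day rolling mean of daily cases, and a day is flagged when that slope is positive and more than a chosen number of standard deviations above the slopes of the preceding four weeks. The function name and the window and threshold values are hypothetical choices, not settled parts of the proposal.

```python
import pandas as pd


def flag_outbreaks(new_cases: pd.Series, smooth_days: int = 7,
                   baseline_days: int = 28, z_threshold: float = 3.0) -> pd.Series:
    """Return a boolean Series marking days whose smoothed upward slope is abnormally steep."""
    # Smooth out day-of-week reporting noise before measuring the slope.
    smoothed = new_cases.rolling(smooth_days, min_periods=smooth_days).mean()
    slope = smoothed.diff()
    # Baseline statistics use only past days (shift(1)), so a spike cannot inflate its own baseline.
    past = slope.shift(1).rolling(baseline_days, min_periods=baseline_days)
    z = (slope - past.mean()) / past.std()
    # NaN comparisons evaluate to False, so days without enough history are never flagged.
    return (slope > 0) & (z > z_threshold)
```

Applied per country, this could look like `df.groupby("Country_code")["New_cases"].transform(flag_outbreaks)`. A rolling z-score is only one simple heuristic; the thresholds would need tuning against known waves.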

Potential Methods

Since our dataset is labeled, our team will likely use a supervised learning method, namely regression, which models the relationship between dependent and independent variables. Specifically, our group will try an autoregressive model (one that predicts each day's value from the values of previous days) based on the Poisson distribution, a natural fit for non-negative daily counts. This model is called Poisson Autoregression (PAR).
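A common formulation of PAR is the log-linear model $y_t \mid \text{past} \sim \text{Poisson}(\lambda_t)$ with $\log \lambda_t = \beta_0 + \sum_{k=1}^{p} \beta_k \log(1 + y_{t-k})$. The sketch below fits that model by maximum likelihood using damped Newton-Raphson in NumPy. The $\log(1+y)$ feature, the default lag order `p=7` (meant to span one weekly reporting cycle), and all function names are our assumptions rather than a fixed design.

```python
import numpy as np


def lagged_design(y, p):
    """Design matrix [1, log(1+y[t-1]), ..., log(1+y[t-p])] and targets y[t] for t >= p."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    lags = [np.log1p(y[p - k:n - k]) for k in range(1, p + 1)]
    return np.column_stack([np.ones(n - p)] + lags), y[p:]


def poisson_loglik(X, target, beta):
    """Poisson log-likelihood with the constant log(y!) term dropped."""
    eta = X @ beta
    return float(np.sum(target * eta - np.exp(eta)))


def fit_par(y, p=7, max_iter=50, tol=1e-8):
    """Maximum-likelihood fit of a log-linear PAR(p) model by damped Newton-Raphson."""
    X, target = lagged_design(y, p)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(target.mean() + 1e-9)  # start from the constant-rate model
    for _ in range(max_iter):
        lam = np.exp(X @ beta)
        grad = X.T @ (target - lam)            # score vector
        info = X.T @ (X * lam[:, None])        # Fisher information (= minus the Hessian)
        step = np.linalg.solve(info, grad)
        # Halve the step until the log-likelihood does not decrease (guards against overshoot).
        current, t = poisson_loglik(X, target, beta), 1.0
        while (not poisson_loglik(X, target, beta + t * step) >= current) and t > 1e-8:
            t /= 2
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta


def forecast_next(y, beta):
    """Expected count (lambda) for the day after the last observation in y."""
    p = len(beta) - 1
    recent = np.log1p(np.asarray(y, dtype=float)[-p:][::-1])  # lag 1 first
    return float(np.exp(beta[0] + beta[1:] @ recent))
```

Fitting one country's `New_cases` series would then be `beta = fit_par(series, p=7)`, followed by `forecast_next(series, beta)` for the next day's expected count. A library such as statsmodels could fit the same lagged design as a Poisson GLM; the hand-written version just keeps every step visible.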

Preprocessing

For the initial data analysis, we checked the data type of each column and looked for missing values. The data types suit our solution, but the ‘Country_code’ column had a few missing values. On further investigation, we found that the missing entries were Namibia's country code, most likely because Namibia's two-letter code is "NA", which pandas reads as a missing value by default. We replaced the missing values using the fillna() method of pandas.
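The checks above can be sketched as follows. The two sample rows are hypothetical (illustrative counts in the assumed WHO column layout); they only demonstrate how pandas turns the literal code "NA" into a missing value and how `fillna()` restores it. The `keep_default_na=False` read at the end is an alternative we did not necessarily use, shown because it avoids the problem at load time.

```python
import io

import pandas as pd

# Hypothetical rows in the assumed WHO column layout; the counts are illustrative only.
SAMPLE_CSV = """Date_reported,Country_code,Country,WHO_region,New_cases,Cumulative_cases,New_deaths,Cumulative_deaths
2020-03-14,NA,Namibia,AFRO,2,2,0,0
2020-03-14,ZA,South Africa,AFRO,1,1,0,0
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["Date_reported"])

# Check the column types and count missing values per column.
print(df.dtypes)
print(df.isna().sum())  # Country_code shows 1 missing: the literal "NA" was parsed as NaN

# Confirm every missing code belongs to Namibia before filling it in.
missing = df["Country_code"].isna()
assert (df.loc[missing, "Country"] == "Namibia").all()
df["Country_code"] = df["Country_code"].fillna("NA")

# Alternative: stop pandas from treating "NA" as missing at read time
# (only truly empty fields become NaN).
df_alt = pd.read_csv(io.StringIO(SAMPLE_CSV), keep_default_na=False, na_values=[""])
```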

Initial Findings

For initial findings, we made some plots using the plotly.express library. The plots show basic information such as the top 10 countries by number of cases. We also used a choropleth plot to draw a world map showing the total number of cases in each country.
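A sketch of how these two plots might be produced, assuming the WHO column names used elsewhere in this README and hypothetical helper names. The aggregation takes each country's most recent `Cumulative_cases` value. Because the WHO file uses two-letter codes while plotly's code-based `locationmode` expects ISO-3, the map matches on country names instead; any WHO name that plotly does not recognize is simply left undrawn. plotly is imported inside `make_figures` so the pandas helpers run without it.

```python
import pandas as pd


def latest_totals(df: pd.DataFrame) -> pd.DataFrame:
    """One row per country holding its most recent cumulative case count."""
    latest = df.sort_values("Date_reported").drop_duplicates("Country", keep="last")
    return latest[["Country", "Country_code", "Cumulative_cases"]].reset_index(drop=True)


def top_countries(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """The n countries with the highest total cases, largest first."""
    return latest_totals(df).nlargest(n, "Cumulative_cases").reset_index(drop=True)


def make_figures(df: pd.DataFrame, n: int = 10):
    """Build the top-n bar chart and the world choropleth with plotly.express."""
    import plotly.express as px  # imported lazily so the helpers above work without plotly

    bar = px.bar(top_countries(df, n), x="Country", y="Cumulative_cases",
                 title=f"Top {n} countries by total cases")
    world = px.choropleth(latest_totals(df), locations="Country",
                          locationmode="country names", color="Cumulative_cases",
                          title="Total COVID-19 cases by country")
    return bar, world
```

With the full dataset loaded as `df`, `bar, world = make_figures(df)` followed by `bar.show()` and `world.show()` would display both figures.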
