- Event: 2021 ASA Data Challenge Expo
- Name: Walter Yu
- Organization: Code for America
- Section: Sacramento Brigade
- Code Repository: GitHub
This project aims to help disadvantaged communities during the COVID-19 pandemic by answering the questions listed below through analysis of core and supplemental datasets. The intended audience is state and local governments, non-governmental organizations (NGOs) and volunteers who can provide aid and services to these communities.
- Explore the relationship between socioeconomic features of the U.S. population and disadvantaged communities.
- Identify disadvantaged communities based on their median household income. These communities are likely to be more heavily impacted by the COVID-19 pandemic and in greater need of public services.
- Provide recommendations on helping these communities based on data analysis results.
This entry focuses on California communities to keep its scope manageable: several questions are being considered, and analyzing all U.S. communities would expand the scope and length of this report. The narrower scope allows more detail and attention in the analysis, documentation and recommendations.
This project and its analysis are designed to be interpretable, so the data analysis steps are organized into the following modules:
- Overview: Outline approach, assumptions and data sources
- Data Processing: Data preparation for analysis
- Data Analysis: Model fit, coefficient interpretation and diagnostics
- Recommendations: Document key findings from data analysis
- Future Improvements: Possible improvements upon completing analysis
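As a rough illustration of what the Data Analysis module covers (model fit, coefficient interpretation and diagnostics), the sketch below fits a one-predictor linear model by ordinary least squares using only the Python standard library. The function name `fit_simple_ols`, the choice of predictor and all input values are assumptions made for illustration; they are placeholders, not ACS estimates and not the models fitted in this entry.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx                      # change in y per one-unit change in x
    intercept = y_bar - slope * x_bar      # predicted y when x is zero

    # Diagnostics: residuals and the share of variance explained (R squared).
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot if ss_tot else float("nan")

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "residuals": residuals,
    }


# Placeholder inputs only (NOT real ACS estimates):
# x = percent of adults with a bachelor's degree, y = median household income.
pct_bachelors = [10.0, 20.0, 30.0, 40.0]
median_income = [40000.0, 52000.0, 58000.0, 70000.0]

fit = fit_simple_ols(pct_bachelors, median_income)
print(f"slope={fit['slope']:.1f}, intercept={fit['intercept']:.1f}, "
      f"R^2={fit['r_squared']:.3f}")
```

The slope reads as the expected change in the response for a one-unit change in the predictor, and the residuals are the starting point for diagnostic checks such as looking for curvature or outliers.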
This entry makes the following assumptions:
- Although the scope is limited to California communities, the methodology may be applied to other states, since it is based on state- and county-level data extracted from the U.S. Census and does not rely on any characteristics specific to California.
- State and federal guidelines typically define disadvantaged communities as being low-income, so median household income was used to identify such communities. In addition, state and federal guidelines typically define low income as 20% of median household income.
- The data analysis is meant to be clear and easily interpretable, so linear regression and the Law of Parsimony were applied whenever possible. The linear models are improved incrementally so that each step remains interpretable.
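The income-based definition in the assumptions above can be sketched as a simple threshold check. This is a minimal illustration, not the entry's actual code: the function name `flag_low_income`, the "at or below" comparison, the reference income and the county values are all assumptions, and the values are placeholders rather than ACS estimates. The fraction is left as a parameter so it can be set to whatever the applicable guideline specifies; the example passes the figure stated in the assumptions.

```python
def flag_low_income(median_income_by_county, reference_income, fraction):
    """Return the counties whose median household income is at or below
    fraction * reference_income (hypothetical helper)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    cutoff = fraction * reference_income
    return sorted(
        county
        for county, income in median_income_by_county.items()
        if income <= cutoff
    )


# Placeholder values only (NOT real ACS estimates).
example_incomes = {
    "Example County A": 15000,
    "Example County B": 17000,
    "Example County C": 90000,
}

# reference_income stands in for the median the guideline refers to;
# fraction mirrors the figure stated in the assumptions above.
flagged = flag_low_income(example_incomes, reference_income=80000, fraction=0.20)
print(flagged)
```

Keeping the cutoff rule in one small function makes it easy to rerun the identification step under a different guideline without touching the rest of the analysis.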
This entry analyzes core and supplemental datasets from the data challenge problem statement as follows:
- Core Dataset: 2019 American Community Survey (ACS) Single-Year Estimates
- Supplemental Dataset: COVID-19 Data from the National Center for Health Statistics
Data was downloaded from portal websites as follows:
- U.S. Census Website: Advanced search feature was used to filter data in the following order: Surveys > Years > Geography > Topics.
- U.S. Census COVID-19 Website: California state data was downloaded from the categorical dataset search page.
- National Center for Health Statistics (NCHS) Website: Death counts by county and race were downloaded from the NCHS data portal.
Datasets of interest were identified from the U.S. Census data portal and extracted using the advanced search tool. Table ID numbers are listed for reference.
- 2019 American Community Survey (ACS) Single-Year Estimates - Language Spoken
- Description: PLACE OF BIRTH BY LANGUAGE SPOKEN AT HOME AND ABILITY TO SPEAK ENGLISH IN THE UNITED STATES
- Survey/Program: American Community Survey
- Years: 2019
- Table: B06007
- 2019 American Community Survey (ACS) Single-Year Estimates - Household Income
- Description: HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2019 INFLATION-ADJUSTED DOLLARS)
- Survey/Program: American Community Survey
- Years: 2019
- Table: B19001
- 2019 American Community Survey (ACS) Single-Year Estimates - Median Household Income
- Description: MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2019 INFLATION-ADJUSTED DOLLARS)
- Survey/Program: American Community Survey
- Years: 2019
- Table: B19013
- 2019 American Community Survey (ACS) Single-Year Estimates - Poverty Status
- Description: POVERTY STATUS IN THE PAST 12 MONTHS BY SEX BY AGE
- Survey/Program: American Community Survey
- Years: 2019
- Table: B17001
- 2019 American Community Survey (ACS) Single-Year Estimates - Housing Cost
- Description: MONTHLY HOUSING COSTS
- Survey/Program: American Community Survey
- Years: 2019
- Table: B25104
- 2019 American Community Survey (ACS) Single-Year Estimates - Education Attainment
- Description: EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
- Survey/Program: American Community Survey
- Years: 2019
- Table: B15003
- 2019 American Community Survey (ACS) Single-Year Estimates - Commute Mode
- Description: MEANS OF TRANSPORTATION TO WORK BY AGE
- Survey/Program: American Community Survey
- Years: 2019
- Table: B08101
- 2019 American Community Survey (ACS) Single-Year Estimates - Race
- Description: RACE
- Survey/Program: American Community Survey
- Years: 2019
- Table: B02001
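Because each table above is downloaded separately, the tables need to be combined on a shared geography key before analysis. The sketch below shows one way to do that with the standard library. It assumes the downloaded CSV files carry a `GEO_ID` column, a second header row of descriptive labels, and estimate columns named like `B19013_001E`; those layout details, the helper names, the geography identifiers and every value are assumptions for illustration, not data from this entry.

```python
import csv
import io


def read_acs_table(csv_text, value_column):
    """Map GEO_ID -> numeric estimate for one table (hypothetical helper)."""
    values = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        geo_id = row["GEO_ID"]
        if "US" not in geo_id:
            continue  # skip the descriptive label row under the header
        try:
            values[geo_id] = float(row[value_column])
        except ValueError:
            continue  # skip suppressed or non-numeric estimates such as "-"
    return values


def join_on_geo_id(*tables):
    """Keep only geographies present in every table, in sorted order."""
    shared = set(tables[0]).intersection(*tables[1:])
    return {geo: tuple(table[geo] for table in tables) for geo in sorted(shared)}


# Placeholder file contents (NOT real ACS estimates or real geography codes).
INCOME_CSV = """GEO_ID,NAME,B19013_001E
Geography,Geographic Area Name,Estimate label (placeholder)
0500000US06901,"Example County A, California",50000
0500000US06902,"Example County B, California",-
"""

POVERTY_CSV = """GEO_ID,NAME,B17001_002E
Geography,Geographic Area Name,Estimate label (placeholder)
0500000US06901,"Example County A, California",1200
0500000US06902,"Example County B, California",3400
"""

income = read_acs_table(INCOME_CSV, "B19013_001E")
poverty = read_acs_table(POVERTY_CSV, "B17001_002E")
joined = join_on_geo_id(income, poverty)
print(joined)
```

Joining on the geography identifier rather than the county name avoids mismatches caused by differences in how names are written across tables.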
Datasets of interest were identified from the U.S. Census COVID-19 data portal under the categorical dataset section.
- U.S. Census - COVID-19 Demographic and Economic Resources
- Dataset: California Counties DP02 Social
- U.S. Census - COVID-19 Demographic and Economic Resources
- Dataset: California Counties DP03 Economic
- U.S. Census - COVID-19 Demographic and Economic Resources
- Dataset: California Counties DP04 Housing
- U.S. Census - COVID-19 Demographic and Economic Resources
- Dataset: California Counties DP05 Demographic
Datasets of interest were identified from the National Center for Health Statistics (NCHS) data portal. They provide COVID-19-related death counts by California county and were used to evaluate the impact of the pandemic.
- NCHS - COVID-19 Data from the National Center for Health Statistics
Additional datasets of interest were identified from the U.S. Census COVID-19 data portal under the categorical dataset section. They were not used in the analysis due to time and scope constraints, so they are documented here for future use.
- U.S. Census - COVID-19 Demographic and Economic Resources
- Description: American Community Survey (ACS) data on household income ranges and cutoffs and on poverty status.
- These are 5-year estimates shown by state and county boundaries.
- Link: Dataset
- U.S. Census - COVID-19 Demographic and Economic Resources
- Description: American Community Survey (ACS) data on household income ranges and cutoffs.
- These are 5-year estimates shown by state and county boundaries.
- Link: Dataset
- U.S. Census - COVID-19 Demographic and Economic Resources
- Description: American Community Survey (ACS) data on total population count by sex and age group.
- These are 5-year estimates shown by state and county boundaries.
- Link: Dataset