Staging #5

Open · wants to merge 28 commits into master
Conversation

abwilliams13

ERAP 2019 update

ndubois-urban and others added 15 commits March 9, 2021 16:35
updated from 2018 to 2019
2018 -> 2019 overcrowding vars
2018 -> 2019
1) reran code with updated variables
2) starting line 420, updated year from 2018 to 2019. replaced all instances of "tracts_18" with "tracts_19"
@abwilliams13 abwilliams13 requested a review from ajjitn March 11, 2021 17:13
@ajjitn ajjitn self-assigned this Mar 19, 2021
@ajjitn ajjitn marked this pull request as ready for review March 19, 2021 14:24
@ajjitn (Contributor) left a comment:

Overall looks good! I made a few style comments. There were also a few errors I fixed via a commit to code-review:

  • A timeout I was hitting with the download.file() fxn, probably due to my slow internet connection. I increased the default timeout and used a different method= argument inside download.file().
  • A slight error in the perc_cost_burdened_under_35k calculation. Previously, when the denominator was 0 the value was NA, and we manually converted those NAs back to 0. But our code to check whether the denominator was zero was incomplete and missing a few terms. This didn't affect any rows in the 2018 data, but did affect 835 rows in the 2019 data.
  • I've also added a few data checks with the assert() function that I used for testing, and kept them in for future updates (a rough sketch of the denominator fix and one such check follows this list).
  • I've also slightly modified the generate_index function, which now requires that we explicitly download and provide the tracts data (which we were/are already doing).
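A rough sketch of what the denominator fix and an assert()-style check could look like, assuming assertr is the package behind assert(); only the three denominator terms visible in the diff are shown, and num_under_35k_over_50pct is a placeholder, not the script's actual column name:

```r
# Hypothetical sketch only; column names other than the B25074 terms and
# perc_cost_burdened_under_35k are placeholders, not the script's actual names.
library(dplyr)
library(assertr)

cb_stats <- cb_stats %>%
  mutate(
    # sum every denominator term, so the zero check can't miss any of them
    denom_under_35k = B25074_002E + B25074_011E + B25074_020E,
    perc_cost_burdened_under_35k = if_else(
      denom_under_35k == 0,
      0,  # keep the old convention of reporting 0 instead of NA
      num_under_35k_over_50pct / denom_under_35k
    )
  ) %>%
  # fail loudly if the share ever falls outside [0, 1]
  assert(within_bounds(0, 1), perc_cost_burdened_under_35k)
```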

And below are things that still need to be changed:

  • As a result of the above changes and the new 2019 data, the total number of all-water tracts in 2019 is now 320 instead of 319, so we'll want to change that in the technical appendix. We'll also need to update the comments in the code near line 338 in script 1.
  • As a result of the above changes and the new 2019 data, there are currently no tracts where we have to use the national-level means (like we did with the 9 tracts in New Mexico last time). So we'll want to change that in the technical appendix and update the comments in the code near line 344 in script 1.
  • There are a couple of places where we could be smarter about centralizing variables and choosing variable names so that, in the future, we only need to update the year in a few places to rerun the code. These are optional changes, but they would definitely help if we foresee running another update.

Finally, it would be good to compare final data files to ensure the scripts are reproducible. If someone who has run the full pipeline on their computer (after pulling my changes) wants to send me their final outputs via email, I can double-check that our outputs are identical.
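A minimal way to do that comparison, assuming CSV outputs (the file paths below are placeholders, not the pipeline's actual file names):

```r
# Hypothetical sketch: compare two runs of the final outputs by checksum.
# File paths are illustrative; substitute the pipeline's actual output files.
library(tools)

my_outputs    <- md5sum(c("data/final/housing_index_tract.csv",
                          "data/final/housing_index_county.csv"))
their_outputs <- md5sum(c("emailed/housing_index_tract.csv",
                          "emailed/housing_index_county.csv"))

# TRUE only if every file matches byte for byte
all(unname(my_outputs) == unname(their_outputs))
```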

@@ -5,25 +5,25 @@ library(tigris)

options(tigris_use_cache=FALSE)

ajjitn (Contributor) commented:

If you're planning on updating this in the future, I would suggest defining a global variable at the top like year_used=2019 and then substituting in year_used everywhere you manually typed in 2019 in the script. Then in future years, you only have to change the years in one spot!
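For example (a sketch only; year_used is the suggested name, not something already in the script, and the tigris call is copied from the diff below):

```r
# Hypothetical sketch: define the data year once at the top of the script...
year_used <- 2019

# ...and substitute it wherever 2019 was typed manually, e.g.
state_2019 <- tigris::states(year = year_used, class = "sf")
```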

@@ -5,25 +5,25 @@ library(tigris)

options(tigris_use_cache=FALSE)

- state_2018 = tigris::states(year = 2018, class = "sf") %>%
+ state_2019 = tigris::states(year = 2019, class = "sf") %>%
ajjitn (Contributor) commented:

As an add-on to the above comment, you probably want to give these generic variable names instead of having the names depend on the year. So name this one states, and the other objects us_counties and us_counties_cb. Then you won't have to update these names every year.
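Something like this sketch, building on the hypothetical year_used variable above (the counties() arguments are assumptions about how the script calls tigris):

```r
# Hypothetical sketch: year-agnostic object names, so only year_used changes next update
year_used <- 2019

states         <- tigris::states(year = year_used, class = "sf")
us_counties    <- tigris::counties(year = year_used, class = "sf")
us_counties_cb <- tigris::counties(year = year_used, cb = TRUE, class = "sf")
```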

@@ -12,7 +12,7 @@ library(urbnthemes)
set_urbn_defaults()

# Load variable metadata from ACS 5 yr
ajjitn (Contributor) commented:

If you're planning on updating this in the future, I would suggest defining a global variable at the top like year_used=2019 and then substituting in year_used everywhere you manually typed in 2019 in the script. Then in future years, you only have to change the years in one spot!

@@ -425,7 +425,7 @@ row_sum_weighted = function(df, weight_vec){
generate_index = function(df,
indicator_weights_df,
index_weights_vec,
- tracts_18 = NA,
+ tracts_19 = NA,
ajjitn (Contributor) commented:

I would rename this argument to something more generic like tract_list so that you don't have to update the variable name with each update. That means changing it here in the function definition and everywhere it appears inside the function.
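Roughly (a sketch of the signature only; the remaining arguments and the function body are omitted):

```r
# Hypothetical sketch of the renamed argument; body and other arguments elided
generate_index <- function(df,
                           indicator_weights_df,
                           index_weights_vec,
                           tract_list = NA) {
  # ...use tract_list everywhere tracts_19 currently appears...
  invisible(df)
}
```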

@@ -519,15 +520,15 @@ generate_index = function(df,

###---Create Indexes---------

- tracts_18_job_loss =
+ tracts_19_job_loss =
ajjitn (Contributor) commented:

In a comment, I would note the date that you pulled the data from this URL, since this URL points to the COVID job loss tool, which updates monthly. Otherwise, you might have trouble replicating the exact data in later months and not understand why.
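For example, a short provenance note above the download (the date shown is purely illustrative):

```r
# Job loss data pulled from the COVID-19 job loss tool on 2021-03-19 (illustrative date).
# The tool refreshes monthly, so downloads after that date may not reproduce these outputs.
```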

@@ -519,15 +520,15 @@ generate_index = function(df,

###---Create Indexes---------

- tracts_18_job_loss =
+ tracts_19_job_loss =
ajjitn (Contributor) commented:

Same comment as above about renaming to something more generic like tract_list

# income on rent
cb_stats <- cb_vars %>%
# select ACS table variables w/ attached GEOID
select(
# These are all the peolpe making under 35k (denominator)
B25074_002E, B25074_011E, B25074_020E,
- # These are all the people making under 35k who pay more than 50% of thier income on rent (numerator)
+ # These are all the people making under 35k who pay more than 50% of their income on rent (numerator)
ajjitn (Contributor) commented:

Sometimes, variable definitions change across years. Has someone explicitly checked that each of the variables referenced in this script has remained the same across 2018 and 2019? If not, someone should definitely spend some time doing this: create two versions of the var_list object, one for 2018 and one for 2019, and compare the entries for each variable referenced in this script.
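A sketch of that comparison, assuming the variable metadata comes from tidycensus::load_variables() (the vars_used vector lists only the names visible in this hunk, with the E suffix dropped to match load_variables() naming):

```r
# Hypothetical sketch: compare ACS variable labels between the 2018 and 2019 5-year releases
library(tidycensus)
library(dplyr)

vars_used <- c("B25074_002", "B25074_011", "B25074_020")  # only the names visible in this hunk

var_list_2018 <- load_variables(2018, "acs5", cache = TRUE)
var_list_2019 <- load_variables(2019, "acs5", cache = TRUE)

label_check <- var_list_2018 %>%
  filter(name %in% vars_used) %>%
  inner_join(filter(var_list_2019, name %in% vars_used),
             by = "name", suffix = c("_2018", "_2019")) %>%
  filter(label_2018 != label_2019 | concept_2018 != concept_2019)

# zero rows means the definitions match across years
label_check
```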

@@ -260,14 +261,13 @@ fb_stats <- fb_vars %>%

# Income Indicator
# Download in Zip file from HUD website, unzip and rename
download.file("https://www.huduser.gov/portal/datasets/cp/2012thru2016-140-csv.zip",
download.file("https://www.huduser.gov/portal/datasets/cp/2013thru2017-140-csv.zip",
ajjitn (Contributor) commented:

The download.file() fxn spits out a timeout error for me because the file seems to be big, so you should increase the timeout to a few minutes (instead of the default 1 min). You can do this by including the following line before the call:

options(timeout=180)

I would also set method="curl" instead of method="libcurl", as I've found that to be faster. I will probably make a commit that makes both of these changes for you.
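Putting both suggestions together (the destfile name is a placeholder; the URL is the one from the diff above):

```r
# Hypothetical sketch of the suggested download settings
options(timeout = 180)  # raise the default 60-second timeout for this large file

download.file(
  "https://www.huduser.gov/portal/datasets/cp/2013thru2017-140-csv.zip",
  destfile = "chas_2013_2017.zip",  # placeholder name; use the script's actual destfile
  method = "curl"
)
```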

@ajjitn (Contributor) commented Mar 24, 2021:

  • As a result of the above changes and the new 2019 data, the total number of all-water tracts in 2019 is now 320 instead of 319, so we'll want to change that in the technical appendix. We'll also need to update the comments in the code near line 338 in script 1.
  • As a result of the above changes and the new 2019 data, there are currently no tracts where we have to use the national-level means (like we did with the 9 tracts in New Mexico last time). So we'll want to change that in the technical appendix and update the comments in the code near line 344 in script 1.
    Finally, it would be good to compare final data files to ensure the scripts are reproducible. If someone who has run the full pipeline on their computer (after pulling my changes) wants to send me their final outputs via email, I can double-check that our outputs are identical.

The first two changes were incorporated into the technical appendix, and the final data checks were manually run and approved.

@ajjitn (Contributor) commented Apr 4, 2021:

Sometimes, variable definitions change across years. Has someone explicitly checked that each of the variables referenced in this script has remained the same across 2018 and 2019? If not, someone should definitely spend some time doing this: create two versions of the var_list object, one for 2018 and one for 2019, and compare the entries for each variable referenced in this script.

Confirmed with the team that someone has double checked the variable definitions across 2018 and 2019 and they have not changed.

@ajjitn (Contributor) commented Apr 4, 2021:

  • As a result of the above changes and the new 2019 data, the total number of all-water tracts in 2019 is now 320 instead of 319, so we'll want to change that in the technical appendix. We'll also need to update the comments in the code near line 338 in script 1.
  • As a result of the above changes and the new 2019 data, there are currently no tracts where we have to use the national-level means (like we did with the 9 tracts in New Mexico last time). So we'll want to change that in the technical appendix and update the comments in the code near line 344 in script 1.

Confirmed with Nicole that these changes have been made to the technical appendix.

ajjitn added 5 commits May 14, 2021 10:24
Upload all files to S3, which Comms uses for feature and data catalog references
- Made March versions of S3 objects publicly accessible on the S3 console and referenced them explicitly. Now the code will produce the same output data no matter when it's run, instead of pulling the latest job loss numbers