Final Project #20

Open
sunaynagoel opened this issue Dec 1, 2019 · 76 comments

@sunaynagoel

@Anthony-Howell-PhD I am running into the following error while knitting the .rmd document.

Quitting from lines 56-78 (Final_Project_Outline_Storyboard-Goel.Rmd)
Error in loadNamespace(name) : there is no package called 'lorem'
Calls: ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted

When I tried to install the package "lorem", the following error was produced.

 Package LibPath Version Priority Depends Imports LinkingTo Suggests Enhances License License_is_FOSS License_restricts_use OS_type Archs MD5sum
 NeedsCompilation Built

Anyone else running into this issue?

@sunaynagoel
Author

@Anthony-Howell-PhD. I was able to knit the file after including the following code.

install.packages("devtools")
devtools::install_github("gadenbuie/lorem")

And later by calling the library (with all other libraries).
library (lorem)

@Jigarci3

Jigarci3 commented Dec 2, 2019

@Anthony-Howell-PhD I might be completely off on this but I am trying to subset census.dats for my MSA. Here is my code

grep("^SEA", census.dats$msaname, value = TRUE)
these.sea <- census.dats$msaname == "SEATTLE-BELLEVUE-EVERETT, WA"
these.fips <- census.dats$fipscounty[ these.sea ]
these.fips <- na.omit( these.fips )

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

sea.pop1 <-
  get_acs( geography = "tract", variables = "Median.HH.Value00", "Foreign.Born00", "Recent.Immigrant00", "Poor.English00", "Veteran00", "Poverty00", "Poverty.Black00", " Poverty.White00", "Poverty.Hispanic00", "Pop.Black00", "Pop.Hispanic00", "Pop.Unemp00", "Pop.Manufact00", "Pop.SelfEmp00", "Pop.Prof00", "Female.LaborForce00",
         state = "53", county = county.fips[state.fips=="53"], geometry = TRUE ) %>% 
         select( "TRTID10", estimate ) %>%
         rename( POP=estimate )

sea.pop2 <-
get_acs( geography = "tract", variables = "Median.HH.Value10", "Foreign.Born10", "Recent.Immigrant10", "Poor.English10", "Veteran10", "Poverty10", "Poverty.Black10", " Poverty.White10", "Poverty.Hispanic10", "Pop.Black10", "Pop.Hispanic10", "Pop.Unemp10", "Pop.Manufact10", "Pop.SelfEmp10", "Pop.Prof10", "Female.LaborForce10",
         state = "53", county = county.fips[state.fips=="53"], geometry = TRUE ) %>% 
         select( "TRTID10", estimate ) %>%
         rename( POP=estimate )

sea.pop <- rbind(sea.pop1, sea.pop2)

I am getting the following error: "Error in if (shift_geo) { : argument is not interpretable as logical"

I can't figure out how to correct this error or if I am on the right track with my attempt to only include Seattle data.

@AntJam-Howell
Collaborator

@Jigarci3 You do not have to use the get_acs function to download data for the final project. The code chunk (below) gives you the 2000 and 2010 census variables. You have the census.dats dataframe that includes the tract ('TRTID10'), state ('state') and county ('county') information already. You need to subset the census.dats to include only the Seattle counties of your interest.
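
A minimal sketch of that subsetting step, using a small made-up stand-in for census.dats. The county names, the format of the state column, and all values are illustrative assumptions; only the column names TRTID10, state and county come from the description above.

```r
# Toy stand-in for census.dats (all values invented for illustration)
census.dats <- data.frame(
  TRTID10 = c( "53033000100", "53061040100", "53077000200", "41051000100" ),
  state   = c( "WA", "WA", "WA", "OR" ),
  county  = c( "King County", "Snohomish County", "Yakima County", "Multnomah County" ),
  stringsAsFactors = FALSE
)

# Counties assumed to make up the MSA of interest (hypothetical list)
msa.counties <- c( "King County", "Snohomish County" )

# Keep tracts in the chosen state AND in one of the MSA counties;
# county names repeat across states, so filter on both columns
keep <- census.dats$state == "WA" & census.dats$county %in% msa.counties
msa.dats <- census.dats[ keep, ]

nrow( msa.dats )
```

Checking unique( census.dats$county ) first shows how the county names are actually spelled in the real data.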

@sunaynagoel
Author

@Anthony-Howell-PhD. The main (top horizontal) navigation bar is hiding the titles and descriptions of the widgets below it. Is there any way to customize it? I tried different things but could not achieve the desired results. Thanks.
I am attaching a screenshot.
Screen Shot 2019-12-02 at 11 48 51 AM

@lecy
Collaborator

lecy commented Dec 2, 2019

You can create a custom Cascading Style Sheet (CSS) to moderate this behavior (you have not learned this yet), but the easiest solution is to simplify the menu bar.

Shorten the project title ("Community Analytics Practicum Extravaganza" is tongue-in-cheek; you can change it), and consider grouping some items (can you combine clustering, neighborhoods, and neighborhood change?).

@sunaynagoel
Author

@lecy Thank you. Shortening the menu bar helped.

@sunaynagoel
Author

I was wondering about limiting the decimal places in the table displayed using datatable() to 4 or 5. Will it affect the predictions?

@AntJam-Howell
Collaborator

AntJam-Howell commented Dec 3, 2019 via email

@etbartell

etbartell commented Dec 3, 2019

@Anthony-Howell-PhD I'm having this same issue with knitting the original rmd but it was not solved with the code provided above. When I try it with this code:

knitr::opts_chunk$set(  message=F, warning=F, echo=F )

install.packages("devtools")
devtools::install_github("gadenbuie/lorem")

#Load in libraries
library( tidycensus )
library( tidyverse )
library( ggplot2 )
library( plyr )
library( stargazer )
library( corrplot )
library( purrr )
library( flexdashboard )
library( leaflet )
library( mclust )
library( pander )
library( DT )
library( lorem )

I get the following error message:

image

When I try to simply use install.packages( "lorem" ), it tells me that "package ‘lorem’ is not available (for R version 3.6.1)". Do I need to download a different version of R? I thought we were all using the same version.

@AntJam-Howell
Collaborator

@etbartell If you cannot download and load the lorem package, the easiest thing to do is to go through the .rmd file and remove the lorem calls. To do this, paste the following code into the search box of the .rmd file to find all instances of it: r lorem::ipsum(paragraphs = 1)

You can then delete these calls one by one or all at once. Just remember that every time you see that code, it marks a place for you to provide your own answer. You can still return to these places to provide your answer by searching for the <!--- symbol that denotes the instructions.
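
If there are many placeholders, the same search-and-delete can be scripted. This is only a sketch: the sample lines and file names are hypothetical, and it assumes every placeholder is written exactly as the inline call quoted above.

```r
# Hypothetical lines from an .rmd file containing the inline lorem call
rmd.lines <- c(
  "### Overview",
  "<!--- Describe your MSA here -->",
  "`r lorem::ipsum(paragraphs = 1)`",
  "Some text I already wrote."
)

# fixed = TRUE matches the call literally, so the parentheses need no escaping
cleaned <- gsub( "`r lorem::ipsum(paragraphs = 1)`", "", rmd.lines, fixed = TRUE )

# With a real file: rmd.lines <- readLines( "my-file.Rmd" ), and afterwards
# writeLines( cleaned, "my-file-clean.Rmd" ) to save a copy
cleaned
```

The <!--- instruction comments are left untouched, so they can still be searched for later.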

@etbartell

That worked, thanks!

@meliapetersen

I'm having a weird issue with my code from lab 4 (it didn't happen when I turned in the lab, but it's happening now).

I'm getting the error that I am not using an argument:

Error in rename(., POP = estimate) : unused argument (POP = estimate)

When running this code:

crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv",  stringsAsFactors=F, colClasses="character" )

these.seattle <- crosswalk$msaname == "SEATTLE-BELLEVUE-EVERETT, WA"
these.fips <- crosswalk$fipscounty[ these.seattle ]
these.fips <- na.omit( these.fips )

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

seattle.pop <-
  get_acs( geography = "tract", variables = "B01003_001", state = "53", county = county.fips[state.fips=="53"], geometry = TRUE ) %>%
  select( GEOID, estimate ) %>%
  rename( POP = estimate )

URL <- "https://github.com/DS4PS/cpp-529-master/raw/master/data/ltdb_std_2010_sample.rds"
census.dat <- readRDS(gzcon(url( URL )))

# merge shapefile data with census data in new dataframe
seattle <- merge( seattle.pop, census.dat, by.x="GEOID", by.y="tractid" )
seattle2 <- seattle[ ! st_is_empty( seattle ) , ]
seattle.sp <- as_Spatial( seattle2 )
class( seattle.sp )

For the empirical framework portion of the dashboard.

Am I on the right track for this portion? I am also unclear on that as well. This was just the code I had from lab 4.

@AntJam-Howell
Collaborator

@meliapetersen sorry to hear that is happening. My suggestion is to focus on understanding how to subset the census.dats dataset to only your MSA of interest. Based on your code, your counties of interest are ("029" "033" "061"). The census.dats dataframe has the actual names of the counties, not numbers. It was intended that this dilemma would lead people to search online for county FIPS codes (see my Google search screenshot attached). The first result is a concordance (also attached below). You will have to match your county FIPS numbers to the names in the concordance, then subset those county names in your census.dats dataset.

Screenshot 2019-12-03 15 01 58

Countyfipconcordance.pdf

@meliapetersen

I see where I'm going wrong, thank you!

@castower

castower commented Dec 3, 2019

@etbartell I ran into the same problem and found that entering the following code fixed it:

devtools::install_github("gadenbuie/lorem")

I read here for additional info: https://github.com/gadenbuie/lorem

Edit: oops, just realized this is the exact same code as above, I somehow overlooked that!

@lepp12

lepp12 commented Dec 3, 2019

@Anthony-Howell-PhD

I'm running into a similar issue as others on the section requiring code from Lab 4. However, I'm not getting a descriptive error. When I run the following code:

crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv",  stringsAsFactors=F, colClasses="character" )

these.san <- crosswalk$msaname == "SAN DIEGO, CA"
these.fips <- crosswalk$fipscounty[ these.san ]
these.fips <- na.omit( these.fips )

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

san.pop <-
  get_acs( geography = "tract", variables = "B01003_001", state = "06", county = county.fips[state.fips=="06"], geometry = TRUE ) %>%
  select( GEOID, estimate ) %>%

I only get "Error: "

@AntJam-Howell
Collaborator

You do not need to download data using get_acs. You already have the data you need with census.dats. You only need to subset the census.dats data to your chosen MSA (which is typically a few different counties). Please see the response to Melia above (pasted below) and let me know if that helps.

@meliapetersen sorry to hear that is happening. My suggestion is to focus on understanding how to subset the census.dats dataset to only your MSA of interest. Based on your code, your counties of interest are ("029" "033" "061"). The census.dats dataframe has the actual names of the counties, not numbers. It was intended that this dilemma would lead people to search online for county FIPS codes (see my Google search screenshot attached). The first result is a concordance (also attached below). You will have to match your county FIPS numbers to the names in the concordance, then subset those county names in your census.dats dataset.
Screenshot 2019-12-03 15 01 58
Countyfipconcordance.pdf

@AntJam-Howell
Collaborator

@lepp12 please see above reply.

@castower

castower commented Dec 4, 2019

@lepp12, if you don't want to have to Google the names, they are in the crosswalk dataset. Therefore, I just altered my data frame from the crosswalk to be as follows:

name.fips <- crosswalk$countyname[these.YOURCITY]
data.frame( state=state.fips, county=county.fips, FIPS=these.fips, name=name.fips)

This then gave me the names of each county.

@AntJam-Howell
Collaborator

Nice find @castower

@meliapetersen

I'm still having trouble understanding what I'm supposed to do with the names of the counties and how to pull them from census.dats. I have identified the FIPS names, but is there a specific place I can refer to for an explanation of the code to pull just the selected info for the rest of the dashboard? It feels like such a simple answer but I cannot seem to make sense of it. Thank you!

@castower

castower commented Dec 4, 2019

@meliapetersen I used the filter function to just select the needed counties

@sunaynagoel
Author

I am a little lost reading the transition matrix. Here is a screenshot of my transition matrix.


Screen Shot 2019-12-03 at 6 53 04 PM

@AntJam-Howell
Collaborator

AntJam-Howell commented Dec 4, 2019 via email

@etbartell

I'm having trouble understanding the change variables conceptually. If we were going for percent change, we would just use (2010var-2000var)/2000var, but since we're using the formula of 2000var/(2010var+1), I don't understand what the values are telling us. With the exception of home price, the other variables are all decimals, and adding 1 to the denominator completely alters its value. For example, if ForeigBornChange = 0.095, this doesn't mean that the foreign-born population changed by 9.5%. It's just what the formula spit out. I feel like I'm missing something. Does anyone have a solid grasp of what these variables mean?

@castower

castower commented Dec 4, 2019

I have a question concerning the dorling maps. In Lab 4 we were creating them based on household income, but I'm not sure what we're clustering here. Should we group these by the cluster variable or something else? I may be overlooking a step, but I can't quite figure out what I'm plotting.

Thanks!

@AntJam-Howell
Collaborator

@etbartell Nice question here and nice catch. Actually, it is more intuitive to have the change variables defined as 2010var/2000var rather than in the .rmd file which has it as 2000var/2010var. With respect to adding a constant to a variable, in this case it would be better to add a small value to the variables. So for home values, adding a 1 makes sense. When working with proportions it makes more sense to add a .01 instead of 1. I will update these changes to the .rmd file.
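
A small sketch of the revised change variables, with invented values. It assumes the constant is added to the denominator, as in the original formula; the variable names here are hypothetical.

```r
# Invented tract-level values for illustration
home.value00   <- c( 150000, 0, 220000 )      # median home value, 2000
home.value10   <- c( 180000, 95000, 210000 )  # median home value, 2010
foreign.born00 <- c( 0.10, 0.00, 0.25 )       # proportion foreign born, 2000
foreign.born10 <- c( 0.12, 0.05, 0.20 )       # proportion foreign born, 2010

# Ratio of 2010 to 2000; the constant keeps a zero base year from dividing by zero
# (assumption: the constant goes in the denominator, as in the original formula)
home.value.change   <- home.value10 / ( home.value00 + 1 )
foreign.born.change <- foreign.born10 / ( foreign.born00 + 0.01 )

round( foreign.born.change, 2 )
```

Read this way, values above 1 indicate growth between 2000 and 2010 and values below 1 indicate decline. The added constant distorts the ratio most for tracts whose base-year value is close to zero.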

@sunaynagoel
Author

This makes so much more sense now. Thanks @etbartell for asking this question and @Anthony-Howell-PhD for the help.

@AntJam-Howell
Collaborator

@castower Besides household income, we also used dorling to map clusters in Lab 4. See the attached screenshot from the Lab 4 instructions.

Screenshot 2019-12-03 19 50 10

@castower

castower commented Dec 4, 2019

@Anthony-Howell-PhD Thank you!
I have another question about the data tab of the flexdashboard. Should there be labels on the blue tabs? I can't figure out how to name them.
Screen Shot 2019-12-03 at 7 41 11 PM

@castower

castower commented Dec 4, 2019

There is a way that it could be done. You could try to troubleshoot it with a Google search, but the easiest and perhaps more informative way is to change the variable names, either directly in the data or indirectly through ggplot. I googled "change variable names in ggplot" and the first result is the following link, which may get you started: https://stackoverflow.com/questions/52656493/renaming-variable-names-in-a-ggplot2

On Wed, Dec 4, 2019 at 2:28 PM Courtney wrote: Is there any way to set ggplot to not cut off the titles of my labels on the histogram grid? They look fine in RMarkdown, but when I knit the file some of the title labels are cut off: [image: Screen Shot 2019-12-04 at 1 26 09 PM] https://user-images.githubusercontent.com/54308186/70183052-e8d06400-1699-11ea-9873-446dd91d26c0.png
-- Anthony Howell, Asst. Prof. in Public Policy, School of Public Affairs, Arizona State University. Faculty Profile https://isearch.asu.edu/profile/3501621 (CV https://www.dropbox.com/s/b1pxccpwxm6fats/Howell.CV.pdf?dl=0)

Thank you!

One other question, I discovered that my data set has one massive outlier for the House Price change variable (there's an instance where in 2000 the median house price was only $300 and in 2010 it was $284,900). Should I exclude this outlier since it's skewing the data (especially the mean) or just mention it in my summary?

Thanks!

@castower

castower commented Dec 5, 2019

If anyone else has questions about changing the grid labels, this website has great instructions: https://www.datanovia.com/en/blog/how-to-change-ggplot-facet-labels/

@castower

castower commented Dec 5, 2019

Also want to note, that if you want to leave the variables alone, you can use the fig.width setting for r-markdown to widen the figure.
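
As a sketch of the fig.width suggestion: the option can be set for all chunks through knitr::opts_chunk$set(), the same call used in the setup chunk earlier in this thread. The values 10 and 6 are arbitrary assumptions; adjust them to your own histogram grid.

```r
# Widen all figures produced by later chunks (inches; values chosen arbitrarily)
knitr::opts_chunk$set( fig.width = 10, fig.height = 6 )
```

The same option can also go in a single chunk header, e.g. {r, fig.width=10}, so that only that figure is affected.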

@sunaynagoel
Author

Thanks @castower

@AntJam-Howell
Collaborator

@lepp12 the problem is that you subset the data to only the 2000 variables, run the prediction, and are then trying to predict on new data from the full dataset, which includes both the 2000 and 2010 variables. Instead of Census2000 <- census.dats, you may want to create two separate datasets, one for 2010 and one for 2000, and make sure that the same variables are included in both.
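
One way to set this up, sketched with invented data. The lm() model is only a stand-in assumption (the project's actual model may differ); the point is that both data frames end up with identical variable names.

```r
# Invented data: the same two variables measured in 2000 and in 2010
census.dats <- data.frame(
  Poverty00   = c( 0.10, 0.20, 0.15, 0.30 ),
  Pop.Unemp00 = c( 0.05, 0.08, 0.06, 0.12 ),
  Poverty10   = c( 0.12, 0.18, 0.20, 0.25 ),
  Pop.Unemp10 = c( 0.06, 0.07, 0.09, 0.10 )
)

# One data frame per year, with the year suffix stripped so the names match
Census2000 <- census.dats[ , grepl( "00$", names( census.dats ) ) ]
Census2010 <- census.dats[ , grepl( "10$", names( census.dats ) ) ]
names( Census2000 ) <- sub( "00$", "", names( Census2000 ) )
names( Census2010 ) <- sub( "10$", "", names( Census2010 ) )

# Stand-in model fitted on the 2000 data (lm is an assumption)
fit <- lm( Poverty ~ Pop.Unemp, data = Census2000 )

# predict() finds Pop.Unemp in the 2010 data because the names now match
pred2010 <- predict( fit, newdata = Census2010 )
```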

@AntJam-Howell
Collaborator

@castower There are several options, all of which are quite common, and you are free to choose what you think is best: winsorize the variable, trim the variable, remove the outlier outright, or take the log of the variable.
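
A rough sketch of these options in base R, using invented change ratios with one extreme value. The 10th and 90th percentile cutoffs are arbitrary assumptions, not a recommendation.

```r
# Invented house price change ratios with one extreme value
change <- c( 0.8, 0.9, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.8, 2.0, 949.7 )

# Cutoffs at the 10th and 90th percentiles (arbitrary choice)
limits <- unname( quantile( change, probs = c( 0.10, 0.90 ) ) )

# Winsorize: keep every observation but cap values at the cutoffs
winsorized <- pmin( pmax( change, limits[1] ), limits[2] )

# Trim / remove: drop the observations outside the cutoffs
trimmed <- change[ change >= limits[1] & change <= limits[2] ]

# Log transform: compresses the right tail; requires positive values
logged <- log( change )

summary( winsorized )
```

Winsorizing and logging keep the tract in the data, while trimming drops it, so the choice also affects how many tracts appear in later maps and models.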

@castower

castower commented Dec 5, 2019

@Anthony-Howell-PhD thank you!

@meliapetersen

@meliapetersen you can remove the ### Identifying Communities. There is no output to show there.

@Anthony-Howell-PhD It looks like below there is a place to "interpret results" before the cluster analysis. If there is no output to show, are there results to interpret?

@AntJam-Howell
Collaborator

The main point of that section is to define and label each of the cluster groupings, which can be done on the side panel of each cluster output figure. You are free to keep in the ### identifying communities and add some basic description of what you did, i.e. perform cluster analysis, but there is no visualization for now. You can add your own visualization if you want though.

@AntJam-Howell
Collaborator

@meliapetersen see above reply

@meliapetersen

Hi, I'm having trouble identifying the variables needed for the dorling cartograms. Can someone help me make a little bit more sense of how to merge the spatial information into the Census2010 data frame? I keep going back to Lab 4 and what we've coded thus far, and I'm not fully understanding what I'm supposed to do. Thank you! :)

@AntJam-Howell
Collaborator

@meliapetersen I would suggest getting the spatial data information using get_acs for your state. Once you have that, you will want to merge your census dataframe to the spatial dataframe.

SpatialData <-
  get_acs( geography = "tract", variables = "B01003_001", state = "??", geometry = TRUE ) %>%
  select( GEOID, estimate )

# by.x / by.y name the key column in each data frame (all.x / all.y expect TRUE or FALSE)
SpatialData <- merge( SpatialData, CensusDataframeNAME, by.x="GEOID", by.y="TRTID10" )

That should get you started

@lecy
Collaborator

lecy commented Dec 5, 2019

@meliapetersen If helpful, the course GitHub site has information on how the dorling cartograms were built for labs 3 and 4. That code might be instructive:

https://github.com/DS4PS/cpp-529-master/blob/master/data/README.md

@meliapetersen

Awesome, thank you! I think I figured it out, but I'm getting an error when I run my code:

census_api_key("b431c35dad89e2863681311677d12581e8f24c24")
options(tigris_use_cache = TRUE)

seattle.pop <-
  get_acs( geography = "tract", variables = "B01003_001", 
           state = "53", geometry = TRUE ) %>%
  select( GEOID, estimate ) 

seattle.pop$GEOID<-substring(seattle.pop$GEOID, 2)
seattle <- merge( seattle.pop, Census2010, by.x="GEOID", by.y="TRTID10" )


seattle2 <- seattle[ ! st_is_empty( seattle ) , ]


seattle.sp <- as_Spatial( seattle2 )
class( seattle.sp )

seattle.sp <- spTransform( seattle.sp, CRS("+init=epsg:3395"))
seattle.sp <- seattle.sp[ seattle.sp$POP != 0 & (! is.na( seattle.sp$POP )) , ]

seattle.sp$pop.w <- seattle.sp$POP / 9000 # max(msp.sp$POP)   # standardizes it to max of 1.5
seattle_dorling <- seattle_dorling( x=seattle.sp, weight="pop.w", k=0.05 )

tm_shape( seattle_dorling ) + 
  tm_polygons( size="POP", col="cluster", n=4, style="cat", palette="Spectral")

Error:

Error in st_cast_sfc_default(x) : list item(s) not of class sfg

@lecy
Collaborator

lecy commented Dec 5, 2019

I think you are missing an argument here:

substring( seattle.pop$GEOID, 2 )

You usually have a starting position and ending position for the substring() function.

What does the Census2010 tract ID TRTID10 look like? Is it the state, county, and tract FIPS IDs combined, or just one of them?

SS-CCC-TTTTTT
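
To make the position arguments concrete, here is a sketch on a made-up 11-digit GEOID. For reference, base R's substr() needs both a start and a stop, while substring() defaults its end to the end of the string, so a single position is accepted and drops everything before it.

```r
# Made-up 11-digit tract GEOID: state (2) + county (3) + tract (6)
geoid <- "53033000100"

# substr() takes a start and a stop position
state.fips  <- substr( geoid, 1, 2 )    # "53"
county.fips <- substr( geoid, 3, 5 )    # "033"
tract.fips  <- substr( geoid, 6, 11 )   # "000100"

# substring() with one position keeps everything from that position on,
# so this drops the first character
no.first <- substring( geoid, 2 )       # "3033000100"

# Compare key formats before merging: keys that differ in length or
# leading characters will match no rows
nchar( geoid ); nchar( no.first )
```

Printing head() of both ID columns side by side is a quick way to see whether the two keys share the same format before calling merge().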

@RickyDuran

RickyDuran commented Dec 5, 2019

I am having trouble with the same area (creating dorling):

census_api_key("42bf5fcc6e6a6f05ebe97a0e647a5216a708613a")

aus.pop <-
get_acs( geography = "tract", variables = "B01003_001",
         state = "48", county = county.fips[state.fips=="48"], geometry = TRUE ) %>% 
         select( GEOID, estimate )

aus <- merge( aus.pop, austin.data, by.x="GEOID", by.y="TRTID10" )

aus.sp <- as_Spatial( aus )

class( aus.sp )

aus.sp <- spTransform( aus.sp, CRS("+init=epsg:3395"))

aus.sp <- aus.sp[ aus.sp$POP != 0 & (! is.na( aus.sp$POP )) , ]

aus_dorling <- cartogram_dorling( x=aus.sp, weight="pop.w", k=0.05 )

I get the error:
Error in packcircles::circleRepelLayout(x = dat.init, xysizecols = 1:3, : all sizes are missing and/or non-positive

@meliapetersen

@lecy So I noticed that I didn't have the county = county.fips[state.fips=="53"] argument in my seattle.pop code, but I don't know if that's what you were talking about because that didn't fix the issue. I'm still getting the same error.
code here:

census_api_key("b431c35dad89e2863681311677d12581e8f24c24")
options(tigris_use_cache = TRUE)

seattle.pop <-
  get_acs( geography = "tract", variables = "B01003_001", 
           state = "53", county = county.fips[state.fips=="53"], geometry = TRUE ) %>%
  select( GEOID, estimate ) 

seattle.pop$GEOID<-substring(seattle.pop$GEOID, 2)
seattle <- merge( seattle.pop, Census2010, by.x="GEOID", by.y="TRTID10" )


seattle2 <- seattle[ ! st_is_empty( seattle ) , ]


seattle.sp <- as_Spatial( seattle2 )
class( seattle.sp )

seattle.sp <- spTransform( seattle.sp, CRS("+init=epsg:3395"))
seattle.sp <- seattle.sp[ seattle.sp$POP != 0 & (! is.na( seattle.sp$POP )) , ]

seattle.sp$pop.w <- seattle.sp$POP / 9000 # max(msp.sp$POP)   # standardizes it to max of 1.5
seattle_dorling <- seattle_dorling( x=seattle.sp, weight="pop.w", k=0.05 )

tm_shape( seattle_dorling ) + 
  tm_polygons( size="POP", col="cluster", n=4, style="cat", palette="Spectral")

I'm not quite sure I understand what I need to add to the substring argument.

@lecy
Collaborator

lecy commented Dec 5, 2019

@RickyDuran Do you have a POP variable? I might have renamed it from the default census name.

Did you create the weighted pop variable?

phx$pop.w <- phx$POP / 10000   # standardizes it to max of 1.5

You might have to adjust the denominator depending on the max population. If you also have an NA or a 0 for population it could create the error you are getting. You should drop those polygons (filter them out) before the conversion.

summary( phx$POP )
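
Put together, the filtering and weighting steps might look like the sketch below. The data frame is an invented plain-data stand-in (no geometry), and the denominator of 10000 is carried over from the line above as an assumption about the maximum tract population.

```r
# Invented tract populations, including a zero and a missing value
phx <- data.frame( GEOID = c( "a", "b", "c", "d" ),
                   POP   = c( 4200, 0, NA, 9800 ) )

# Drop tracts with missing or zero population before building the cartogram
phx <- phx[ !is.na( phx$POP ) & phx$POP != 0, ]

# Scale by a denominator near the largest population so the weights stay small
phx$pop.w <- phx$POP / 10000

summary( phx$pop.w )
```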

@lecy
Collaborator

lecy commented Dec 5, 2019

@meliapetersen The get_acs() function requires both state and county fips. Here is how they are being generated from the original GEOID (which is state-county-tract FIPS combined):

crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv",  stringsAsFactors=F, colClasses="character" )
these.msp <- crosswalk$msaname == "MINNEAPOLIS-ST. PAUL, MN-WI"
these.fips <- crosswalk$fipscounty[ these.msp ]
these.fips <- na.omit( these.fips )

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

data.frame( these.fips, state.fips, county.fips ) %>% pander()
   these.fips state.fips county.fips
1       27003         27         003
2       27019         27         019
3       27025         27         025
4       27037         27         037
5       27053         27         053
6       27059         27         059
7       27123         27         123
8       27139         27         139
9       27141         27         141
10      27163         27         163
11      27171         27         171
12      55093         55         093
13      55109         55         109

If your MSA spans two states then you need to split these codes into two separate calls to get_acs(), I believe. One for each state using the corresponding county FIPS. Recall that county FIPS are not unique since each state will have a 001, 002, etc.

> county.fips
 [1] "003" "019" "025" "037" "053" "059" "123" "139" "141" "163" "171" "093"
[13] "109"
> county.fips[ state.fips=="55" ]
[1] "093" "109"

Substring is pulling out each FIPS.
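
For an MSA that spans states, the county codes can be grouped by state in one step. This sketch reuses the Minneapolis-St. Paul FIPS codes from the output above; the name fips.by.state is hypothetical.

```r
# FIPS codes from the Minneapolis-St. Paul output above
these.fips <- c( "27003", "27019", "27025", "27037", "27053", "27059", "27123",
                 "27139", "27141", "27163", "27171", "55093", "55109" )
state.fips  <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

# One vector of county codes per state; each element would feed its own
# get_acs( state = ..., county = ... ) call
fips.by.state <- split( county.fips, state.fips )

fips.by.state[["55"]]
```

The results of the separate get_acs() calls can then be stacked with rbind() before merging with the census data.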

@meliapetersen

@lecy So I fixed the issue I was having, and now it's giving me a different error. I added:

crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv",  stringsAsFactors=F, colClasses="character" )

these.seattle <- crosswalk$msaname == "SEATTLE-BELLEVUE-EVERETT, WA"
these.fips <- crosswalk$fipscounty[ these.seattle ]
these.fips <- na.omit( these.fips )

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

name.fips <- crosswalk$countyname[these.seattle]

census_api_key("b431c35dad89e2863681311677d12581e8f24c24")
options(tigris_use_cache = TRUE)

And now it is giving me an issue with this line of code:


seattle.sp$pop.w <- seattle.sp$POP / 9000 # max(msp.sp$POP)   # standardizes it to max of 1.5
seattle_dorling <- cartogram_dorling( x=seattle.sp, weight="pop.w", k=0.05 )

Giving me this error:
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
Error in x@polygons[[1]] : subscript out of bounds

Here is both code chunks now:

crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv",  stringsAsFactors=F, colClasses="character" )

these.seattle <- crosswalk$msaname == "SEATTLE-BELLEVUE-EVERETT, WA"
these.fips <- crosswalk$fipscounty[ these.seattle ]
these.fips <- na.omit( these.fips )

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

name.fips <- crosswalk$countyname[these.seattle]

census_api_key("b431c35dad89e2863681311677d12581e8f24c24")
options(tigris_use_cache = TRUE)

seattle.pop <-
  get_acs( geography = "tract", variables = "B01003_001", 
           state = "53", county = county.fips[state.fips=="53"], geometry = TRUE ) %>%
  select( GEOID, estimate ) 

seattle.pop$GEOID<-substring(seattle.pop$GEOID, 1)
seattle <- merge( seattle.pop, Census2010, by.x="GEOID", by.y="TRTID10" )


seattle2 <- seattle[ ! st_is_empty( seattle ) , ]


seattle.sp <- as_Spatial( seattle2 )
class( seattle.sp )

seattle.sp <- spTransform( seattle.sp, CRS("+init=epsg:3395"))
seattle.sp <- seattle.sp[ seattle.sp$POP != 0 & (! is.na( seattle.sp$POP )) , ]

seattle.sp$pop.w <- seattle.sp$POP / 9000 # max(msp.sp$POP)   # standardizes it to max of 1.5
seattle_dorling <- cartogram_dorling( x=seattle.sp, weight="pop.w", k=0.05 )

tm_shape( seattle_dorling ) + 
  tm_polygons( size="POP", col="cluster", n=4, style="cat", palette="Spectral")

 

@JaesaR

JaesaR commented Dec 5, 2019

I am having the same issue as Melia, but the code she added to fix it did not work for me.

My code looks like this:

crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv",  stringsAsFactors=F, colClasses="character" )

these.chi <- crosswalk$msaname == "CHICAGO, IL"
these.fips <- crosswalk$fipscounty[ these.chi ]
these.fips <- na.omit( these.fips )

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

name.fips <- crosswalk$countyname[these.chi]

census_api_key("624bc0325068577dab800279b9251a06f1200af3")
options(tigris_use_cache = TRUE)
chi.pop <-
get_acs( geography = "tract", variables = "B01003_001",
         state = "06", county = county.fips[state.fips=="17"], geometry = TRUE ) %>% 
         select( GEOID, estimate ) %>%
         dplyr::rename(POP = estimate)
# merge shapefile data with census data in a new data frame

chi.pop$GEOID<-substring(chi.pop$GEOID, 2)
chi <- merge( chi.pop, census.dats, by.x="GEOID", by.y="TRTID10" )
chi2 <- chi[! st_is_empty(chi), ]
chi.sp <- as_Spatial( chi2 )
class( chi.sp )

# project map and remove empty tracts
chi.sp <- spTransform( chi.sp, CRS("+init=epsg:3395"))
chi.sp <- chi.sp[ chi.sp$POP != 0 & (! is.na( chi.sp$POP )) , ]

# convert census tract polygons to dorling cartogram
chi.sp$pop.w <- chi.sp$POP / 9000 # max(msp.sp$POP)   # standardizes it to max of 1.5
chi_dorling <- cartogram_dorling( x=chi.sp, weight="pop.w", k=0.05 )

tm_shape( chi_dorling ) + 
  tm_polygons( size="POP", col="cluster", n=4, style="cat", palette="Spectral")

plot(chi.sp)

This returns the error: "Error in st_cast_sfc_default(x) : list item(s) not of class sfg"

I do not understand what you mean about the substring argument not being complete. Am I supposed to add the FIPS within that argument?
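For what it's worth, here is a small base R sketch of how `substring()` and `substr()` behave on a tract ID. The GEOID value is made up, following the SS-CCC-TTTTTT layout mentioned above. Whether any characters need to be dropped depends on how `TRTID10` is stored in the census data, which this sketch does not know.

```r
# Hypothetical 11-digit tract GEOID: state (2) + county (3) + tract (6)
geoid <- "53033000100"

# substring() only needs a start position; it runs to the end of the string
substring( geoid, 1 )    # "53033000100"  (starts at 1, so nothing changes)
substring( geoid, 2 )    # "3033000100"   (drops the first character)

# substr() takes both a start and a stop position
state  <- substr( geoid, 1, 2 )     # "53"
county <- substr( geoid, 3, 5 )     # "033"
tract  <- substr( geoid, 6, 11 )    # "000100"
```

So the FIPS codes do not go inside the `substring()` call. The second argument is just the position to start from.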

@RickyDuran

RickyDuran commented Dec 6, 2019

@lecy

I believe I have pop data. I am doing exactly the same thing to download shapefiles with population data as in lab 4, but when I leave in "rename( POP=estimate )" in

aus.pop <- 
get_acs( geography = "tract", 
variables = "B01003_001", 
state = "48", 
county = county.fips[state.fips=="48"], 
geometry = TRUE ) %>% 
select( GEOID, estimate ) %>% 
rename( POP=estimate )

I get ERROR in rename(POP = estimate) : (POP=estimate) not used

When I take it out, I also get Error in x@polygons[[1]] : subscript out of bounds at:
aus.sp$pop.w <- aus.sp$POP / 9000
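Two possible explanations, offered as guesses since the full script is not shown. An "unused argument" style error from `rename()` is commonly caused by another loaded package masking `dplyr::rename()`, and calling it through its namespace sidesteps that. Taking the rename out leaves no `POP` column, so the later `POP` filter would keep zero rows, which could explain the "subscript out of bounds" error. The data frame below is a made-up stand-in for the `get_acs()` result.

```r
# Made-up stand-in for the get_acs() output after select( GEOID, estimate )
acs <- data.frame( GEOID = c("48453000101", "48453000102"),
                   estimate = c(3500, 4100), stringsAsFactors = FALSE )

# without the rename there is no POP column, so the filter keeps nothing
no.pop <- acs[ acs$POP != 0 & ( ! is.na( acs$POP ) ) , ]
nrow( no.pop )      # 0

# the namespace prefix makes sure dplyr's version of rename() is used
renamed <- dplyr::rename( acs, POP = estimate )
with.pop <- renamed[ renamed$POP != 0 & ( ! is.na( renamed$POP ) ) , ]
nrow( with.pop )    # 2
```

To see which package a bare `rename()` call would use in a given session, `environmentName( environment( rename ) )` should print the package name.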

@AntJam-Howell
Collaborator

AntJam-Howell commented Dec 6, 2019 via email

@castower

castower commented Dec 6, 2019

@Anthony-Howell-PhD I just finished recording my video and it came out to right at 24 mins and 53 seconds. Is this okay?

@AntJam-Howell
Collaborator

AntJam-Howell commented Dec 6, 2019 via email

@lecy
Collaborator

lecy commented Dec 6, 2019

@meliapetersen I'm not sure about your current error, but this seems to work. (I don't know what data you were joining with the Census2010 object because you did not include that code.)

library( sp )          # work with shapefiles
library( sf )          # work with shapefiles - simple features format
library( dplyr )       # data wrangling 
library( tidycensus )
library( cartogram )  # spatial maps w/ tract size bias reduction
library( maptools )   # spatial object manipulation 


crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv",  stringsAsFactors=F, colClasses="character" )

these.seattle <- crosswalk$msaname == "SEATTLE-BELLEVUE-EVERETT, WA"
these.fips <- crosswalk$fipscounty[ these.seattle ]
these.fips <- na.omit( these.fips )

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

census_api_key("b431c35dad89e2863681311677d12581e8f24c24")
options( tigris_use_cache = TRUE )

# only have one state, so can use county fips directly
seattle.pop <-
  get_acs( geography = "tract", variables = "B01003_001", 
           state = "53", county = c("029", "033", "061"), geometry = TRUE ) %>%
  select( GEOID, estimate ) %>%
  rename( POP=estimate )

# I am not sure how you create Census2010 here
# seattle.pop$GEOID <- substring( seattle.pop$GEOID, 1 )
# seattle <- merge( seattle.pop, Census2010, by.x="GEOID", by.y="TRTID10" )
seattle <- seattle.pop

class( seattle.pop )  # sf
seattle2 <- seattle[ ! st_is_empty( seattle ) , ]

seattle.sp <- as_Spatial( seattle2 )
class( seattle.sp )  # sp
seattle.sp <- spTransform( seattle.sp, CRS("+init=epsg:3395"))

nrow( seattle.sp )  # 569
seattle.sp <- seattle.sp[ seattle.sp$POP != 0 & (! is.na( seattle.sp$POP )) , ]
nrow( seattle.sp )  # 567

seattle.sp$pop.w <- seattle.sp$POP / 9000 # max(msp.sp$POP)   # standardizes it to max of 1.5
summary( seattle.sp$pop.w )

seattle_dorling <- cartogram_dorling( x=seattle.sp, weight="pop.w", k=0.05 )

plot( seattle_dorling, col="red" )

image

@castower

castower commented Dec 6, 2019

Absolutely!
-- Anthony Howell School of Public Affairs Arizona State University (W) www.tonyjhowell.com

@Anthony-Howell-PhD thank you!

@lecy
Collaborator

lecy commented Dec 6, 2019

@JaesaR Where does "06" come from?

these.chi <- crosswalk$msaname == "CHICAGO, IL"
these.fips <- crosswalk$fipscounty[ these.chi ]
these.fips <- na.omit( these.fips )

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

census_api_key("624bc0325068577dab800279b9251a06f1200af3")
options(tigris_use_cache = TRUE)
chi.pop <-
get_acs( geography = "tract", variables = "B01003_001",
         state = "06", county = county.fips, geometry = TRUE ) %>% 
         select( GEOID, estimate ) %>%
         dplyr::rename(POP = estimate)
data.frame( state.fips, county.fips )

state.fips  county.fips
17          031
17          037
17          043
17          063
17          089
17          093
17          097
17          111
17          197
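One way to avoid hard-coding the state is to take it from the crosswalk itself. This is only a sketch: it uses three of the county codes listed above rather than the full crosswalk, and `by.state` is a name introduced here for illustration.

```r
# a few of the Chicago county codes from the table above
these.fips <- c( "17031", "17037", "17043" )

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

# group county codes by state: one list element per state in the MSA
by.state <- split( county.fips, state.fips )

names( by.state )     # state code(s) to pass as state= in get_acs()
by.state[["17"]]      # county codes to pass as county= for that state
```

For an MSA that spans more than one state, each element of `by.state` would need its own `get_acs()` call.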

@lecy
Collaborator

lecy commented Dec 6, 2019

@JaesaR This works. Note the commented-out code:

these.chi <- crosswalk$msaname == "CHICAGO, IL"
these.fips <- crosswalk$fipscounty[ these.chi ]
these.fips <- na.omit( these.fips )

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

data.frame( state.fips, county.fips ) 

census_api_key("624bc0325068577dab800279b9251a06f1200af3")
options(tigris_use_cache = TRUE)
chi.pop <-
get_acs( geography = "tract", variables = "B01003_001",
         state = "17", county = county.fips, geometry = TRUE ) %>% 
         select( GEOID, estimate ) %>%
         dplyr::rename(POP = estimate)

# chi.pop$GEOID<-substring(chi.pop$GEOID, 2)
# chi <- merge( chi.pop, census.dats, by.x="GEOID", by.y="TRTID10" )
chi <- chi.pop

chi2 <- chi[! st_is_empty(chi), ]
chi.sp <- as_Spatial( chi2 )
class( chi.sp )

# project map and remove empty tracts
chi.sp <- spTransform( chi.sp, CRS("+init=epsg:3395"))
chi.sp <- chi.sp[ chi.sp$POP != 0 & (! is.na( chi.sp$POP )) , ]

# convert census tract polygons to dorling cartogram
chi.sp$pop.w <- chi.sp$POP / 9000 # max(msp.sp$POP)   # standardizes it to max of 1.5
chi_dorling <- cartogram_dorling( x=chi.sp, weight="pop.w", k=0.05 )

plot( chi_dorling, col="steelblue" )

image

@lecy
Collaborator

lecy commented Dec 6, 2019

@RickyDuran you did not have enough code for a reproducible example. So I'm not sure how the error is introduced, but what you have looks fine.
