---
output:
  html_document:
    includes:
      in_header: analytics.html
    css: styles.css
    code_folding: show
    toc: TRUE
    toc_float: TRUE
    pandoc_args:
      "--tab-stop=2"
---
<link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato" />
::: {#header}
<img src="intro-to-r/images/urban-institute-logo.png" width="350"/>
:::
```{r echo = FALSE}
# source(here::here("getting-data", "census_api_key.R"))
```
```{r markdown-setup, include=FALSE}
knitr::opts_chunk$set(fig.path = "intro-to-r/www/images/")
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(echo = TRUE)
options(scipen = 999)
```
# Introduction
This guide outlines some useful workflows for pulling data sets commonly used by the Urban Institute.
## `library(tidycensus)`
`library(tidycensus)` by Kyle Walker ([complete intro here](https://walkerke.github.io/tidycensus/)) is the best tool for pulling some Census data sets into R through the Census Bureau API. The package returns tidy data frames and can pull the matching shapefiles when you add `geometry = TRUE`.
You will need to [apply for a Census API key](https://api.census.gov/data/key_signup.html) and [add it to your R session](https://walkerke.github.io/tidycensus/articles/basic-usage.html). Don't add your API key to your script and don't add it to a GitHub repository!
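One way to keep the key out of scripts is to store it in an environment variable and read it at run time. The sketch below assumes the key lives in a variable named `CENSUS_API_KEY` (for example, set in `~/.Renviron`, which should stay out of version control). `has_census_key()` is a hypothetical helper written for this guide, not part of either package.

```r
# Hypothetical helper: TRUE when the named environment variable is set and non-empty.
# Assumes the key was stored outside the script, e.g. as a line
# CENSUS_API_KEY=your-key-here in ~/.Renviron.
has_census_key <- function(var = "CENSUS_API_KEY") {
  nzchar(Sys.getenv(var))
}

if (has_census_key()) {
  message("Census API key found.")
} else {
  message("No Census API key found. Set CENSUS_API_KEY and restart R.")
}
```

`library(tidycensus)` also provides `census_api_key()`, which can write the key to `.Renviron` when called with `install = TRUE`. Run it once in the console rather than in a saved script.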
Here is a simple example for one state with shapefiles:
```{r tidycensus}
library(tidyverse)
library(purrr)
library(tidycensus)
# pull median household income and shapefiles for Census tracts in Alabama
get_acs(geography = "tract",
        variables = "B19013_001",
        state = "01",
        year = 2015,
        geometry = TRUE,
        progress = FALSE)
```
Smaller geographies like Census tracts can only be pulled state-by-state. This example demonstrates how to iterate across FIPS codes to pull Census tracts for multiple states. The process is as follows:
1. Pick the variables of interest
2. Create a vector of state FIPS codes for the states of interest
3. Create a custom function that works on a single state FIPS code
4. Iterate the function along the vector of state FIPS codes with `map_df()` from `library(purrr)`
Here is an example that pulls median household income at the Census tract level for multiple states:
```{r tidycensus-iteration}
# variables of interest
vars <- c(
  "B19013_001" # median household income estimate
)

# states of interest: alabama, alaska, arizona
state_fips <- c("01", "02", "04")

# create a custom function that works for one state
get_income <- function(state_fips) {
  income_data <- get_acs(geography = "tract",
                         variables = vars,
                         state = state_fips,
                         year = 2015)
  return(income_data)
}

# iterate the function
map_df(.x = state_fips, # iterate along the vector of state fips codes
       .f = get_income) # apply get_income() to each fips_code
```
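The state FIPS codes in the example above are two-character strings, so leading zeros matter: `"01"` is Alabama, while the number `1` has lost the zero. If the codes arrive as numbers (a common result of reading a CSV), a minimal base R sketch for restoring the padding follows. `state_numbers` is a made-up input for illustration.

```r
# Hypothetical input: state FIPS codes stored as numbers (leading zeros lost)
state_numbers <- c(1, 2, 4)

# Pad to two characters so the codes match the "01", "02", "04" form used above
state_fips <- sprintf("%02d", as.integer(state_numbers))

state_fips
```

The resulting character vector can be passed to the iteration in place of a hand-typed one.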
`library(tidycensus)` works well with `library(tidyverse)` and provides access to geospatial data, but it covers only some Census Bureau data sets. The next package has less functionality but can access any data set available through the Census API.
<br>
## `library(censusapi)`
`library(censusapi)` by Hannah Recht ([complete intro here](https://cran.r-project.org/web/packages/censusapi/vignettes/getting-started.html)) can access any table published through the Census Bureau API. A full listing is available [here](https://api.census.gov/data.html).
You will need to [apply for a Census API key](https://api.census.gov/data/key_signup.html) and [add it to your R session](https://cran.r-project.org/web/packages/censusapi/vignettes/getting-started.html). Don't add your API key to your script and don't add it to a GitHub repository!
Here is a simple example that pulls median household income and its margin of error for Census tracts in Alabama:
```{r censusapi}
library(tidyverse)
library(purrr)
library(censusapi)
vars <- c(
  "B19013_001E", # median household income estimate
  "B19013_001M"  # median household income margin of error
)

getCensus(name = "acs/acs5",
          key = Sys.getenv("CENSUS_API_KEY"),
          vars = vars,
          region = "tract:*",
          regionin = "state:01",
          vintage = 2015) %>%
  as_tibble()
```
Smaller geographies like Census tracts can only be pulled state-by-state. This example demonstrates how to iterate across FIPS codes to pull Census tracts for multiple states. The process is as follows:
1. Pick the variables of interest
2. Create a vector of state FIPS codes for the states of interest
3. Create a custom function that works on a single state FIPS code
4. Iterate the function along the vector of state FIPS codes with `map_df()` from `library(purrr)`
Here is an example that pulls median household income at the Census tract level for multiple states:
```{r censusapi-iteration}
# variables of interest
vars <- c(
  "B19013_001E", # median household income estimate
  "B19013_001M"  # median household income margin of error
)

# states of interest: alabama, alaska, arizona
state_fips <- c("01", "02", "04")

# create a custom function that works for one state
get_income <- function(state_fips) {
  income_data <- getCensus(name = "acs/acs5",
                           key = Sys.getenv("CENSUS_API_KEY"),
                           vars = vars,
                           region = "tract:*",
                           regionin = paste0("state:", state_fips),
                           vintage = 2015)
  return(income_data)
}

# iterate the function
map_df(.x = state_fips, # iterate along the vector of state fips codes
       .f = get_income) %>% # apply get_income() to each fips_code
  as_tibble()
```
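Unlike `get_acs()`, `getCensus()` returns the geography identifiers as separate `state`, `county`, and `tract` columns rather than a single `GEOID`. A minimal base R sketch of combining them follows. It runs on a small made-up data frame (`income_raw`, with invented values) standing in for the API result, so it works without a key. It also assumes that the ACS sentinel value `-666666666` should be treated as missing.

```r
# Made-up stand-in for a getCensus() result (values are invented for illustration)
income_raw <- data.frame(
  state = c("01", "01"),
  county = c("001", "001"),
  tract = c("020100", "020200"),
  B19013_001E = c(61838, -666666666),
  stringsAsFactors = FALSE
)

# Concatenate state (2), county (3), and tract (6) codes into an 11-character GEOID
income_raw$GEOID <- paste0(income_raw$state, income_raw$county, income_raw$tract)

# Assumption: -666666666 marks an estimate the Census Bureau could not compute
income_raw$B19013_001E[income_raw$B19013_001E == -666666666] <- NA

income_raw
```

A single `GEOID` column makes it easier to join the result to tract-level data from other sources, including output from `library(tidycensus)`.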