-
Notifications
You must be signed in to change notification settings - Fork 0
/
17.sf.qmd
242 lines (179 loc) · 7.88 KB
/
17.sf.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
---
title: "17. Spatial Features"
author: "Patrick Spauster"
format:
html:
code-copy: true
toc: true
toc-location: right
editor: visual
---
## Video Tutorial
```{=html}
<div style="max-width:608px"><div style="position:relative;padding-bottom:66.118421052632%"><iframe id="kaltura_player" src='https://cdnapisec.kaltura.com/p/1674401/embedPlaykitJs/uiconf_id/54887362?iframeembed=true&entry_id=1_plejsmgu&config%5Bprovider%5D=%7B%22widgetId%22%3A%221_f9957b0r%22%7D&config%5Bplayback%5D=%7B%22startTime%22%3A0%7D' allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" sandbox="allow-downloads allow-forms allow-same-origin allow-scripts allow-top-navigation allow-pointer-lock allow-popups allow-modals allow-orientation-lock allow-popups-to-escape-sandbox allow-presentation allow-top-navigation-by-user-activation" title="17. SF" style="position:absolute;top:0;left:0;width:100%;height:100%;border:0"></iframe></div></div>
```
## Intro to SF
```{r include = F}
library(tidyverse)
library(janitor)
```
The Spatial features package allows us to read in spatial data into R and transform it. Let's get a dataset on \[affordable housing production\]([https://data.cityofnewyork.us/resource/hg8x-zxpr](https://data.cityofnewyork.us/resource/hg8x-zxpr.)) from the NYC open data portal.
```{r}
aff_hsg <- read_csv("https://data.cityofnewyork.us/resource/hg8x-zxpr.csv?$limit=10000")
```
With SF, I can take tabular data and make it into a spatial dataset. In this case I have coordinates that I turn into points us st_as_sf
```{r}
library(sf)
aff_hsg_sf <- aff_hsg %>%
clean_names() %>%
filter(!is.na(longitude)) %>% #remove properties with missing coordinates
st_as_sf(coords = c("longitude", "latitude"), crs = 2263) #be careful! x = longitude, y = latitude
```
## Plotting SF Objects
Now that this is spatial, I can map it! geom_sf is the ggplot function that maps spatial objects. It works much like any other ggplot.
```{r}
ggplot()+
geom_sf(aff_hsg_sf, mapping = aes())
```
I can use programming for styling, too.
```{r}
ggplot()+
geom_sf(aff_hsg_sf,
mapping = aes(size = all_counted_units,
color = all_counted_units),
alpha = 0.25)
```
But that doesn't look very good, let's add some other layers. I can read in (CDs)\[<https://data.cityofnewyork.us/resource/jp9i-3b7y>\] directly as a shapefile using the geojson version from the Open data portal. This map is a long way from done, but now it has a semblance of a basemap.
```{r}
cd_sf <- read_sf("https://data.cityofnewyork.us/resource/jp9i-3b7y.geojson") %>%
st_set_crs(st_crs(aff_hsg_sf)) # here I am setting the crs of the new data to the crs of the point data I already have
ggplot()+
geom_sf(cd_sf,
mapping = aes())+ #this layer will be on bottom!
geom_sf(aff_hsg_sf,
mapping = aes(size = all_counted_units,
color = all_counted_units),
alpha = 0.25) #this layer will be on top!
#alpha changes the opacity of the dots
```
I can also summarize data like I would in a tabular dataset
```{r}
aff_hsg_sum <- aff_hsg %>%
group_by(community_board) %>%
summarize(total_affordable_units = sum(all_counted_units, na.rm = T))
```
And then I could create a choropleth with the summarized data, now that I have the nta shapes (just a simple join!)
```{r}
cd_aff_sum <- cd_sf %>%
mutate(community_board = case_when(
str_sub(boro_cd, 1, 1) == "1" ~ paste0("MN-",str_sub(boro_cd, 2,3)),
str_sub(boro_cd, 1, 1) == "2" ~ paste0("BX-",str_sub(boro_cd, 2,3)),
str_sub(boro_cd, 1, 1) == "3" ~ paste0("BK-",str_sub(boro_cd, 2,3)),
str_sub(boro_cd, 1, 1) == "4" ~ paste0("QN-",str_sub(boro_cd, 2,3)),
str_sub(boro_cd, 1, 1) == "5" ~ paste0("SI-",str_sub(boro_cd, 2,3)),
)) %>% #take the first character of boro_cd, replace it with the boro abbreviation, add the community board number
left_join(aff_hsg_sum, by = "community_board")
```
I can map that, now as a choropleth
```{r}
ggplot()+
geom_sf(cd_aff_sum,
mapping = aes(fill = total_affordable_units))
```
## Matching and Crosswalks
I could also match it to other data, and write it out to a shapefile to use in GIS
```{r}
library(tidycensus)
census_stats <- get_acs(
geography = "tract",
variables = c(population = "B01003_001",
med_income = "B07011_001"),
year = 2021,
state = "NY",
county = c("Bronx", "New York", "Kings", "Queens", "Richmond"),
output = "wide"
) %>%
clean_names()
```
I need a \[crosswalk\](<https://data.cityofnewyork.us/resource/hm78-6dwm>) to go from census tracts to community districts. Open data has one!
```{r}
cdtas_tracts <- read_csv("https://data.cityofnewyork.us/resource/hm78-6dwm.csv?$limit=10000",
col_types = cols(geoid = col_character()))
cdta_stats <- census_stats %>%
left_join(cdtas_tracts, by = "geoid") %>%
group_by(cdtacode, cdtaname) %>%
summarize(total_pop = sum(population_e, na.rm = T),
avg_med_inc = mean(med_income_e, na.rm = T)) %>%
mutate()
```
Now I can join! With everything in the same data frame I can write it out to read into spatial software, or I can visualize it
```{r}
cd_aff_stats <- cd_aff_sum %>%
mutate(cdtacode = str_replace(community_board, "-", "")) %>%
left_join(cdta_stats) %>%
mutate(aff_units_person = total_affordable_units/total_pop)
```
```{r}
st_write(cd_aff_stats,
"output/cd_aff_hsg.shp",
append = F)
```
I can map affordable units per person (normalized!)
```{r}
ggplot()+
geom_sf(cd_aff_stats,
mapping = aes(fill = aff_units_person))
```
## Spatial Operations
What if I wanted to map both both the \# of units per cd and the median income?
I can use one of sf's spatial operators and to take a centroid and map points
```{r}
points_aff_cd <- cd_aff_stats %>%
select(total_affordable_units, cdtacode) %>%
st_centroid()
```
And overlay on a choropleth. You notice that a lot of the buildings are built in low income areas!
```{r}
ggplot()+
geom_sf(cd_aff_stats,
mapping = aes(fill = avg_med_inc))+
geom_sf(points_aff_cd,
mapping = aes(size = total_affordable_units),
color = "pink",
alpha = 0.5)
```
I could write out this ggplot to edit in vector graphics
```{r}
ggsave("output/aff_hsg_inc_nyc.svg")
```
## Spatial Joins
What if I didn't have a crosswalk and needed to match points to the community district? I can do a spatial join to find what cd all the points fall into.
```{r}
points_polygons_join <- st_intersection(aff_hsg_sf, cd_sf)
```
I can then use summarize() to count points in polygons - or do more advanced summaries
```{r}
points_polygons_join %>%
as.data.frame() %>% #summarize works faster without the spatial features attached
group_by(boro_cd) %>%
summarize(points_in_polygon = n(),
units_per_cd = sum(all_counted_units, na.rm = T))
```
What if I wanted to find the proportion of affordable housing in the flood zone? I can do this with an intersects and mutate - I don't even need to join the two datasets!
```{r flood zones}
# floodplain_2020s_100y <- read_sf("https://data.cityofnewyork.us/resource/inra-wqx3.geojson?$limit=10000") %>%
# st_make_valid() %>% #this magic function repairs any invalid geometries
# st_union() %>% #here I make the entire floodplain into one big shape
# st_set_crs(st_crs(points_polygons_join)) #and set it to the same crs as my points
#
# points_polygons_join %>%
# mutate(fplain_2020s_100y = lengths(st_intersects(.,floodplain_2020s_100y))) %>% #st_intersects returns a list of the shapes it intersects with - if its 0 it didn't intersect, if its 1 it did!
# as.data.frame() %>%
# group_by(fplain_2020s_100y) %>%
# summarize(count = n(),
# units = sum(all_counted_units, na.rm = T))
```
## Cheat Sheet
Many more spatial operations available on the cheat sheets
![](images/sf1.png)
![](images/sf2.png)\