forked from jcoliver/learn-r
-
Notifications
You must be signed in to change notification settings - Fork 0
/
015-wrangling-richness.Rmd
643 lines (455 loc) · 26.1 KB
/
015-wrangling-richness.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
---
title: "Data Wrangling & Species Richness"
author: "Jeff Oliver"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
html_document: default
pdf_document:
latex_engine: xelatex
urlcolor: blue
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.height = 3)
```
We collect data from a lot of sources, including field observations. Sometimes, though, the way will collect and understand data as humans doesn't work well with the way computers operate. Here we will work through a real data set to use best practices for data organization to analyze species richness and diversity.
#### Learning objectives
1. Demonstrate how to organize data in 'tidy' format
2. Develop quality control processes for data
3. Visualize measures of species richness and species diversity
***
## Getting started
> Happy families are all alike; every unhappy family is unhappy in its own way - Leo Tolstoy
Since we are working in RStudio, start by making a new project from the File menu (File > New Project...). When prompted, select New Directory, then New Project. Name the project sweetwater-analysis. When the project opens, create a folder to hold the data by typing `dir.create(data)` in the Console.
### Tidy data
After Wickham 2014, "tidy" data is defined by:
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.
The survey data we are working with is available as an Excel file for download at [https://tinyurl.com/sweetwater-data](https://tinyurl.com/sweetwater-data). (if you want to skip the data wrangling, and work with the "tidy" data, you can download them [here](https://raw.githubusercontent.com/jcoliver/learn-r/gh-pages/data/sweetwater-data-clean04.csv)). Make sure to move the file to the 'data' folder you created.
Open the file in Excel. What can be done to make these data "tidy"?
First, the data are effectively broken in two, with the most recent observations separate from observations from preceding days. So in some cases, a row has more than one observation. To fix this, we move the cells from the preceding observations to rows after the most recent observations.
We also want to:
+ Delete any columns without data
+ Delete any rows without data
+ Add a column title for Date
+ Make sure data for preceding observations lines up with column title (currently the Count column has species names and the Species column has count data)
Let's save this file as a CSV file so we can read it into R. We don't want to overwrite the original data file, so we'll call our file sweetwater-data-clean.csv.
***
## Data wrangling
Now that the data are in CSV format, we can read it into R and take a look. We want to keep a record of all our work, so we are going to type commands into a script file and save the script. Create a new script via the File menu (File > New File > R Script). As we did before, we add a few lines of comments at the top of our script with some relevant information.
```{r read-data-01-eval, echo = FALSE}
bird_data <- read.csv(file = "data/sweetwater-data-clean01.csv")
```
```{r read-data-01, eval = FALSE}
# Analyze sweetwater bird species richness
# Jeff Oliver
# 2019-11-01
bird_data <- read.csv(file = "data/sweetwater-data-clean.csv")
```
We can look at the first few rows with the `head` function:
```{r head-data-01}
head(bird_data)
```
And we see that in rows two through 6 there aren't any values for the Date and Site columns. This is a prime example of the difference between how a human interprets data and how a computer interprets the data. Let's open up our CSV file in Excel again. As humans, we can infer that the value for "Date" should be October 4th all the way to row 35. But the computer is stupid. The computer looks at row 3, sees no date, and infers that the date is _missing_. We need to explicitly tell the computer what the value for date is. Before we do that, we have the opportunity to go ahead and also change the date to the standard ISO date format of "YYYY-MM-DD". In this case, October 4th becomes 2019-10-04.
Go ahead and update all the dates and fill in values appropriately. You should also note that for observations prior to October 10th, there is an offset that (1) needs to be fixed and (2) prompts deletion of now empty rows.
Also, for the 2019-10-04 observations, do the same thing with data in the Site column - that is, fill in values where appropriate. Save the file as a CSV (you can use the same filename, sweetwater-data-clean.csv) and read it into R.
```{r read-data-02-eval, echo = FALSE}
bird_data <- read.csv(file = "data/sweetwater-data-clean02.csv")
```
```{r read-data-02, eval = FALSE}
# Analyze sweetwater bird species richness
# Jeff Oliver
# 2019-11-01
bird_data <- read.csv(file = "data/sweetwater-data-clean.csv")
```
Looking at the updated data, we see those first six rows no longer have missing data.
```{r head-data-02}
head(bird_data)
```
We can also use the `summary` function to get an idea of the data set overall:
```{r summary-data}
summary(bird_data)
```
There are still missing values in the Site column, but that's OK - those are observations from preceding days, which don't have any site data.
***
## Species richness
OK, so now let's actually look at species richness. We are going to look at each date separately, and count the total number of species observed. To do this, we are going to use an additional package called tidyverse; we will need to install the package and load it into memory before we try to use it. First, we check to see if the package is installed by looking at the Packages tab in the lower-right window. If tidyverse _is_ installed, we don't need to install it again. If tidyverse is not listed, you can install it through the R console with the command:
```{r install-tidyverse, eval = FALSE}
install.packages("tidyverse")
```
Now that the package is installed, we also have to tell R to load the package into memory, so we add a call to `library` to our script. We often add all `library` commands to the top of our script.
```{r load-tidyverse, eval = FALSE}
# Analyze sweetwater bird species richness
# Jeff Oliver
# 2019-11-01
library(tidyverse)
bird_data <- read.csv(file = "data/sweetwater-data-clean.csv")
```
```{r load-tidyverse-eval, echo = FALSE}
library(tidyverse)
```
You might see some read messages when you load tidyverse. As long as you don't see the word "Error" everything should be fine.
Now we can use some tidyverse functions to calculate the total number of species seen on each day. We use the `group_by` function to first create a different pile of data for each day, then count the number of rows using the `n` function.
```{r unclean-richness}
# Calculate species richness per day
richness <- bird_data %>%
group_by(Date) %>%
summarize(Richness = n())
```
Let's look at the output by typing the name of our variable into the console:
```{r view-unclean-richness}
richness
```
Pretty cool. We can even plot species richness for each day, too. We are going to use the ggplot2 package again, which comes with tidyverse.
To plot richness for each day,
```{r plot-unclean-richness}
# Plot daily species richness
richness_plot <- ggplot(data = richness,
mapping = aes(x = Date,
y = Richness)) +
geom_point()
print(richness_plot)
```
Hey, a plot!
Let's take a moment though to think about the data that went into making these plots. In fact, let's look at the first six rows again:
```{r show-non-species-01}
head(bird_data)
```
Hmmm...row four has "Finch", which isn't a species. At Sweetwater Wetlands, House Finches and Lesser Goldfinches are common, and sometimes other rare finches show up. Another example is on row 17. We can take a look at this part of the data by specifying which rows to show. Here we ask for R to print out rows 15 through 20:
```{r show-non-species-02}
bird_data[15:20, ]
```
In addition to "Finch sp" we all so see two more examples of observations not identified to species. Because we don't actually know what species these were, we will need to exclude them from our analyses.
***
## Computers are stupid
### Or, why quality control matters
We do not actually want to delete these records from our data, but rather we need a means of indicating whether or not they are identified to the species level. This should probably be done in the sweetwater-data-clean.csv file we've been working with. So open that file in Excel and add another column, immediately to the right of Species. Call this one "ID_to_species" and fill in values of 1 for those identified to species (e.g. Gila woodpecker would be scored as a 1) and values of 0 for those not identified to species (e.g. Finch sp and duck sp. would be scored as a 0). Save the file with the same name as before (sweetwater-data-clean.csv).
```{r read-data-03, eval = FALSE}
# Analyze sweetwater bird species richness
# Jeff Oliver
# 2019-11-01
bird_data <- read.csv(file = "data/sweetwater-data-clean.csv")
```
```{r read-data-03-eval, echo = FALSE}
bird_data <- read.csv(file = "data/sweetwater-data-clean03.csv")
```
Now we can update our richness calculation to only include observations that were identified to species. We do this by adding a filter to the process. Using the code we had before, before calling `group_by`, we use the `filter` function to only include those observations that have a 1 in the ID_to_species column.
```{r store-unclean-richness, echo = FALSE}
# For comparing results before and after filter
unclean_richness <- richness
```
```{r richness-clean-01}
# Calculate species richness per day
richness <- bird_data %>%
filter(ID_to_species == 1) %>%
group_by(Date) %>%
summarize(Richness = n())
```
We can compare the old calculations for richness
```{r show-unclean-richness, echo = FALSE}
unclean_richness
```
to the new ones:
```{r view-clean-richness-01}
richness
```
And note the numbers have dropped a little when we exclude those observations. Now when we run the code for plotting, we see the results changed, too:
```{r plot-clean-richness-01}
richness_plot <- ggplot(data = richness,
mapping = aes(x = Date,
y = Richness)) +
geom_point()
print(richness_plot)
```
Let's take a quick look at the species that were observed on October 4, just to double check. We can do that in the console using the pipe (`%>%`) and the `filter` command (you'll see more than ten rows):
```{r view-one-day-species-01, eval = FALSE}
bird_data %>% filter(Date == "2019-10-04")
```
```{r view-one-day-species-eval-01, echo = FALSE}
bird_data %>% filter(Date == "2019-10-04") %>% slice(1:10)
```
We should see 34 rows. This lines up with our initial plot, that showed 34 species. But look a little closer, notably at rows 1 and 9. Both rows are observations of Mallard Ducks. So this species actually got counted (at least) twice. Because multiple sites were surveyed, We can't just count the number of rows in this data to get the number of species. We should actually do a little more data wrangling to get an accurate count of species for the October 4 sampling. What we are going to do is to create a new data object we'll call `bird_data_sums` that sums up the values for each species on each day.
```{r sum-sites-01}
# Sum species counts for each day
bird_data_sums <- bird_data %>%
filter(ID_to_species == 1) %>%
group_by(Date, Species) %>%
summarise(Count = sum(Count))
```
Now we can again look at the observations for October 4:
```{r view-one-day-species-02}
bird_data_sums %>% filter(Date == "2019-10-04")
```
Now this doesn't show us all the data, but looking at the end of the output, we see that R is showing us 10 rows of data, with `r nrow(bird_data_sums %>% filter(Date == "2019-10-04")) - 10` more rows. So we have gone from 34 species, down to `r nrow(bird_data_sums %>% filter(Date == "2019-10-04"))` species for October 4.
We can update our code to calculate richness and re-create our plot. Remember to replace `bird_data` with `bird_data_sums` in the first line of the code. We can also remove the `filter` function, because we already filtered out observations that were not identified to species when we created `bird_data_sums`.
```{r richness-clean-02}
# Calculate species richness per day
richness <- bird_data_sums %>%
group_by(Date) %>%
summarize(Richness = n())
# Plot daily species richness
richness_plot <- ggplot(data = richness,
mapping = aes(x = Date,
y = Richness)) +
geom_point()
print(richness_plot)
```
The only thing to really change in our plot is for October 4. OK, some of you might have noticed this already, but let's look again at those October 4 data:
```{r view-one-day-species-03}
bird_data_sums %>% filter(Date == "2019-10-04")
```
So, there's another issue to address. Look at rows 9 and 10. We have a species that is represented twice. Why? Here is a great example of computers only being able to do what you tell them to. We told R to count the total number of individuals for each species. Unfortunately, R sees "mallard" and "Mallard Duck" as two different species.
What do we do?
At this point, the best thing is probably to go into the Excel file and change the species names to be consistent. Now so far, we haven't actually _changed_ any of the data - we added a column and we filled in values where appropriate, but we did not change anything. When changing data, you always want to be careful. At this point, we should be keeping track of changes we made to the data. We are going to do this in a separate file, with notes about what we did. So start by creating a new text file (File > New File > Text File) and adding:
```
README for sweetwater species data
2019-11-01
+ Created CSV from original Excel data file
+ Changed to ISO date format
+ Standardized spelling of species' names
```
And save this file as "README.md".
Now we actually have to do the standardization of the species names, so open up the sweetwater-data-clean.csv file in Excel. What's the best way of going about changing names? We could search for non-standard spellings like "Gambils quail" but that assumes we know all the alternative spellings. What might help our cause is to sort by the Species column so things like "mallard" and "Mallard Duck show up next to one another. **Note**: this sorting is going to rearrange the order of the rows, so we should _only_ do this if the order of the rows does not matter. Remember the original data? Where the date was only shown in the first row? If we had sorted those data it would have been _bad_. Why? Because in that case, order _did_ matter. But by using tidy data formats, we can re-order rows without losing any information for a single observation.
So go ahead and change names so they are consistent. This includes
+ consistent spelling (Gambel's vs. Gambils),
+ consistent capitalization (Gila Woodpecker vs. Gila woodpecker), and
+ constent use of punctuation (Red-winged Blackbird vs. Redwing black bird and Cooper's Hawk vs. Coopers' Hawk).
Save the file to sweetwater-data-clean.csv and go back into R.
Re-run the line at the top of our script with the `read.csv` command:
```{r read-data-04, eval = FALSE}
bird_data <- read.csv(file = "data/sweetwater-data-clean.csv")
```
```{r read-data-04-eval, echo = FALSE}
bird_data <- read.csv(file = "data/sweetwater-data-clean04.csv")
```
Re-run the line counting the number of each species per day:
```{r sum-sites-02}
# Sum species counts for each day
bird_data_sums <- bird_data %>%
filter(ID_to_species == 1) %>%
group_by(Date, Species) %>%
summarise(Count = sum(Count))
```
And calculate richness for each day and draw the plot
```{r richness-clean-03}
# Calculate species richness per day
richness <- bird_data_sums %>%
group_by(Date) %>%
summarize(Richness = n())
# Plot daily species richness
richness_plot <- ggplot(data = richness,
mapping = aes(x = Date,
y = Richness)) +
geom_point()
print(richness_plot)
```
Whew!
So at this point, let's think about what we've done in R. We really haven't had to do too much, we wrote code to read in the file, count the total number of each species seen each day, calculate species richness, and plot species richness. Most of our time was really spent wrangling the data into a format that could be used to analyze with a computer. This isn't unexpected. Data wrangling can take up as much as 80% of a data scientist's time (See [Furche et al. 2016. doi 10.5441/002/edbt.2016.44](http://dx.doi.org/10.5441/002/edbt.2016.44)).
***
## Species diversity
Now we are ready for species diversity! We are going to use Shannon's index ($H$) of species diversity:
$$
H = -\sum_{i=1}^{s}{p_i \ln p_i}
$$
which is the sum over $s$ species, where $p_i$ is the proportion individuals of the $i^th$ species ($n_i$) out of the total number of inviduals ($N$), or:
$$
p_i = \frac{n_i}{N}
$$
To calculate this measure of diversity, we are going to use the vegan package, which does _not_ come with tidyverse, so we'll have to install it with:
```{r install-vegan, eval = FALSE}
install.packages("vegan")
```
Add the `library` command to the top of our script:
```{r load-vegan, eval = FALSE}
# Analyze sweetwater bird species richness
# Jeff Oliver
# 2019-11-01
library(tidyverse)
library(vegan)
bird_data <- read.csv(file = "data/sweetwater-data-clean.csv")
```
```{r load-vegan-eval, echo = FALSE}
library(vegan)
```
We are going to use functions that come with the vegan package to help with these calculations. In order to use those functions, we need to convert the `bird_data_sums` from the "long" format it is now to "wide" format. That is, we need to change from:
| Date | Species | Count |
|:-----------|:-------------------|:-----:|
| 2019-10-04 | Abert's Towhee | 1 |
| 2019-10-04 | Gila Woodpecker | 2 |
| 2019-10-03 | Abert's Towhee | 1 |
| 2019-10-03 | Greater Roadrunner | 1 |
To:
| Date | Abert's Towhee | Gila Woodpecker | Greater Roadrunner |
|:-----------|:------------------:|:---------------:|:------------------:|
| 2019-10-04 | 1 | 2 | 0 |
| 2019-10-03 | 1 | 0 | 1 |
Where
+ each date corresponds to a single row
+ each species corresponds to a single column
We can do this with the `spread` function that comes with tidyverse. Here we tell R to use the values in the Species column as new column names, and the numbers in the Count column to use as values.
```{r clean-wide}
bird_data_wide <- bird_data_sums %>%
spread(key = Species, value = Count)
```
Let's look at the first five columns of the "wide" formatted data:
```{r view-clean-wide}
bird_data_wide[, 1:5]
```
For the diversity calculations, it only works on count data, and our data include one column of dates. We create a new variable, `species_counts` which has all the data except the first column (the Date column), then we calculate species richness and look at the results by running the name of the variable (`bird_diversity`):
```{r clean-diversity-01}
species_counts <- bird_data_wide[, -1]
bird_diversity <- diversity(x = species_counts)
bird_diversity
```
All the values are `NA` because the input data (`bird_data_wide`) had `NA` values for every Date/Species combination with zero observations. In this case, we need to replace all those `NA` values with 0.
```{r replace-nas}
bird_data_wide[is.na(bird_data_wide)] <- 0
```
We re-run the diversity calculations after converting those `NA` values to zero. We then overwrite `bird_diversity` with a data frame that contains the Date and the results of the diversity calculations:
```{r clean-diversity-02}
# Calculate species diversity on count data
species_counts <- bird_data_wide[, -1]
bird_diversity <- diversity(x = species_counts)
# Add date
bird_diversity <- data.frame(Date = bird_data_wide$Date,
Diversity = bird_diversity)
```
And plot diversity:
```{r plot-diversity}
# Plot species diversity of each date
diversity_plot <- ggplot(data = bird_diversity,
mapping = aes(x = Date, y = Diversity)) +
geom_point()
print(diversity_plot)
```
***
## Starting clean
### Setup
If you want to skip all the data wrangling and start with data that are already in tidy format, download the data [here](https://raw.githubusercontent.com/jcoliver/learn-r/gh-pages/data/sweetwater-data-clean04.csv).
Be sure you set up a new project as described in [Getting started](#getting-started), above.
Move the file you downloaded to the 'data' folder.
In order to save our work, we will use an R script, which is a text file where we place all our R commands. Create a new script through the file menu (File > New File > R Script). We start our script with a header of comments:
```{r clean-01}
# Analyze sweetwater bird species richness
# Jeff Oliver
# 2019-11-01
```
We're going to use some R code from packages that do not come with base R. Run these two lines in the console:
```{r clean-02, eval = FALSE}
install.packages("tidyverse")
install.packages("vegan")
```
We also need to tell R in our script that we want to use those packages; we load them into memory with the `library` command. Generally, we try to add all the calls to `library` to the beginning of our code, so if someone opens the script, they can immediately see which additional packages are required.
```{r clean-03}
# Analyze sweetwater bird species richness
# Jeff Oliver
# 2019-11-01
# Load additional packages
library(tidyverse)
library(vegan)
# Read in data
bird_data <- read.csv(file = "data/sweetwater-data-clean04.csv")
```
The last line reads the data into R. You can look at these data by running `head(bird_data)` in the Console.
### Wrangle the data
Next we need to sum the counts for each species for each day:
```{R clean-04}
# Sum species counts for each day
bird_data_sums <- bird_data %>%
filter(ID_to_species == 1) %>%
group_by(Date, Species) %>%
summarise(Count = sum(Count))
```
With the daily totals for each species, now we can calculate species richness (here we are measuring daily richness as the total number of species each day).
### Richness
```{r clean-05}
# Calculate species richness per day
richness <- bird_data_sums %>%
group_by(Date) %>%
summarize(Richness = n())
```
We can investigate the richness data by typing the name of the variable into the console:
```{r, eval = FALSE}
> richness
```
```{r, echo = FALSE}
richness
```
Now that we have species richness calculated for each day, we can use the ggplot2 package, which comes with tidyverse, to plot the data:
```{r clean-06}
# Plot daily species richness
richness_plot <- ggplot(data = richness,
mapping = aes(x = Date,
y = Richness)) +
geom_point()
print(richness_plot)
```
### Diversity
For species diversity, we are going to use, $H$, Shannon's index of species diversity:
$$
H = -\sum_{i=1}^{s}{p_i \ln p_i}
$$
which is the sum over $s$ species, where $p_i$ is the proportion individuals of the $i^th$ species ($n_i$) out of the total number of inviduals ($N$), or:
$$
p_i = \frac{n_i}{N}
$$
We are going to use functions that come with the vegan package to help with these calculations. In order to use those functions, we need to convert the `bird_data_sums` from the "long" format it is now to "wide" format. That is, we need to change from:
| Date | Species | Count |
|:-----------|:-------------------|:-----:|
| 2019-10-04 | Abert's Towhee | 1 |
| 2019-10-04 | Gila Woodpecker | 2 |
| 2019-10-03 | Abert's Towhee | 1 |
| 2019-10-03 | Greater Roadrunner | 1 |
To:
| Date | Abert's Towhee | Gila Woodpecker | Greater Roadrunner |
|:-----------|:------------------:|:---------------:|:------------------:|
| 2019-10-04 | 1 | 2 | 0 |
| 2019-10-03 | 1 | 0 | 1 |
Where
+ each date corresponds to a single row
+ each species corresponds to a single column
We use the `spread` function of tidyverse to accomplish this. We also need to replace NA values produced in this process to zeros:
```{r, clean-07}
# Convert to wide format for species diversity calculation
bird_data_wide <- bird_data_sums %>%
spread(key = Species, value = Count)
# Replace NA values with zeros
bird_data_wide[is.na(bird_data_wide)] <- 0
```
The diversity calculation only works on count data, and our data include one column of dates. We create a new variable, `species_counts` which has all the data except the first column (the Date column), then we calculate species richness and look at the results by running the name of the variable (`bird_diversity`):
```{r clean-08}
# Calculate species diversity on count data
species_counts <- bird_data_wide[, -1]
bird_diversity <- diversity(x = species_counts)
# Add date
bird_diversity <- data.frame(Date = bird_data_wide$Date,
Diversity = bird_diversity)
```
Now that we have species diversity, we can plot it using commands similar to those we used for species richness:
```{r clean-09}
# Plot species diversity of each date
diversity_plot <- ggplot(data = bird_diversity,
mapping = aes(x = Date, y = Diversity)) +
geom_point()
print(diversity_plot)
```
Recall the plot for richness:
```{r clean-10, echo = FALSE}
print(richness_plot)
```
**Question**: Why is species richness of October 4 higher than richness of October 2, but species diversity is _lower_ on October 4 than on October 2?
***
## Additional resources
+ Wickham, H. 2014. Tidy data. _The Journal of Statistical Software_ **59** [http://www.jstatsoft.org/v59/i10/](http://www.jstatsoft.org/v59/i10/)
+ A nice introduction to [calculating measures of species diversity](https://entnemdept.ifas.ufl.edu/hodges/ProtectUs/lp_webfolder/9_12_grade/Student_Handout_1A.pdf)
+ A [PDF version](https://jcoliver.github.io/learn-r/015-wrangling-richness.pdf) of this lesson
***
<a href="index.html">Back to learn-r main page</a>
Questions? e-mail me at <a href="mailto:[email protected]">[email protected]</a>.