This document includes instructions for running species distribution modeling analyses and creating graphics. If you encounter problems, there are troubleshooting tips available on the Troubleshooting page.
- Install R
- Install RStudio
- Install Git and look at the Git section of the Troubleshooting page.
Aside: What's the difference between R/RStudio & Git/GitHub, anyway?
- Open RStudio and clone the Git repository at https://github.com/jcoliver/biodiversity-sdm-lesson.git. If you're not sure how to do this, take a look at point #3 "Cloning a repository" in the Git section of the Troubleshooting document.
- Run the setup script in RStudio by running this command in the Console tab of RStudio:
source(file = "scripts/setup.R")
This script may take a while to run, depending on the speed of your machine and internet connection. Note also that it may ask you if you want to restart R before installing/upgrading packages; if you receive this prompt, answer Yes to restarting R. This script installs additional R packages necessary for analyses, makes sure the data folder exists, and downloads climate data necessary to run the species distribution models. If you are prompted to choose a CRAN mirror, select the mirror that is geographically closest to you. If this script fails (or produces errors), see point #3 in the R section of the Troubleshooting page.
- You will need to download occurrence data for both a butterfly and one of its host plants. To do so, go to iNaturalist and search for one of the two species.
- Click on the Filters button to the right of the search bar near the top of the screen. Here you can restrict your observations to only those that are verifiable or research grade (note that these filters will affect the number of observations you have to work with). Be sure to record any filters you place on your search.
- Click the Download button in the lower right-hand area of this pop-up window. This will bring you to a new screen with many different options. At the very least you should see the species you searched for in the Taxon box about 1/3 of the way down the screen. Scroll through this page and select the options you would like. Importantly, the more options you choose in "3. Choose Columns", the longer it will take to download your data. You should look carefully and think about the data that might be useful to you when considering your results. At the very least, you must have the latitude and longitude columns checked, as these are necessary for generating maps in the next part of the project.
- Once you have made your selections, click Create export. This may take a few minutes depending on how much data you downloaded.
- Save the file as a csv file in the data folder within the biodiversity-sdm-lesson folder you downloaded from Git through RStudio. Rename the file "<genus_species>_data.csv" (replace <genus_species> with the appropriate names; for example, if you downloaded data for Papilio cresphontes, the file should be called "Papilio_cresphontes_data.csv"). Note: there can be no spaces in your file names - use an underscore ("_") any place you would otherwise want to put a space.
- Repeat for the second species of interest. For example, if we download data for Zanthoxylum americanum (a host plant of P. cresphontes), save the file as "Zanthoxylum_americanum_data.csv".
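Before moving on, it can help to confirm that each downloaded file is named as described and contains the coordinate columns. The following is a minimal sketch, not part of the lesson's scripts: the helper name `check_occurrence_file` is hypothetical, and it assumes the coordinate columns in the export are named `latitude` and `longitude`.

```r
# Hypothetical helper (not part of the lesson repository): checks that an
# occurrence file has no spaces in its name and has the coordinate columns.
# Assumption: the columns are named "latitude" and "longitude".
check_occurrence_file <- function(path, required = c("latitude", "longitude")) {
  problems <- character(0)
  if (grepl(" ", basename(path), fixed = TRUE)) {
    problems <- c(problems, "file name contains spaces")
  }
  if (!file.exists(path)) {
    return(c(problems, "file not found"))
  }
  # Read only the first row; we just need the column names
  header <- names(read.csv(file = path, nrows = 1, stringsAsFactors = FALSE))
  missing <- setdiff(required, header)
  if (length(missing) > 0) {
    problems <- c(problems,
                  paste("missing column(s):", paste(missing, collapse = ", ")))
  }
  problems
}

# Example use; an empty result means no problems were found:
# check_occurrence_file("data/Papilio_cresphontes_data.csv")
```

If the function returns any messages, re-download or rename the file before running the analyses.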
- In the scripts directory, copy the file run-sdm-single.R and rename the copy "<species>-sdm-single.R", replacing "<species>" with the name of the butterfly species. Use underscores instead of spaces; so for the species Papilio cresphontes, the file name would be "Papilio_cresphontes-sdm-single.R".
- Open this new file and update the following values:
  - butterfly.data.file <- "data/BUTTERFLY_DATA.csv"
    Change "BUTTERFLY_DATA.csv" so it matches the file of butterfly data you saved in Setup, step 3. For example, if we are analyzing the P. cresphontes data, this line becomes:
    butterfly.data.file <- "data/Papilio_cresphontes_data.csv"
  - outprefix <- "MY_SPECIES"
    Replace "MY_SPECIES" with the name of the butterfly species. Use underscores instead of spaces; so for the species P. cresphontes, the line would read:
    outprefix <- "Papilio_cresphontes"
- Save the file with these updates
- Run the analyses by typing the following command in the Console tab of RStudio:
  source(file = "scripts/<species>-sdm-single.R")
  replacing <species> with the species name as in step 1 of Running analyses. For the P. cresphontes analysis, we would thus type the command:
  source(file = "scripts/Papilio_cresphontes-sdm-single.R")
  After running this script, a map will be saved in the output folder; the file name will start with the value you used for "MY_SPECIES" in step 2, above, and end with "-single-prediction.pdf". So for the example P. cresphontes, the output pdf file will be at "output/Papilio_cresphontes-single-prediction.pdf".
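The copy, rename, and edit steps above can also be done from the R console. This is a minimal sketch under stated assumptions, not the lesson's own method: the helper name `make_species_script` is hypothetical, and it assumes the template contains the placeholder strings "BUTTERFLY_DATA.csv" and "MY_SPECIES" exactly as shown above and that the data file follows the "<genus_species>_data.csv" naming pattern.

```r
# Hypothetical helper (not part of the lesson repository): copies a template
# script and fills in the placeholder values described above.
make_species_script <- function(template, species, scripts_dir = "scripts") {
  # e.g. "run-sdm-single.R" becomes "Papilio_cresphontes-sdm-single.R"
  new_name <- sub(pattern = "^run", replacement = species,
                  x = basename(template))
  new_path <- file.path(scripts_dir, new_name)
  script_lines <- readLines(con = template)
  # Point the script at the butterfly data file saved in the data folder
  script_lines <- gsub(pattern = "BUTTERFLY_DATA.csv",
                       replacement = paste0(species, "_data.csv"),
                       x = script_lines, fixed = TRUE)
  # Use the species name as the prefix for output files
  script_lines <- gsub(pattern = "MY_SPECIES", replacement = species,
                       x = script_lines, fixed = TRUE)
  writeLines(text = script_lines, con = new_path)
  new_path
}

# Example use (run from the project folder):
# make_species_script("scripts/run-sdm-single.R", "Papilio_cresphontes")
```

Editing the file by hand, as described in the steps, gives the same result; the scripted version only helps when you repeat the process for several species.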
- In the scripts directory, copy the file run-sdm-pairwise.R and rename the copy "<species>-sdm-pairwise.R", replacing "<species>" with the name of the butterfly species. Use underscores instead of spaces; so for the species Papilio cresphontes, the file name would be "Papilio_cresphontes-sdm-pairwise.R".
- Open this new file and update the following values:
  - butterfly.data.file <- "data/BUTTERFLY_DATA.csv"
    Change "BUTTERFLY_DATA.csv" so it matches the file of butterfly data you saved in Setup, step 3. For example, if we are analyzing the P. cresphontes data, this line becomes:
    butterfly.data.file <- "data/Papilio_cresphontes_data.csv"
  - plant.data.file <- "data/PLANT_DATA.csv"
    Change "PLANT_DATA.csv" so it matches the file of plant data you saved in Setup, step 4. For example, if we are using Z. americanum as P. cresphontes' host species, this line becomes:
    plant.data.file <- "data/Zanthoxylum_americanum_data.csv"
  - outprefix <- "MY_SPECIES"
    Replace "MY_SPECIES" with the name of the butterfly species. Use underscores instead of spaces; so for the species P. cresphontes, the line would read:
    outprefix <- "Papilio_cresphontes"
- Save the file with these updates
- Run the analyses by typing the following command in the Console tab of RStudio:
  source(file = "scripts/<species>-sdm-pairwise.R")
  replacing <species> with the species name as in step 1 of Running analyses. For the P. cresphontes analysis, we would thus type the command:
  source(file = "scripts/Papilio_cresphontes-sdm-pairwise.R")
  After running this script, two things to note:
  - In the console you should see the percentage of the modeled plant's range that is occupied by the insect. Comparing this to the map, the value is the fraction of the area that is red, relative to the total red and green areas.
  - A map will be saved in the output folder; the file name will start with the value you used for "MY_SPECIES" in step 2.3, above, and end with "-pairwise-prediction.pdf". So for the example P. cresphontes, the output pdf file will be at "output/Papilio_cresphontes-pairwise-prediction.pdf".
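To make the reported percentage concrete, here is a minimal sketch of the arithmetic using made-up toy grids. It is an illustration only, not the code the lesson's script runs. It assumes that red marks cells where both plant and insect are predicted present, that green marks cells where only the plant is predicted present, and that every cell covers the same area.

```r
# Toy presence/absence grids (made-up values, for illustration only)
plant  <- matrix(c(TRUE,  TRUE, TRUE,  TRUE,
                   FALSE, TRUE, TRUE,  FALSE),
                 nrow = 2, byrow = TRUE)
insect <- matrix(c(TRUE,  TRUE, FALSE, FALSE,
                   TRUE,  TRUE, FALSE, FALSE),
                 nrow = 2, byrow = TRUE)

# Assumed color coding: red = plant and insect both predicted present,
# green = plant predicted present without the insect
red   <- sum(plant & insect)
green <- sum(plant & !insect)

# Percent of the plant's predicted range that the insect occupies
percent_occupied <- 100 * red / (red + green)
percent_occupied
```

In this toy example three cells are red and three are green, so the insect occupies 50% of the plant's predicted range. Cells where only the insect is predicted present do not enter the calculation.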
- In the scripts directory, copy the file run-future-sdm-single.R and rename the copy "<species>-future-sdm-single.R", replacing "<species>" with the name of the butterfly species. Use underscores instead of spaces; so for the species Papilio cresphontes, the file name would be "Papilio_cresphontes-future-sdm-single.R".
- Open this new file and update the following values:
  - butterfly.data.file <- "data/BUTTERFLY_DATA.csv"
    Change "BUTTERFLY_DATA.csv" so it matches the file of butterfly data you saved in Setup, step 3. For example, if we are analyzing the P. cresphontes data, this line becomes:
    butterfly.data.file <- "data/Papilio_cresphontes_data.csv"
  - outprefix <- "MY_SPECIES"
    Replace "MY_SPECIES" with the name of the butterfly species. Use underscores instead of spaces; so for the species P. cresphontes, the line would read:
    outprefix <- "Papilio_cresphontes"
- Save the file with these updates
- Run the analyses by typing the following command in the Console tab of RStudio:
  source(file = "scripts/<species>-future-sdm-single.R")
  replacing <species> with the species name as in step 1 of Running analyses. For the P. cresphontes analysis, we would thus type the command:
  source(file = "scripts/Papilio_cresphontes-future-sdm-single.R")
  After running this script, a map will be saved in the output folder; the file name will start with the value you used for "MY_SPECIES" in step 2, above, and end with "-single-future-prediction.pdf". So for the example P. cresphontes, the output pdf file will be at "output/Papilio_cresphontes-single-future-prediction.pdf".
- In the scripts directory, copy the file run-future-sdm-pairwise.R and rename the copy "<species>-future-sdm-pairwise.R", replacing "<species>" with the name of the butterfly species. Use underscores instead of spaces; so for the species P. cresphontes, the file name would be "Papilio_cresphontes-future-sdm-pairwise.R".
- Open this new file and update the following values:
  - butterfly.data.file <- "data/BUTTERFLY_DATA.csv"
    Change "BUTTERFLY_DATA.csv" so it matches the file of butterfly data you saved in Setup, step 3. For example, if we are analyzing the P. cresphontes data, this line becomes:
    butterfly.data.file <- "data/Papilio_cresphontes_data.csv"
  - plant.data.file <- "data/PLANT_DATA.csv"
    Change "PLANT_DATA.csv" so it matches the file of plant data you saved in Setup, step 4. For example, if we are using Z. americanum as P. cresphontes' host species, this line becomes:
    plant.data.file <- "data/Zanthoxylum_americanum_data.csv"
  - outprefix <- "MY_SPECIES"
    Replace "MY_SPECIES" with the name of the butterfly species. Use underscores instead of spaces; so for the species P. cresphontes, the line would read:
    outprefix <- "Papilio_cresphontes"
- Save the file with these updates
- Run the analyses by typing the following command in the Console tab of RStudio:
  source(file = "scripts/<species>-future-sdm-pairwise.R")
  replacing <species> with the species name as in step 1 of Running analyses. For the P. cresphontes analysis, we would thus type the command:
  source(file = "scripts/Papilio_cresphontes-future-sdm-pairwise.R")
  After running this script, two things to note:
  - In the console you should see the percentage of the modeled plant's range that is occupied by the insect. Comparing this to the map, the value is the fraction of the area that is red, relative to the total red and green areas.
  - A map will be saved in the output folder; the file name will start with the value you used for "MY_SPECIES" in step 2.3, above, and end with "-pairwise-future-prediction.pdf". So for the example P. cresphontes, the output pdf file will be at "output/Papilio_cresphontes-pairwise-future-prediction.pdf".
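Once all four analyses have been run, you may want to check which maps exist. The sketch below builds the four output file names described above from an outprefix value and reports which are present. The helper name `expected_outputs` is hypothetical and not part of the lesson's scripts; the file name endings are taken from the steps above.

```r
# Hypothetical helper (not part of the lesson repository): builds the output
# file names described above for a given outprefix value.
expected_outputs <- function(outprefix, output_dir = "output") {
  suffixes <- c("-single-prediction.pdf",
                "-pairwise-prediction.pdf",
                "-single-future-prediction.pdf",
                "-pairwise-future-prediction.pdf")
  file.path(output_dir, paste0(outprefix, suffixes))
}

maps <- expected_outputs("Papilio_cresphontes")
# Report which maps have been created so far
data.frame(file = maps, exists = file.exists(maps))
```

A FALSE in the exists column means the corresponding script has not yet been run successfully, or that a different outprefix value was used in that script.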
So what's all this talk of Git and GitHub? How are they different? Aren't they just the same thing? And what about R and RStudio? Do I need both of them for this stuff to work?
- Git vs GitHub: In short, Git is a piece of software that keeps track of versions, much like the "Track changes" option for your favorite word processing program. GitHub is a website. That's it. Well, it's a website that has Git running in the background and allows you to collaborate with other folks. There are other websites like GitHub that also use Git, including Bitbucket and GitLab, but the R code for this project all lives on GitHub. In this project, Git is how your computer talks with the code that is stored on GitHub. And RStudio makes that communication that much easier...
- R vs RStudio: R is a programming language that we use to analyze data and produce graphics. RStudio is a piece of software that we use to interact with R. You don't actually need RStudio to run the analyses and produce the maps described above. However, the RStudio program does make it much easier to interact with GitHub. It is also a nicer experience than working directly with the R programming language, especially for those with little to no programming experience.