This repository contains work for my M.Sc. on the identification of beta diversity hotspots using species distribution models (SDMs). The results are part of a manuscript available as a preprint on EcoEvoRxiv. There is also a specific repository for the manuscript.
This project is implemented in Julia v1.6.1. The required packages and versions are listed in `Project.toml`. To install them, run the first lines of `src/required.jl`. Some steps are also implemented in R v4.1.0, with packages and versions tracked by `renv`. More details below.
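The first lines of `src/required.jl` are not reproduced here. As a minimal sketch of the standard `Pkg` workflow for a project that ships a `Project.toml`, something like the following should work. The function name `setup_project` is hypothetical and not part of this repository.

```julia
using Pkg

# Hypothetical helper (not from this repository): activate the project found
# in `project_dir` and, optionally, install its dependencies.
function setup_project(project_dir::AbstractString = "."; install::Bool = true)
    # Fail early if the directory does not look like a Julia project
    isfile(joinpath(project_dir, "Project.toml")) ||
        error("No Project.toml found in $(abspath(project_dir))")
    # Make this project the active environment
    Pkg.activate(project_dir)
    # Install the versions recorded in Manifest.toml if one exists;
    # otherwise resolve versions from Project.toml
    install && Pkg.instantiate()
    # Return the path of the Project.toml that is now active
    return Base.active_project()
end
```

Calling `setup_project(".")` from the repository root would activate the environment and install the packages. Passing `install = false` only activates it.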
The data used in this project come from the eBird Basic Dataset from June 2019. The project currently focuses on all warbler species (Parulidae family) in North America (CA, US, MX).
eBird Basic Dataset. Version: EBD_relJun-2019. Cornell Lab of Ornithology, Ithaca, New York. Jun 2019.
Note however that the data is not hosted in this remote repository due to size limitations.
The repository is organized as follows:
- `assets/` contains the pre-coarsened Copernicus land cover data (downloaded and coarsened in `src/00c_data_landcover-copernicus.jl`).
- `data/` is used to store the data.
  - `jld2/` contains exported Julia `.jld2` elements, such as SDM predictions. Earlier versions relied heavily on these, but since they were too large to be version controlled, they have now been replaced by raster files in `raster/`. Some `.jld2` files are still exported here and there but are not central to the analyses.
  - `proc/` contains processed CSV data. Importantly, the prepared eBird data and some BART predictions are locally stored here but are not version controlled due to their size.
  - `raster/` contains raster files with species distributions (observed and predicted) and environmental data layers. Raster files are now central to the workflow and are used to save and reload data between scripts.
  - `raw/` contains the raw CSV datasets from eBird (not version controlled).
  - `rdata/` contains `.RData` files used as backups in the R scripts, which are not essential and are not version controlled.
- `fig/` contains the figures produced, organized by outcome (`bart` for figures based on predicted data and `raw` for figures based on observed data).
- `src/` contains all the scripts used in the project. Ordered scripts in this directory represent the main steps of the analyses. Subfolders contain scripts with a more specific use.
  - `lib/` is the library of the custom functions used in the main scripts.
  - `others/` contains useful scripts that are not part of the main analyses.
  - `shell/` contains Bash scripts used for some operations.
All analysis scripts are in `src/`. `main.jl` can be used to run all the analyses and produce the figures. `required.jl` loads all the required packages and library functions.
Otherwise, the general workflow of the analyses is as follows:
- `00a_ebd_extraction.jl` extracts the warbler data from the complete EBD to `data/raw` (not version controlled).
- `00b_ebd_preparation.jl` prepares the warbler data in `data/raw` for the analyses, then saves the results in `data/proc` (not version controlled).
- `00c_landcover.jl` prepares the land cover data from Copernicus and exports the environmental data as CSVs in `data/proc` and as TIFF files in `data/raster`.
- `01_distributions` assembles the species distributions from the raw data as layers, then exports these to `data/raster`. It also produces examples of single-species maps.
- `02_training_bart.R` trains BARTs (Bayesian Additive Regression Trees) in R (package `embarcadero`) based on the distribution and environmental rasters, then predicts the species distributions (exported as CSV files, which are not version controlled).
- `03_predictions_bart.jl` assembles the predicted distributions as layers and exports them as raster files.
- `04_full-extent.jl` performs the main analysis steps (on the full spatial extent): getting species richness and LCBD values per site and verifying the relationship between the two. These steps can be performed on either the observed or predicted distributions.
- `05_subareas.jl` reapplies the analyses on smaller regions and investigates the effect of the spatial scale on the results.
- `06_moving-windows.jl` investigates the effect of the proportion of rare species on the relationship between species richness and LCBD values at varying scales.
- `07_comparison_data.jl` re-runs the main analysis steps on both the observed and predicted data and prepares the results for comparison in the following scripts.
- `08_comparison_glm.jl` performs GLMs in R to compare the observed and predicted results and saves the results to be plotted in the next script.
- `09_comparison_plots.jl` produces plots comparing the observed and predicted results. The comparison is based either on the results directly (difference plots) or on the GLM residuals produced in the previous script (residual plots).
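To illustrate the richness and LCBD step performed in `04_full-extent.jl`, here is a minimal sketch on a plain site-by-species matrix. It assumes a Hellinger transformation before computing LCBD (local contributions to beta diversity) as each site's share of the total sum of squares. The transformation actually used by the scripts is not stated in this README, and the function names below are illustrative rather than those of `src/lib/betadiv.jl`.

```julia
using Statistics

# Species richness per site: number of species present in each row.
richness(Y::AbstractMatrix) = vec(sum(Y .> 0, dims = 2))

# Hellinger transformation: square root of the relative abundances per site.
# Assumption: every site (row) contains at least one species.
function hellinger(Y::AbstractMatrix)
    rowsums = sum(Y, dims = 2)
    return sqrt.(Y ./ rowsums)
end

# LCBD per site: the site's share of the total sum of squared deviations
# from the species (column) means of the transformed matrix.
function lcbd(Y::AbstractMatrix)
    Yt = hellinger(Y)
    S = (Yt .- mean(Yt, dims = 1)) .^ 2
    return vec(sum(S, dims = 2)) ./ sum(S)
end

# Toy presence-absence matrix: 3 sites (rows) x 3 species (columns).
# Sites 1 and 2 share the same two species; site 3 has a unique species.
Y = [1 1 0;
     1 1 0;
     0 0 1]

site_richness = richness(Y)
site_lcbd = lcbd(Y)
```

In this toy example, the third site holds a species found nowhere else, so it receives the largest LCBD value even though it has the lowest richness. The LCBD values sum to 1 across sites by construction.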
This code is built around the package SimpleSDMLayers.jl and its `SimpleSDMLayer` types, which are used to store the environmental variables and the species distributions.
The custom functions in `src/lib/` are organized in the following files:
- `analysis.jl` contains the functions to perform the main analyses.
- `bart.R` contains utility functions for the BART analyses.
- `betadiv.jl` contains functions to compute beta diversity statistics.
- `csvdata.jl` contains functions to prepare the data extracted from CSV files.
- `landcover.jl` contains functions to extract and prepare the land cover data (similarly to the other data sources in `SimpleSDMLayers.jl`).
- `plotting.jl` contains a function to allow easier plotting of the `SimpleSDMLayer` type elements.
- `presence-absence.jl` contains the function to convert the raw data into a presence-absence layer.
- `shapefiles.jl` contains a function to download the background shapefiles for plotting, and a function to clip them so that they overlap with a `SimpleSDMLayer`.
- `version-control.jl` contains the list of important files which are too large to be version controlled and a set of custom functions to track their changes. See additional notes for details.
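To give an idea of what a conversion like the one in `presence-absence.jl` involves, the sketch below bins observation coordinates into a Boolean grid. It uses a plain `BitMatrix` rather than a `SimpleSDMLayer`. The function name, keyword arguments, and example extent are assumptions made for illustration, not the repository's implementation.

```julia
# Hypothetical helper: mark a grid cell as `true` when at least one
# observation falls inside it. Row 1 is the southernmost row and
# column 1 the westernmost column.
function presence_absence(lons, lats; left, right, bottom, top, resolution)
    ncols = round(Int, (right - left) / resolution)
    nrows = round(Int, (top - bottom) / resolution)
    grid = falses(nrows, ncols)
    for (lon, lat) in zip(lons, lats)
        # Skip observations outside the extent
        (left <= lon < right && bottom <= lat < top) || continue
        # Convert coordinates to cell indices, guarding against rounding
        # at the upper edges of the extent
        col = min(floor(Int, (lon - left) / resolution) + 1, ncols)
        row = min(floor(Int, (lat - bottom) / resolution) + 1, nrows)
        grid[row, col] = true
    end
    return grid
end

# Example with made-up coordinates: the first two observations fall in the
# same cell and the last one lies outside the extent.
lons = [-75.5, -75.4, -70.1, -90.0]
lats = [45.2, 45.3, 49.9, 45.0]
pa = presence_absence(lons, lats; left = -80.0, right = -70.0,
                      bottom = 40.0, top = 50.0, resolution = 1.0)
```

Repeated observations in the same cell collapse to a single presence, which is the point of working with presence-absence rather than counts.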
- For each important file that is too large to be version controlled, a version-controlled placeholder file was created (for example `data/proc/ebd_warblers_prep_placeholder.csv`) to record the time when the large file was last updated on purpose. The placeholder is updated when the files are changed, and the functions will trigger a warning prompting the user to make sure the change was made on purpose. If it was, the placeholder should be re-committed with the new modification time. If the file was overwritten without a change (and the user is sure of it), the placeholder change can be discarded.
- This project is based on a previous proof of concept by @tpoisot, my M.Sc. advisor, at https://gitlab.com/tpoisot/BioClim.
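The placeholder mechanism for large files could look roughly like the sketch below. It assumes the placeholder stores the tracked file's modification time as plain text. The actual placeholders in this repository are CSV files, and their format and the functions in `version-control.jl` may differ. The names `recorded_mtime` and `track_change` are hypothetical.

```julia
# Hypothetical helper: read the modification time stored in a placeholder,
# or return `nothing` if the placeholder does not exist yet.
function recorded_mtime(placeholder::AbstractString)
    isfile(placeholder) || return nothing
    return parse(Float64, strip(read(placeholder, String)))
end

# Hypothetical helper: compare the large file's modification time with the
# one recorded in the placeholder. On a mismatch, update the placeholder,
# emit a warning, and return `true`; otherwise return `false`.
function track_change(file::AbstractString, placeholder::AbstractString)
    current = mtime(file)
    recorded_mtime(placeholder) == current && return false
    write(placeholder, string(current))
    @warn "Large file changed; check that this was intentional before committing the placeholder" file
    return true
end

# Example on temporary files (made-up names)
dir = mktempdir()
file = joinpath(dir, "large_file.csv")
placeholder = joinpath(dir, "large_file_placeholder.txt")
write(file, "a,b\n1,2\n")

first_check = track_change(file, placeholder)   # placeholder created
second_check = track_change(file, placeholder)  # nothing changed since
```

Because only the small placeholder is tracked by Git, a change in the large file shows up as a diff in the placeholder, which can then be committed or discarded.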