Summary of files in this repository
This repository contains data and code used in Bernhardt et al. (2022). The preparation and analysis of this dataset consisted of a four-part workflow:
- Part I: Standardized dataset creation: Part I is the preparation of standardized datasets of stream metabolism from the StreamPULSE data portal and terrestrial ecosystem fluxes from the FLUXNET2015 dataset. Everything generated in Part I is included in output_data so that other users could use this larger dataset if they choose to work with a different subset of data than we used.
- Part II. Filtering, gap-filling, and calculating metrics: In Part II the standardized dataset of lotic and terrestrial metabolism data was filtered down to the subset of sites used in Bernhardt et al. (2022). After filtering, additional descriptive metrics were calculated to use in analysis.
- Part III. Plotting and export of stats dataset: Datasets from part II are minimally subsetted and recast for plotting convenience. Figures 1-6 are generated. An analysis-ready dataset is compiled.
- Part IV. Structural equation modeling: Code used for structural equation model (SEM) analysis on the annual metabolism dataset. We used an observed variables model to estimate the effect of light (PAR reaching the stream surface) and hydrologic variability (skewness of daily discharge) on annual river GPP.
The repository contains:
- Documentation of each part of the workflow, located in the documentation folder of this repository. The original R markdown files are also included in the root directory of the repository.
- An associated package called BernhardtMetabolism, which contains data and code used to produce Part II of the workflow
- Final data outputs
Key: Exported datasets "Column name"
- Part-I-Standardized-dataset-creation.html : Documentation of Part I of the workflow. This document only contains a description of the data, methods, and steps used to complete Part I of the workflow.
- Part-II-Filtering-and-gap-filling.html: Documentation of Part II of the workflow. This document, along with the associated R markdown file (Part II-Filtering and gap filling.Rmd), goes through the entire set of steps used to complete Part II of the workflow. In conjunction with the BernhardtMetabolism package, this document contains all of the data and code needed to fully reproduce Part II of the workflow.
A final set of outputs was then exported to the output_data directory of this repository.
This is the full, unfiltered, standardized dataset of stream metabolism that was compiled.
lotic_site_info_full.rds
Format: A single data frame with the following columns:
- "Site_ID": Unique site identifier
- "Name": Site long name
- "Source": Site data source
- "Lat": Site Latitude
- "Lon": Site Longitude
- "epsg_crs": Site coordinate reference system as EPSG code
- "COMID": NHDPlus_v2 reach COMID
- "VPU": Vector processing unit
- "StreamOrde": Stream order (from NHDPlus_v2 flowlines)
- "Azimuth": Channel azimuth calculated as the circular mean of azimuths for each stream reach based on latitude and longitude of the site location using NHDPlus_v2 hydrography data.
- "TH": Tree height (m). Tree heights were derived using 30 m resolution global canopy height estimates from Potapov et al. (2021)
- "Width": Channel width (m)
- "Width_src": Source of channel width estimates. Values include: (NWIS field measurements, Regional geomorphic scaling coeff, StreamPULSE estimates)
- "WS_area_km2": Watershed area (km^2^)
- "WS_area_src": Source of watershed area estimates. Values include: (Appling2018_USGS2013, Appling2018_StreamStats, nwis_site_description, StreamStats, localuser_HBFLTER, localuser_UNHWQAL)
lotic_standardized_full.rds
Format: A list of data frames, where each element of the list is a data frame of timeseries for a single site. The names of each list element correspond to the unique site identifier (Site_ID) for a site. Each data frame contains the following columns:
- "Site_ID": Unique site identifier
- "Date": Date in YYYY-MM-DD format
- "U_ID": Unique date identifier (format as year + DOY)
- "Year": Year
- "DOY": Day of year (1 to 365 or 366)
- "GPP_raw": Stream GPP estimates (g O2 m^-2^ d^-1^) from Appling et al. (2018), raw data
- "ER_raw": Stream ER estimates (g O2 m^-2^ d^-1^) from Appling et al. (2018), raw data
- "GPP": Stream GPP estimates (g O2 m^-2^ d^-1^) from Appling et al. (2018), negative values replaced with NA
- "ER": Stream ER estimates (g O2 m^-2^ d^-1^) from Appling et al. (2018), positive values replaced with NA
- "K600": Model estimate of K600, the mean reaeration rate coefficient, scaled to a Schmidt number of 600, for this date. Value is the median of the post warmup MCMC distribution
- "DO.obs": Mean dissolved oxygen concentration (mg O2 L^-1^) for the date (4am to 3:59am)
- "DO.sat": Mean theoretical saturation concentration (mg O2 L^-1^) for the date (4am to 3:59am)
- "temp.water": Mean water temperature (degrees C) for the date (4am to 3:59am)
- "discharge": Mean discharge (m^3^ s^-1^) for the date (4am to 3:59am)
- "PAR_sum": Daily sum of incoming above canopy PAR (mol m^-2^ d^-1^)
- "Stream_PAR_sum": Daily sum of PAR estimated at the stream surface (mol m^-2^ d^-1^)
- "LAI_proc": MODIS LAI data that has been processed and gap-filled (m^2^ m^-2^)
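As a minimal sketch of how a list structured like lotic_standardized_full.rds can be handled in R, the example below builds a small toy list (the site IDs, dates, and values are invented) and applies the rule described above for "GPP" and "ER". The helper names `make_site` and `add_clean_cols` are hypothetical, and the exact "U_ID" format (year followed by zero-padded day of year, no separator) is an assumption, since only "year + DOY" is stated.

```r
# Toy stand-in for the real file. With the actual data one would instead use
# something like: lotic <- readRDS("output_data/lotic_standardized_full.rds")
make_site <- function(site_id, dates, gpp_raw, er_raw) {
  d <- as.Date(dates)
  data.frame(
    Site_ID = site_id,
    Date    = d,
    Year    = as.integer(format(d, "%Y")),
    DOY     = as.integer(format(d, "%j")),   # day of year, 1 to 365 or 366
    GPP_raw = gpp_raw,
    ER_raw  = er_raw,
    stringsAsFactors = FALSE
  )
}

lotic <- list(
  site_A = make_site("site_A", c("2015-01-01", "2015-01-02"), c(1.2, -0.3), c(-2.0, 0.4)),
  site_B = make_site("site_B", c("2016-12-30", "2016-12-31"), c(0.8, 2.1), c(-1.1, -3.0))
)

add_clean_cols <- function(df) {
  # ASSUMPTION: U_ID is year followed by zero-padded DOY, e.g. "2015001"
  df$U_ID <- paste0(df$Year, sprintf("%03d", df$DOY))
  # Negative GPP and positive ER are replaced with NA
  df$GPP <- ifelse(df$GPP_raw < 0, NA_real_, df$GPP_raw)
  df$ER  <- ifelse(df$ER_raw > 0, NA_real_, df$ER_raw)
  df
}

lotic_clean <- lapply(lotic, add_clean_cols)

# Pull one site by its Site_ID, or stack every site into one data frame
one_site  <- lotic_clean[["site_A"]]
all_sites <- do.call(rbind, lotic_clean)
```

Because each list element is named by Site_ID, `lapply()` keeps the names intact, and `do.call(rbind, ...)` is one simple way to get a single long table when per-site lists are inconvenient.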
This is the dataset of stream metabolism used in Bernhardt et al. (2022). The data were filtered based on several measures of data quality and availability, and gaps were filled. Finally, a suite of site metrics was calculated for use in analysis.
lotic_site_info_filtered.rds
Format: A single data frame with the following columns:
- "Site_ID": Unique site identifier
- "Name": Site long name
- "Source": Site data source
- "Lat": Site Latitude
- "Lon": Site Longitude
- "epsg_crs": Site coordinate reference system as EPSG code
- "COMID": NHDPlus_v2 reach COMID
- "VPU": Vector processing unit
- "StreamOrde": Stream order (from NHDPlus_v2 flowlines)
- "Azimuth": Channel azimuth calculated as the circular mean of azimuths for each stream reach based on latitude and longitude of the site location using NHDPlus_v2 hydrography data.
- "TH": Tree height (m). Tree heights were derived using 30 m resolution global canopy height estimates from Potapov et al. (2021)
- "Width": Channel width (m)
- "Width_src": Source of channel width estimates. Values include: (NWIS field measurements, Regional geomorphic scaling coeff, StreamPULSE estimates)
- "WS_area_km2": Watershed area (km^2^)
- "WS_area_src": Source of watershed area estimates. Values include: (Appling2018_USGS2013, Appling2018_StreamStats, nwis_site_description, StreamStats, localuser_HBFLTER, localuser_UNHWQAL)
- "ann_GPP_C": Mean annual cumulative stream GPP (g C m^-2^ y^-1^). This was calculated by first calculating annual sums of GPP (g C m^-2^ y^-1^) for each site year, and then taking the mean annual rate for each site.
- "upper_GPP_C": 95th percentile of daily rates of stream GPP (g C m^-2^ d^-1^).
- "ann_ER_C": Mean annual cumulative stream ER (g C m^-2^ y^-1^). This was calculated by first calculating annual sums of ER (g C m^-2^ y^-1^) for each site year, and then taking the mean annual rate for each site.
- "lower_ER_C": 5th percentile of daily rates of stream ER (g C m^-2^ d^-1^). Since ER is negative, you can think of this as equivalent to the 95th percentile done for GPP.
- "PAR_sum": Mean annual cumulative incoming PAR (kmol m^-2^ y^-1^). This was calculated by first calculating annual sums of PAR (kmol m^-2^ y^-1^) for each site year, and then taking the mean annual rate for each site.
- "Stream_PAR_sum": Mean annual cumulative predicted PAR at the stream surface (kmol m^-2^ y^-1^). This was calculated by first calculating annual sums of predicted PAR at the stream surface (kmol m^-2^ y^-1^) for each site year, and then taking the mean annual rate for each site.
- "gpp_C_mean": Mean daily GPP (g C m^-2^ d^-1^)
- "gpp_C_cv": CV of daily GPP
- "gpp_C_skew": Skewness of daily GPP
- "gpp_C_kurt": Kurtosis of daily GPP
- "gpp_C_amp": Amplitude of daily GPP
- "gpp_C_phase": Phase of daily GPP (day of year)
- "gpp_C_ar1": Autoregressive lag-one correlation coefficient of daily GPP
- "er_C_mean": Mean daily ER (g C m^-2^ d^-1^)
- "er_C_cv": CV of daily ER
- "er_C_skew": Skewness of daily ER
- "er_C_kurt": Kurtosis of daily ER
- "er_C_amp": Amplitude of daily ER
- "er_C_phase": Phase of daily ER (day of year)
- "er_C_ar1": Autoregressive lag-one correlation coefficient of daily ER
- "Wtemp_mean": Mean daily water temperature
- "Wtemp_cv": CV of daily water temperature
- "Wtemp_skew": Skewness of daily water temperature
- "Wtemp_kurt": Kurtosis of daily water temperature
- "Wtemp_amp": Amplitude of daily water temperature
- "Wtemp_phase": Phase of daily water temperature (day of year)
- "Wtemp_ar1": Autoregressive lag-one correlation coefficient of daily water temperature
- "Disch_mean": Mean daily discharge
- "Disch_cv": CV of daily discharge
- "Disch_skew": Skewness of daily discharge
- "Disch_kurt": Kurtosis of daily discharge
- "Disch_amp": Amplitude of daily discharge
- "Disch_phase": Phase of daily discharge (day of year)
- "Disch_ar1": Autoregressive lag-one correlation coefficient of daily discharge
- "PAR_mean": Mean daily PAR
- "PAR_cv": CV of daily PAR
- "PAR_skew": Skewness of daily PAR
- "PAR_kurt": Kurtosis of daily PAR
- "PAR_amp": Amplitude of daily PAR
- "PAR_phase": Phase of daily PAR (day of year)
- "PAR_ar1": Autoregressive lag-one correlation coefficient of daily PAR
- "LAI_mean": Mean daily LAI (m^2^ m^-2^)
- "LAI_cv": CV of LAI
- "LAI_skew": Skewness of LAI
- "LAI_kurt": Kurtosis of LAI
- "LAI_amp": Amplitude of LAI
- "LAI_phase": Phase of LAI (day of year)
- "LAI_ar1": Autoregressive lag-one correlation coefficient of daily LAI
- "MOD_ann_NPP": Mean annual MODIS NPP (g C m^-2^ y^-1^) for the concurrent period of record for stream metabolism data at a site. Annual sums of NPP (g C m^-2^ y^-1^) were available for each site year, and the mean was taken to get a mean annual rate for each site.
- "ndays": Total number of days with daily GPP (non gap-filled) for the site in the filtered dataset
- "nyears": Total number of years for the site in the filtered dataset
- "coverage": Total coverage of daily GPP (non gap-filled) for the site, calculated as ndays / all possible days for all site-years included in the filtered dataset. Ranges from 0-1.
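The annual and coverage metrics above can be sketched in base R as follows. This is an illustration on invented data, not the code used to build the file: the toy daily series, the crude mean-based gap fill, and the choice to compute the percentile on the filled series are all assumptions made for the example.

```r
# Toy daily series for one site over two site-years (values invented)
set.seed(1)
dates <- seq(as.Date("2015-01-01"), as.Date("2016-12-31"), by = "day")
daily <- data.frame(
  Year  = as.integer(format(dates, "%Y")),
  GPP_C = runif(length(dates), 0, 3),          # observed daily GPP, g C m^-2 d^-1
  stringsAsFactors = FALSE
)
daily$GPP_C[sample(nrow(daily), 100)] <- NA    # pretend 100 days have no estimate

# ASSUMPTION: placeholder gap fill using the series mean, for illustration only
daily$GPP_C_filled <- ifelse(is.na(daily$GPP_C),
                             mean(daily$GPP_C, na.rm = TRUE),
                             daily$GPP_C)

# ann_GPP_C: annual sums per site-year, then the mean of those sums
annual_sums <- tapply(daily$GPP_C_filled, daily$Year, sum)
ann_GPP_C   <- mean(annual_sums)

# upper_GPP_C: 95th percentile of daily rates
upper_GPP_C <- unname(quantile(daily$GPP_C_filled, 0.95))

# ndays, nyears, coverage: based on non gap-filled daily GPP
ndays    <- sum(!is.na(daily$GPP_C))
nyears   <- length(unique(daily$Year))
coverage <- ndays / nrow(daily)                # all possible days in the site-years
```

For ER, the analogous "lower_ER_C" would use `quantile(x, 0.05)` because ER values are negative.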
lotic_gap_filled.rds
Format: A list of data frames, where each element of the list is a data frame of timeseries for a single site. The names of each list element correspond to the unique site identifier (Site_ID) for a site. Each data frame contains the following columns:
- "Site_ID": Unique site identifier
- "Date": Date in YYYY-MM-DD format
- "U_ID": Unique date identifier (format as year + DOY)
- "Year": Year
- "DOY": Day of year (1 to 365 or 366)
- "GPP_raw": Stream GPP estimates (g O2 m^-2^ d^-1^) from Appling et al. (2018), raw data
- "ER_raw": Stream ER estimates (g O2 m^-2^ d^-1^) from Appling et al. (2018), raw data
- "GPP": Stream GPP estimates (g O2 m^-2^ d^-1^) from Appling et al. (2018), negative values replaced with NA
- "ER": Stream ER estimates (g O2 m^-2^ d^-1^) from Appling et al. (2018), positive values replaced with NA
- "K600": Model estimate of K600, the mean reaeration rate coefficient, scaled to a Schmidt number of 600, for this date. Value is the median of the post warmup MCMC distribution
- "DO.obs": Mean dissolved oxygen concentration (mg O2 L^-1^) for the date (4am to 3:59am)
- "DO.sat": Mean theoretical saturation concentration (mg O2 L^-1^) for the date (4am to 3:59am)
- "temp.water": Mean water temperature (degrees C) for the date (4am to 3:59am)
- "discharge": Mean discharge (m^3^ s^-1^) for the date (4am to 3:59am)
- "PAR_sum": Daily sum of incoming above canopy PAR (mol m^-2^ d^-1^)
- "Stream_PAR_sum": Daily sum of PAR estimated at the stream surface (mol m^-2^ d^-1^)
- "LAI_proc": MODIS LAI data that has been processed and gap-filled (m^2^ m^-2^)
- "GPP_filled": Gap-filled stream GPP estimates (g O2 m^-2^ d^-1^)
- "ER_filled": Gap-filled stream ER estimates (g O2 m^-2^ d^-1^)
- "NEP_filled": Gap-filled stream NEP estimates (g O2 m^-2^ d^-1^)
- "GPP_C_filled": Gap-filled stream GPP estimates, expressed in carbon (g C m^-2^ d^-1^)
- "ER_C_filled": Gap-filled stream ER estimates, expressed in carbon (g C m^-2^ d^-1^)
- "NEP_c_filled": Gap-filled stream NEP estimates, expressed in carbon (g C m^-2^ d^-1^)
- "Wtemp_filled": Gap-filled mean water temperature from the "temp.water" column.
- "Disch_filled": Gap-filled mean discharge from the "discharge" column
- "PAR_filled": Gap-filled daily sum of incoming above canopy PAR, from the "PAR_sum" column
- "GPP_norm": Z-normalized GPP
- "ER_norm": Z-normalized ER
- "NEP_norm": Z-normalized NEP
- "Wtemp_norm": Z-normalized water temperature
- "PAR_norm": Z-normalized daily sum of incoming above canopy PAR
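The derived columns above can be illustrated with a short base R sketch on invented values. Two points are assumptions rather than documented facts: the oxygen-to-carbon factor of 12/32 (the molar mass ratio, which implies a 1:1 molar O2:C quotient) and the use of the sample standard deviation in the z-normalization. The helper `z_norm` is hypothetical.

```r
# Invented gap-filled daily rates in oxygen units (g O2 m^-2 d^-1)
gpp_o2 <- c(2.0, 4.0, 6.0)
er_o2  <- c(-3.0, -4.0, -5.0)

# NEP is the sum of GPP and ER, because ER is stored as a negative flux
nep_o2 <- gpp_o2 + er_o2

# ASSUMPTION: convert O2 to C with the molar mass ratio 12/32 (1:1 molar quotient)
o2_to_c <- 12 / 32
gpp_c <- gpp_o2 * o2_to_c
er_c  <- er_o2 * o2_to_c
nep_c <- nep_o2 * o2_to_c

# Z-normalization: subtract the mean and divide by the standard deviation
z_norm <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
gpp_norm <- z_norm(gpp_o2)
er_norm  <- z_norm(er_o2)
```

Z-normalized series have mean 0 and standard deviation 1, which puts sites with very different absolute rates on a common scale.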
This is the dataset of terrestrial ecosystem fluxes used in Bernhardt et al. (2022). This data has been filtered based on data availability and several basic site metrics were calculated for use in analysis.
fluxnet_site_info_filtered.rds
Format: A single data frame with the following columns:
- "Site_ID": Unique site identifier
- "Name": Site long name
- "Lat": Site Latitude
- "Lon": Site Longitude
- "ann_GPP": Mean annual cumulative GPP (g C m^-2^ y^-1^). This was calculated from the annual sums of GPP (g C m^-2^ y^-1^) for each site year provided by FLUXNET, and then taking the mean annual rate for each site.
- "upper_GPP": 95th percentile of daily rates of GPP (g C m^-2^ d^-1^).
- "ann_ER": Mean annual cumulative ER (g C m^-2^ y^-1^). This was calculated from the annual sums of ER (g C m^-2^ y^-1^) for each site year provided by FLUXNET, and then taking the mean annual rate for each site
- "lower_ER": 5th percentile of daily rates of ER (g C m^-2^ d^-1^). Since ER is negative, you can think of this as equivalent to the 95th percentile done for GPP.
- "ndays": Total number of days with daily GPP (non gap-filled) for the site in the filtered dataset
- "nyears": Total number of years for the site in the filtered dataset
- "coverage": Total coverage of daily GPP (non gap-filled) for the site, calculated as ndays / all possible days for all site-years included in the filtered dataset. Ranges from 0-1.
fluxnet_filtered_metabolism.rds
Format: A list of data frames, where each element of the list is a data frame of timeseries for a single site. The names of each list element correspond to the unique site identifier (Site_ID) for a site. Each data frame contains the following columns:
- "Date": Date in YYYY-MM-DD format
- "U_ID": Unique date identifier (format as year + DOY)
- "Year": Year
- "DOY": Day of year (1 to 365 or 366)
- "GPP_raw": FLUXNET annual GPP (sum from daily estimates) (g C m^-2^ y^-1^) "GPP_NT_VUT_REF", raw data. Gross Primary Production, from Nighttime partitioning method, reference version selected from GPP versions using a model efficiency approach.
- "ER_raw": FLUXNET annual ER (sum from daily estimates) (g C m^-2^ y^-1^) "RECO_NT_VUT_REF", raw data. Ecosystem Respiration, from Nighttime partitioning method, reference selected from RECO versions using a model efficiency approach.
- "GPP": FLUXNET annual GPP (sum from daily estimates) (g C m^-2^ y^-1^) "GPP_NT_VUT_REF", negative values replaced with NA. Gross Primary Production, from Nighttime partitioning method, reference version selected from GPP versions using a model efficiency approach.
- "ER": FLUXNET annual ER (sum from daily estimates) (g C m^-2^ y^-1^) "RECO_NT_VUT_REF", positive values replaced with NA. Ecosystem Respiration, from Nighttime partitioning method, reference selected from RECO versions using a model efficiency approach.
- "Net": FLUXNET NEE "NEE_VUT_REF", converted to NEP by multiplying by -1. This was derived from data with this description: Net Ecosystem Exchange, using Variable Ustar Threshold (VUT) for each year, reference selected on the basis of the model efficiency.
- "Temp": Average air temperature from daily data (degrees C)
- "Precip": Average precipitation from daily data (mm)
- "VPD": Average vapor pressure deficit from daily data (hPa)
- "SW": Average incoming shortwave radiation from daily data (W m^-2^)
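The sign change behind the "Net" column can be sketched as below. The site names and values are invented, and the input column name `NEE` is a stand-in for the FLUXNET variable "NEE_VUT_REF"; this shows the sign convention only and is not the code used to build the file.

```r
# Toy per-site list in the same shape as fluxnet_filtered_metabolism.rds
flux <- list(
  site_X = data.frame(Year = c(2010, 2011), NEE = c(-150, -90)),
  site_Y = data.frame(Year = 2012, NEE = 40)
)

# NEP = -1 * NEE, so net uptake by the ecosystem becomes positive
flux <- lapply(flux, function(df) {
  df$Net <- -1 * df$NEE
  df
})

# One summary value per site, named by Site_ID
mean_net <- sapply(flux, function(df) mean(df$Net, na.rm = TRUE))
```

With this convention a positive "Net" value indicates net carbon uptake and a negative value indicates net release.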