Commit

feat: added alt text to all images that were lacking it
njlyon0 committed Aug 4, 2024
1 parent 96e8d2b commit 1d169c9
Showing 3 changed files with 11 additions and 12 deletions.
index.qmd: 6 changes (3 additions and 3 deletions)
@@ -59,11 +59,11 @@ The course and its content were developed by a large team. See the [**People pag

::: {#credit-orgs layout-ncol=3}

-[![](images/edi-logo.png){width=1.2in}](https://edirepository.org)
+[![](images/edi-logo.png){width=1.2in}](https://edirepository.org){fig-alt="Environmental Data Initiative (EDI) logo"}

-[![](images/LTER-network-logo.png){width=1.2in}](https://lternet.edu)
+[![](images/LTER-network-logo.png){width=1.2in}](https://lternet.edu){fig-alt="Long Term Ecological Research (LTER) Network Office logo"}

-[![](images/NEON-NSF-logo.png)](https://neonscience.org)
+[![](images/NEON-NSF-logo.png)](https://neonscience.org){fig-alt="National Ecological Observatory Network (NEON) logo"}

:::

module2.qmd: 12 changes (6 additions and 6 deletions)
@@ -273,13 +273,13 @@ Here our grassland biomass data is in long format, often referred to as "tidy" d
2. Each row is one observation.
3. Each cell contains a single value.

-![visual representation of the tidy data structure](images/tidy-data.png)
+![visual representation of the tidy data structure](images/tidy-data.png){fig-alt="Image of a table of data in 'tidy' format (i.e., where each column is one variable, each row is one observation, and each cell contains a single value)"}

**Advantages**: clear meaning of rows and columns; ease in filtering/cleaning/appending
**Disadvantages**: not as human-friendly so it can be difficult to assess the data visually
**Possible file formats**: Delimited text (tab delimited shown here), spreadsheets, database tables

-![The harmonized grassland data, in long format.](images/long_data_example.png){width="75%"}
+![The harmonized grassland data, in long format.](images/long_data_example.png){width="75%" fig-alt="An example of tidy data in long format"}

### Wide (Untidy)

@@ -289,21 +289,21 @@ In this dataset, our grassland data has been restructured into wide format, ofte
**Disadvantages**: may be more difficult to clean/filter/append, multiple observations per row; more likely to contain empty (NULL) cells
**Possible file formats**: Delimited text (tab delimited shown here), spreadsheets, database tables

-![The harmonized grassland data, restructured into wide format with biomass values in control and fertilized columns.](images/wide_data_example.png){width="75%"}
+![The harmonized grassland data, restructured into wide format with biomass values in control and fertilized columns.](images/wide_data_example.png){width="75%" fig-alt="An example of tidy data in wide format"}

### Relational Database

Below is an example of how we might structure our grassland data in a relational database. The schema consists of three tables that house information about sampling events (when, where data were collected), the plots from which the samples are collected, and the biomass values for each collection. The schema allows us to define the data types (e.g., text, integer), add constraints (e.g., values cannot be missing), and to describe relationships between tables (keys). Relational formats are [normalized](https://en.wikipedia.org/wiki/Database_normalization) to reduce data redundancy and increase data integrity, which can help us to manage complex data[^13].

-![example grassland database schema](images/grassland_schema.drawio.png)
+![example grassland database schema](images/grassland_schema.drawio.png){fig-alt="An example of relational data where several tables are linked by a shared column even though they have differing variables and numbers of observations"}

**Advantages**: reduced redundancy, greater integrity; community standard; powerful extensions (e.g., store and process spatial data); many different database flavors to meet specific needs
**Disadvantages**: significant metadata needed to describe and use; more complex to publish; learning curve
**Possible file formats**: Database stores, can be represented in delimited text (CSV)

A richer example is a schematic of the related tables that comprise the [ecocomDP](https://ediorg.github.io/ecocomDP/)[^8] harmonized data format for biodiversity data. Eight tables are defined, along with a set of relationships between tables (keys), and constraints on the allowable values in each table.

-![The ecocomDP schema. Each table has a name (top cell) and a list of columns. Shaded column names are primary keys, hashed columns have constraints, and arrows represent relations between keys/constraints in different tables.](images/ecocomDP_schema.jpg){width="75%"}
+![The ecocomDP schema. Each table has a name (top cell) and a list of columns. Shaded column names are primary keys, hashed columns have constraints, and arrows represent relations between keys/constraints in different tables.](images/ecocomDP_schema.jpg){width="75%" fig-alt="Another example of a relational set of tables implicitly connected by shared index columns. Curved lines indicate which tables are connected to one another and which columns define that relationship"}

[^8]: O'Brien, Margaret, et al. "ecocomDP: a flexible data design pattern for ecological community survey data." Ecological Informatics 64 (2021): 101374. https://doi.org/10.1016/j.ecoinf.2021.101374
[^13]: Zimmerman, N. 2016. [Hand-crafted relational databases for fun and science](https://carpentries.org/blog/2016/12/hand-crafted-databases/)
@@ -316,7 +316,7 @@ There are many possibilities to make large synthesis datasets available and usef
**Disadvantages**: less familiar/accessible to many scientists, few best practices to follow, costs can be higher
**Possible file formats**: Parquet files, object storage, distributed/cloud databases

-![A few of the cloud-native technologies that might be useful for synthesis research products.](images/cloud_native.png){width="75%"}
+![A few of the cloud-native technologies that might be useful for synthesis research products.](images/cloud_native.png){width="75%" fig-alt="A stylized cloud containing the logos for Parquet, S3, PostgreSQL, and Apache Arrow"}

### Other...

module3.qmd: 5 changes (2 additions and 3 deletions)
@@ -152,8 +152,7 @@ Including metadata of this nature makes data more usable, and helps prevent the

[^6]: Michener, W.K., Brunt, J.W., Helly, J.J., Kirchner, T.B. and Stafford, S.G. (1997), NONGEOSPATIAL METADATA FOR THE ECOLOGICAL SCIENCES. Ecological Applications, 7: 330-342. https://doi.org/10.1890/1051-0761(1997)007[0330:NMFTES]2.0.CO;2

-![Example of the normal degradation in information content associated with data and metadata over time ("information entropy"). Accidents or changes in technology (dashed line) may eliminate access to remaining raw data and metadata at any time (Michener et al 1997.](images/michener97_information_loss.png){width=60%}
-
+![Example of the normal degradation in information content associated with data and metadata over time ("information entropy"). Accidents or changes in technology (dashed line) may eliminate access to remaining raw data and metadata at any time (Michener et al 1997.](images/michener97_information_loss.png){width=60% fig-alt="A graphic demonstrating the loss of data and metadata over time with particular events that precipitate more dramatic loss indicated with arrows (e.g., retirement of key personnel, fading memory of those involved, etc."}

#### Data Provenance Metadata

@@ -242,7 +241,7 @@ There are many, many research data repositories available to researchers now[^11
4. Does the repository charge for publication?
5. **Will the dataset benefit from some level of peer review?**

-![A limited slice from the broad spectrum of research data repositories available for publishing synthesis data. These repositories are weighted towards those based in the U.S.A. ([re3data.org](https://www.re3data.org) has a comprehensive list). Also note that the FAIR spectrum below refers primarily to repository requirements. It is possible, but not always required, to include detailed, community-standard metadata in generalist repositories.](images/repository_spectrum.png){width=90%}
+![A limited slice from the broad spectrum of research data repositories available for publishing synthesis data. These repositories are weighted towards those based in the U.S.A. ([re3data.org](https://www.re3data.org) has a comprehensive list). Also note that the FAIR spectrum below refers primarily to repository requirements. It is possible, but not always required, to include detailed, community-standard metadata in generalist repositories.](images/repository_spectrum.png){width=90% fig-alt="A graphic containing the logos for many different data repositories arranged along a gradient of 'less FAIR' to 'more FAIR' where FAIRness is defined as 'Metadata/formatting standards'"}

More specialized repositories tend to offer enhanced documentation, custom software tools, and **data curation staff that will review submitted data and assist users with data publication**. Selecting a data repository with metadata requirements or standards, and a review and curation process for submissions, will help ensure that you are publishing a more FAIR data product. Consulting a project data manager if one is available to the synthesis team will also help with repository selection. After making a choice, the process of publishing data varies from repository to repository.
