DwC vs DwC A

Erica Krimmel edited this page Jul 2, 2018 · 1 revision

DwC data standard vs DwC Archive data sharing package

Question: What is the difference between the Darwin Core Standard and a Darwin Core Archive?

Quick answer: the Darwin Core Standard is a list of terms, while the Darwin Core Archive is a package, i.e. an agreed-upon format for bundling the compiled data and its metadata.

Long answer: Darwin Core is a data standard for publishing and integrating biodiversity information (Wieczorek et al. 2012). Loosely speaking, it is a group of terms the broader community agrees to use when sharing biodiversity data with the outside world, or between collectors/collections. Some examples of DwC standard terms include dwc:geodeticDatum, dwc:habitat, and dwc:eventDate. Each term in the standard has a definition that explains the concept behind the term, which helps us decide which field in our own dataset is the best match. Perhaps in your local database you have a field named collectingDate that is expected to be verbatim (that is, as provided on the specimen label). When you export your data in a file using Darwin Core standard terms, you would rename this field dwc:verbatimEventDate. The process of figuring out the match between each field in your local data source and the most appropriate Darwin Core term is called mapping. Mapping is only the first step that must be completed before everyone can add their data to a common data aggregation such as those you might find in VertNet, GBIF, iDigBio, Canadensys, ALA, etc.
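The mapping step can be sketched in a few lines of Python. This is only an illustration: apart from collectingDate, the local field names (catalogNo, habitatNotes, datum, shelf) and the sample record are hypothetical, and a real mapping would be worked out field by field against the Darwin Core term definitions.

```python
import csv
import io

# Hypothetical local field names mapped to Darwin Core terms.
# Only collectingDate -> verbatimEventDate comes from the text above;
# the other local names are invented for illustration.
FIELD_MAP = {
    "catalogNo": "catalogNumber",
    "collectingDate": "verbatimEventDate",
    "habitatNotes": "habitat",
    "datum": "geodeticDatum",
}


def map_to_dwc(rows, field_map):
    """Rename local fields to Darwin Core terms and drop unmapped fields."""
    return [
        {field_map[key]: value for key, value in row.items() if key in field_map}
        for row in rows
    ]


def to_csv(rows):
    """Serialize mapped rows as CSV text with a Darwin Core header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# A made-up local record; "shelf" has no Darwin Core match in FIELD_MAP.
local_rows = [
    {
        "catalogNo": "1234",
        "collectingDate": "June 3rd 1998",
        "habitatNotes": "oak woodland",
        "datum": "WGS84",
        "shelf": "B12",
    },
]

dwc_rows = map_to_dwc(local_rows, FIELD_MAP)
print(to_csv(dwc_rows))
```

Fields without an entry in the map (here, shelf) are simply left out of the export; whether to drop, keep, or remap such fields is a decision each data provider makes during mapping.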

Once your biodiversity data is mapped to the Darwin Core standard, it can be shared or published and successfully aggregated and integrated with other biodiversity data. But now we have a challenge analogous to the one above. If we are to aggregate data from different sources, we must decide what the data package we are going to share looks like. Both humans and computers need to be able to understand the data. A (sort-of okay) example is addressing an envelope: we know that line 1 is the name of the person or organization, line 2 is the street address, and line 3 is the city, state, and all-important zip code. We also always provide a return address so that humans and computers can figure out the provenance of the package. This standardization of the package label allows machine sorting and routing. Today, for some packages, we are also required to declare what is inside.

Similarly, before we can publish our mapped data, we need a standard data sharing format that tells both humans and computers where the data comes from and what is inside the data package. The Biodiversity Information Standards (TDWG) community, in concert with GBIF, designed and defined such a data package and named it the Darwin Core Archive (DwC-A).

Question: So, what is inside a DwC-A?

Quick answer: Multiple files. You would find at least: 1) your data file containing information about your museum specimens or field observations, 2) an index file that tells the computer what columns (standard terms) are present in your data file, and 3) a metadata file that contains information about who is providing this data (your organization and contact information, for example).

Long answer, adapted from the Darwin Core Archive Assistant: Darwin Core Archive (DwC-A) is a biodiversity informatics data packaging standard that makes use of the Darwin Core terms to produce a single, self-contained dataset for species occurrence or taxonomic (species) data. It is the preferred format for publishing data to GBIF. You export your data as a set of one or more text (CSV) files. A simple XML descriptor file (called meta.xml) is required to inform others how your files are organized.
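To make the three parts concrete, here is a minimal sketch that assembles a small archive in memory using only the Python standard library. The file names occurrence.csv and eml.xml, the sample record, and the helper function names are assumptions made for illustration, and the metadata stub is a placeholder rather than a complete metadata document. In practice, a publishing tool would normally generate these files for you.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"
DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"

# Columns of the example data file, in order (illustrative choice of terms).
COLUMNS = ["occurrenceID", "verbatimEventDate", "habitat", "geodeticDatum"]

# Made-up data file content with one header line and one record.
CSV_TEXT = (
    "occurrenceID,verbatimEventDate,habitat,geodeticDatum\n"
    "urn:example:1234,June 3rd 1998,oak woodland,WGS84\n"
)

# Placeholder only: a real archive carries a full metadata document here.
EML_STUB = "<metadata><title>Example dataset (placeholder)</title></metadata>"


def build_meta_xml(data_file, columns):
    """Build a meta.xml descriptor that maps each CSV column to a DwC term."""
    archive = ET.Element("archive", {"xmlns": DWC_TEXT_NS, "metadata": "eml.xml"})
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": ",",
        "linesTerminatedBy": "\\n",  # literal backslash-n, as the descriptor expects
        "fieldsEnclosedBy": '"',
        "ignoreHeaderLines": "1",
        "rowType": DWC_TERMS + "Occurrence",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = data_file
    ET.SubElement(core, "id", {"index": "0"})  # column 0 holds the record id
    for index, name in enumerate(columns):
        ET.SubElement(core, "field", {"index": str(index), "term": DWC_TERMS + name})
    return ET.tostring(archive, encoding="unicode")


def build_archive(csv_text, meta_xml, eml_xml):
    """Zip the data file, descriptor, and metadata into one archive (bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("occurrence.csv", csv_text)
        zf.writestr("meta.xml", meta_xml)
        zf.writestr("eml.xml", eml_xml)
    return buf.getvalue()


meta_xml = build_meta_xml("occurrence.csv", COLUMNS)
archive_bytes = build_archive(CSV_TEXT, meta_xml, EML_STUB)

# List what ended up inside the package.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
    print(sorted(zf.namelist()))
```

The descriptor is what lets a computer interpret the data file without guessing: each field element ties a column position to a full Darwin Core term URI, so an aggregator does not depend on the header row alone.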

Links to other explanations

Citations

  • Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, Giovanni R, et al. (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. https://doi.org/10.1371/journal.pone.0029715