Update change log and docs for v7 release
huddlej committed May 27, 2021
1 parent 115cb99 commit fba1fac
Showing 2 changed files with 29 additions and 0 deletions.
10 changes: 10 additions & 0 deletions docs/change_log.md
@@ -3,6 +3,16 @@
As of April 2021, we use major version numbers (e.g. v2) to reflect backward incompatible changes to the workflow that likely require you to update your Nextstrain installation.
We also use this change log to document new features that maintain backward compatibility, indicating these features by the date they were added.

## v7 (27 May 2021)

### Major changes

- Support default (full) GISAID metadata and sequences from the "Download packages" interface by converting this default format into Nextstrain-compatible metadata and sequences. By default, the workflow will now deduplicate metadata and sequences from each `inputs` dataset at the beginning and standardize all metadata column names to lowercase strings with underscores replacing whitespace. For more details, [see the configuration reference for the new "sanitize metadata" parameters](https://nextstrain.github.io/ncov/configuration.html#sanitize_metadata). ([#640](https://github.com/nextstrain/ncov/pull/640))

### Features

- Support reading metadata and sequences directly from GISAID's tar archives. For example, you can now define `inputs` as `metadata: data/ncov_north-america.tar.gz` and `sequences: data/ncov_north-america.tar.gz` to decompress and read the corresponding data from the archive. ([#640](https://github.com/nextstrain/ncov/pull/640))

## New features since last version update

- 25 May 2021: Support custom Auspice JSON prefixes with a new configuration parameter, `auspice_json_prefix`. [See the configuration reference for more details](https://nextstrain.github.io/ncov/configuration.html#auspice_json_prefix). ([#643](https://github.com/nextstrain/ncov/pull/643))
19 changes: 19 additions & 0 deletions docs/configuration.md
@@ -497,6 +497,25 @@ Valid attributes for list entries in `inputs` are provided below.
* description: A list of prefixes to strip from strain names in metadata and sequence records to maintain consistent strain names when analyzing data from multiple sources.
* default: `["hCoV-19/", "SARS-CoV-2/"]`

## sanitize_metadata
* type: object
* description: Parameters to configure how to sanitize metadata to a Nextstrain-compatible format.

### parse_location_field
* type: string
* description: Field in the metadata that stores GISAID-formatted location details (e.g., `North America / USA / Washington`) to be parsed into `region`, `country`, `division`, and `location` fields.
* default: `Location`
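
The parsing described above can be sketched as follows. This is a minimal illustration, not the workflow's own code: the helper name `parse_location` is hypothetical, and filling missing levels with an empty string is an assumption made for the example.

```python
# Hypothetical helper illustrating how a GISAID-formatted location string
# could be split into the four Nextstrain geographic fields.
def parse_location(value, fields=("region", "country", "division", "location")):
    # Split on "/" and trim the whitespace around each level.
    parts = [part.strip() for part in value.split("/")]
    # Assumption: levels missing from the input become empty strings.
    parts += [""] * (len(fields) - len(parts))
    # zip() ignores any levels beyond the four named fields.
    return dict(zip(fields, parts))


record = parse_location("North America / USA / Washington")
print(record)
# {'region': 'North America', 'country': 'USA', 'division': 'Washington', 'location': ''}
```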

### rename_fields
* type: array
* description: List of `old=new` pairs, each mapping a field name in the input metadata to the name that field should have in the sanitized metadata.
* default: `["Virus name=strain", "Collection date=date"]`
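
As a rough sketch of how entries like the defaults above could be interpreted, the example below splits each entry on its first `=` and applies the resulting mapping to a list of column names. The function names `parse_rename_rules` and `rename_columns` are hypothetical, and rejecting entries without `=` is an assumption rather than documented workflow behavior.

```python
# Hypothetical helpers; the real workflow may handle these entries differently.
def parse_rename_rules(rules):
    mapping = {}
    for rule in rules:
        # Split on the first "=" only, so new names may themselves contain "=".
        old, separator, new = rule.partition("=")
        if not separator:
            # Assumption: an entry without "=" is treated as an error.
            raise ValueError(f"Expected 'old=new', got: {rule!r}")
        mapping[old.strip()] = new.strip()
    return mapping


def rename_columns(columns, mapping):
    # Columns without a rule keep their original name.
    return [mapping.get(column, column) for column in columns]


rules = parse_rename_rules(["Virus name=strain", "Collection date=date"])
print(rename_columns(["Virus name", "Location", "Collection date"], rules))
# ['strain', 'Location', 'date']
```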

### standardize_columns
* type: boolean
* description: Standardize column names by lowercasing and replacing all whitespace with underscores. This operation happens after renaming fields.
* default: `true`
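
A minimal sketch of the standardization described above, assuming each whitespace character becomes one underscore. The helper name `standardize_column` is hypothetical and is not taken from the workflow.

```python
import re


# Hypothetical helper: lowercase a column name and replace whitespace.
def standardize_column(name):
    # Assumption: every whitespace character maps to exactly one underscore.
    return re.sub(r"\s", "_", name.lower())


# Because standardization runs after renaming, a column already renamed to
# "strain" is unchanged, while other columns are normalized.
columns = ["strain", "date", "Additional location information", "Patient age"]
print([standardize_column(column) for column in columns])
# ['strain', 'date', 'additional_location_information', 'patient_age']
```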

## subsampling
* type: object
* description: Schemes for subsampling data prior to phylogenetic inference to avoid sampling bias or focus an analysis on specific spatial and/or temporal scales. [See the SARS-CoV-2 tutorial for more details on defining subsampling schemes](https://docs.nextstrain.org/en/latest/tutorials/SARS-CoV-2/steps/customizing-analysis.html#subsampling).