Merge pull request #105 from nextstrain/flu-update
flu: dataset update for all lineages
rneher authored Nov 18, 2023
2 parents bd6d9d1 + 33c43ab commit 5e2742c
Showing 91 changed files with 14,302 additions and 634,709 deletions.
2 changes: 1 addition & 1 deletion data/nextstrain/flu/h1n1pdm/ha/CY121680/CHANGELOG.md
@@ -2,6 +2,6 @@

Initial release for Nextclade v3!

This dataset is converted from the corresponding older dataset for Nextclade v2. You can find old versions of datasets here: https://github.com/nextstrain/nextclade_data/tree/2023-08-17--15-51-24--UTC/data/datasets
- addition of subclade [C.1.7](https://github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/blob/main/subclades/C.1.7.yml)

Read more about Nextclade datasets in the documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
29 changes: 26 additions & 3 deletions data/nextstrain/flu/h1n1pdm/ha/CY121680/README.md
@@ -1,12 +1,35 @@
# Nextclade dataset for "Influenza A H1N1pdm HA" based on reference "A/California/07/2009" (flu_h1n1pdm_ha/CY121680)
# Nextclade dataset for "Influenza A H1N1pdm HA" based on reference "A/California/07/2009" (flu/h1n1pdm/ha/CY121680)

This dataset uses an older reference sequence (A/California/07/2009) and recent sequences will differ at a large number of positions from this reference.
For the analysis of currently circulating viruses, the dataset using A/Wisconsin/588/2019 as reference might be more appropriate.

## Dataset attributes

| attribute | value | value friendly |
| -------------------- | -------------------- | ---------------------------------------- |
| name | flu_h1n1pdm_ha | Influenza A H1N1pdm HA |
| reference | CY121680 | A/California/07/2009 |
| name | flu/h1n1pdm/ha | Influenza A H1N1pdm HA |
| reference | CY121680 | A/California/07/2009 |


## Features
This dataset supports

* Assignment to clades and subclades based on the nomenclature defined in [github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/](https://github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/)
* Identification of glycosylation motifs
* Sequence QC
* Phylogenetic placement
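
As an illustration of the glycosylation feature, the sketch below scans an amino-acid sequence for the motif pattern `N[^P][ST]` that appears in this dataset's `pathogen.json`. It is a minimal standalone example, not Nextclade's own implementation: the function name `find_glycosylation_motifs`, the toy peptide, and the use of a lookahead to report overlapping matches are all assumptions made for illustration.

```python
import re

# Motif pattern as listed under "aaMotifs" -> "motifs" in pathogen.json.
# Wrapping it in a lookahead reports overlapping hits too (an assumption of
# this sketch; Nextclade's own matching behaviour may differ).
GLYCOSYLATION = re.compile(r"(?=(N[^P][ST]))")


def find_glycosylation_motifs(peptide, begin=0, end=None):
    """Return (0-based position, motif) pairs found within peptide[begin:end]."""
    end = len(peptide) if end is None else end
    window = peptide[begin:end]
    return [(begin + m.start(), m.group(1)) for m in GLYCOSYLATION.finditer(window)]


# Toy peptide, not a real HA sequence.
toy = "MKNGTLNPSANNSTA"
print(find_glycosylation_motifs(toy))
```

The optional `begin` and `end` arguments mirror the idea of restricting the search to a range within a gene, as the `ranges` entry for `HA2` does in the configuration.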

## Clades of seasonal influenza viruses

The WHO Collaborating Centers define "clades" as genetic groups of viruses with signature mutations, to facilitate discussion of the circulating diversity of the viruses.
Clade demarcations do not always coincide with significantly different antigenic properties of the viruses.
Clade names are structured as _Number-Letter_ binomials separated by periods as in `6B.1A.5a.2a.1`. These sometimes get shortened by omission of leading binomials like `5a.2a.1`.
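
The shortening described above can be sketched in a few lines. The helper name `shorten_clade` and the `keep` parameter are hypothetical; how many trailing binomials are kept in practice is a matter of convention, not something this dataset defines.

```python
def shorten_clade(name, keep=3):
    """Drop leading binomials, keeping only the last `keep` components."""
    parts = name.split(".")
    return ".".join(parts[-keep:])


print(shorten_clade("6B.1A.5a.2a.1"))  # 5a.2a.1
```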

In addition to these clades, "subclades" are defined to break down diversity at higher resolution and allow following the spread of different viral groups.
These follow a Pango-like nomenclature consisting of a letter followed by numbers separated by periods, as in `C.1.2`.
The leading letter is an alias of a previous name.
Details of the nomenclature system can be found at [github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/](https://github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/).
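
To make the hierarchical structure of subclade names concrete, the following sketch lists the chain of names from the alias letter down to a given subclade. It assumes names have the form "capital letter(s), then period-separated numbers"; the function name `subclade_lineage` is hypothetical. Resolving the leading letter to the previous name it stands for would require the alias definitions from the nomenclature repository, which are not reproduced here.

```python
import re

# Assumed shape of a subclade name: letters, then zero or more ".<number>" parts.
SUBCLADE = re.compile(r"[A-Z]+(?:\.\d+)*")


def subclade_lineage(name):
    """Return the names from the alias letter down to `name`, e.g. C, C.1, C.1.2."""
    if not SUBCLADE.fullmatch(name):
        raise ValueError(f"not a subclade-style name: {name!r}")
    parts = name.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


print(subclade_lineage("C.1.2"))  # ['C', 'C.1', 'C.1.2']
```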



## What is a Nextclade dataset
129 changes: 69 additions & 60 deletions data/nextstrain/flu/h1n1pdm/ha/CY121680/pathogen.json
@@ -1,29 +1,12 @@
{
"aaMotifs": [
{
"description": "N-linked glycosylation motifs (N-X-S/T with X any amino acid other than P)",
"includeGenes": [
{
"gene": "HA1"
},
{
"gene": "HA2",
"ranges": [
{
"begin": 0,
"end": 186
}
]
}
],
"motifs": [
"N[^P][ST]"
],
"name": "glycosylation",
"nameFriendly": "Glycosylation",
"nameShort": "Glyc."
}
],
"schemaVersion": "3.0.0",
"alignmentParams": {
"excessBandwidth": 9,
"terminalBandwidth": 100,
"allowedMismatches": 4,
"gapAlignmentSide": "right",
"minSeedCover": 0.1
},
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
@@ -38,72 +21,98 @@
"reference": "reference.fasta",
"treeJson": "tree.json"
},
"geneOrderPreference": [
"HA1",
"HA2"
],
"qc": {
"frameShifts": {
"enabled": true
"privateMutations": {
"enabled": true,
"typical": 5,
"cutoff": 15,
"weightLabeledSubstitutions": 2,
"weightReversionSubstitutions": 1,
"weightUnlabeledSubstitutions": 1
},
"missingData": {
"enabled": false,
"missingDataThreshold": 100,
"scoreBias": 10
},
"snpClusters": {
"enabled": false,
"windowSize": 100,
"clusterCutOff": 5,
"scoreWeight": 50
},
"mixedSites": {
"enabled": true,
"mixedSitesThreshold": 4
},
"privateMutations": {
"cutoff": 15,
"enabled": true,
"typical": 5,
"weightLabeledSubstitutions": 2,
"weightReversionSubstitutions": 1,
"weightUnlabeledSubstitutions": 1
},
"snpClusters": {
"clusterCutOff": 5,
"enabled": false,
"scoreWeight": 50,
"windowSize": 100
"frameShifts": {
"enabled": true
},
"stopCodons": {
"enabled": true
"enabled": true,
"ignoredStopCodons": []
}
},
"schemaVersion": "3.0.0",
"version": {
"tag": "unreleased"
},
"attributes": {
"name": "Influenza A H1N1pdm HA",
"reference name": "A/California/07/2009",
"reference accession": "CY121680"
},
"geneOrderPreference": [
"HA1",
"HA2"
],
"maintenance": {
"website": [
"https://nextstrain.org",
"https://clades.nextstrain.org"
],
"documentation": [
"https://github.com/nextstrain/nextclade_data",
"https://docs.nextstrain.org/projects/nextclade"
"https://github.com/nextstrain/seasonal-flu"
],
"source code": [
"https://github.com/nextstrain/nextclade_data",
"https://github.com/neherlab/nextclade_data_workflows"
"https://github.com/nextstrain/seasonal_flu"
],
"issues": [
"https://github.com/nextstrain/nextclade_data",
"https://github.com/nextstrain/nextclade_data/issues"
"https://github.com/nextstrain/seasonal_flu/issues"
],
"organizations": [
"Nextstrain"
],
"authors": [
"Nextstrain team <https://nextstrain.org>"
]
},
"nucMutLabelMap": {},
"nucMutLabelMapReverse": {},
"aaMotifs": [
{
"name": "glycosylation",
"nameShort": "Glyc.",
"nameFriendly": "Glycosylation",
"description": "N-linked glycosylation motifs (N-X-S/T with X any amino acid other than P)",
"includeGenes": [
{
"gene": "HA1",
"ranges": []
},
{
"gene": "HA2",
"ranges": [
{
"begin": 0,
"end": 186
}
]
}
],
"motifs": [
"N[^P][ST]"
]
}
],
"attributes": {
"name": "Influenza A H1N1pdm HA",
"segment": "ha",
"reference accession": "CY121680",
"reference name": "A/California/7/2009-egg"
},
"version": {
"tag": "unreleased"
}
}
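
The `privateMutations` settings above can be read as a weighted count of private substitutions compared against `typical` and `cutoff`. The sketch below shows one plausible way these values combine. The scoring formula (excess over `typical`, scaled so that an excess equal to `cutoff` gives 100) is an assumption of this example and should be checked against the Nextclade documentation; the function name and the input counts are hypothetical.

```python
# Values copied from the "privateMutations" block of pathogen.json.
PRIVATE_MUTATIONS = {
    "typical": 5,
    "cutoff": 15,
    "weightLabeledSubstitutions": 2,
    "weightReversionSubstitutions": 1,
    "weightUnlabeledSubstitutions": 1,
}


def private_mutation_score(labeled, reversion, unlabeled, cfg=PRIVATE_MUTATIONS):
    """Weighted private-mutation count and an ASSUMED score derived from it."""
    weighted_total = (
        cfg["weightLabeledSubstitutions"] * labeled
        + cfg["weightReversionSubstitutions"] * reversion
        + cfg["weightUnlabeledSubstitutions"] * unlabeled
    )
    # Assumption: only the excess over `typical` is penalised, scaled by `cutoff`.
    score = max(0, weighted_total - cfg["typical"]) * 100 / cfg["cutoff"]
    return weighted_total, score


# Hypothetical counts for a single query sequence.
total, score = private_mutation_score(labeled=2, reversion=1, unlabeled=4)
print(total, round(score, 1))
```

With these weights, labeled substitutions count twice as much as reversions or unlabeled substitutions, so a sequence carrying many labeled private mutations is flagged sooner.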