Add BA.2.86 dataset
corneliusroemer committed Nov 24, 2023
1 parent 9f370d4 commit bd13876
Showing 8 changed files with 6,354 additions and 0 deletions.
1 change: 1 addition & 0 deletions data/nextstrain/collection.json
@@ -24,6 +24,7 @@
"nextstrain/sars-cov-2/MN908947",
"nextstrain/sars-cov-2/BA.2",
"nextstrain/sars-cov-2/XBB",
"nextstrain/sars-cov-2/BA.2.86",
"nextstrain/flu/h1n1pdm/ha/CY121680",
"nextstrain/flu/h1n1pdm/ha/MW626062",
"nextstrain/flu/h1n1pdm/na/MW626056",
7 changes: 7 additions & 0 deletions data/nextstrain/sars-cov-2/BA.2.86/CHANGELOG.md
@@ -0,0 +1,7 @@
## Unreleased

Initial release for Nextclade v3!

This dataset is converted from the corresponding older dataset for Nextclade v2. You can find old versions of datasets here: https://github.com/nextstrain/nextclade_data/tree/2023-08-17--15-51-24--UTC/data/datasets

Read more about Nextclade datasets in the documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
50 changes: 50 additions & 0 deletions data/nextstrain/sars-cov-2/BA.2.86/README.md
@@ -0,0 +1,50 @@
# Nextclade dataset for "SARS-CoV-2 relative to BA.2" based on reference "BA.2" (sars-cov-2-21L/BA.2)


## Dataset attributes

| attribute | value | value friendly |
| -------------------- | -------------------- | ---------------------------------------- |
| name | sars-cov-2-21L | SARS-CoV-2 relative to BA.2 |
| reference | BA.2 | BA.2 |


## What is a Nextclade dataset

Read more about Nextclade datasets in the Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html


### What are the SARS-CoV-2 clades?

Nextclade was originally developed during the COVID-19 pandemic, with a primary focus on SARS-CoV-2. This section describes clades as they apply to SARS-CoV-2, but Nextclade can analyse other pathogens too.

<figure>
<a href="https://raw.githubusercontent.com/nextstrain/ncov-clades-schema/master/clades.svg">
<picture>
<img
src="https://raw.githubusercontent.com/nextstrain/ncov-clades-schema/master/clades.svg"
alt="Illustration of phylogenetic relationships of SARS-CoV-2 clades, as defined by Nextstrain"
/>
</picture>
</a>
<figcaption>
<small>
Fig.1. Illustration of phylogenetic relationships of SARS-CoV-2 clades, as defined by Nextstrain (<a href="https://github.com/nextstrain/ncov-clades-schema/">source</a>)
</small>
</figcaption>
</figure>

Since its emergence in late 2019, SARS-CoV-2 has diversified into several different co-circulating variants. To facilitate discussion of these variants, we have grouped them into __clades__ which are defined by specific signature mutations.

We currently define more than 30 clades (see [this blog post](https://nextstrain.org/blog/2021-01-06-updated-SARS-CoV-2-clade-naming) for details):

- 19A and 19B emerged in Wuhan and dominated the early outbreak.
- 20A emerged from 19A, dominated the European outbreak in March 2020, and has since spread globally.
- 20B and 20C are large, genetically distinct subclades of 20A that emerged in early 2020.
- 20D to 20J emerged over the summer of 2020 and include three "Variants of Concern" (VoC).
- 21A to 21F include the VoC __delta__ and several Variants of Interest (VoI).
- 21K onwards are different clades within the diverse VoC __omicron__.

Within Nextstrain, we define each clade by its combination of signature mutations. You can find the exact clade definitions in [github.com/nextstrain/ncov/defaults/clades.tsv](https://github.com/nextstrain/ncov/blob/master/defaults/clades.tsv). When available, we also include [WHO labels for VoCs and VoIs](https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/).
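To make the idea of "signature mutations" concrete, here is a deliberately simplified Python sketch. It assumes a tab-separated file with the columns `clade`, `gene`, `site` and `alt` (the layout used by Augur-style clade definition files such as `clades.tsv`) and treats a sequence as belonging to a clade when it carries every listed mutation. The function names and the clades in `TOY_TSV` are hypothetical. This is not how Nextclade assigns clades (it places each sequence on a reference tree), and the sketch ignores details such as clades inheriting mutations from a parent clade.

```python
import csv
import io
from collections import defaultdict


def load_clade_definitions(tsv_text):
    """Map each clade to its set of required (gene, site, alt) mutations.

    Assumes tab-separated columns named clade, gene, site and alt.
    """
    definitions = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        definitions[row["clade"]].add((row["gene"], int(row["site"]), row["alt"]))
    return dict(definitions)


def matching_clades(observed, definitions):
    """Clades whose signature mutations are all present in `observed`,
    a set of (gene, site, alt) tuples found in one sequence."""
    return sorted(c for c, required in definitions.items() if required <= observed)


def most_specific_clade(observed, definitions):
    """Among matching clades, pick the one with the most signature mutations."""
    matches = matching_clades(observed, definitions)
    if not matches:
        return None
    return max(matches, key=lambda c: len(definitions[c]))


# Hypothetical toy definitions: cladeB carries cladeA's mutation plus one more.
TOY_TSV = (
    "clade\tgene\tsite\talt\n"
    "cladeA\tnuc\t100\tT\n"
    "cladeB\tnuc\t100\tT\n"
    "cladeB\tS\t50\tK\n"
)

definitions = load_clade_definitions(TOY_TSV)
sample = {("nuc", 100, "T"), ("S", 50, "K")}
print(matching_clades(sample, definitions))  # ['cladeA', 'cladeB']
print(most_specific_clade(sample, definitions))  # cladeB
```

Choosing the matching clade with the most signature mutations is only a crude stand-in for picking the most deeply nested clade.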

Learn more about how Nextclade assigns clades in the [documentation](https://docs.nextstrain.org/projects/nextclade/en/stable/user/algorithm/).
18 changes: 18 additions & 0 deletions data/nextstrain/sars-cov-2/BA.2.86/genome_annotation.gff3
@@ -0,0 +1,18 @@
##gff-version 3
##sequence-region MN908947 1 29903
# Gene map (genome annotation) of SARS-CoV-2 in GFF format.
# For gene map purposes we only need some of the columns. We replace unused values with "." as per the GFF spec.
# See GFF format reference at https://www.ensembl.org/info/website/upload/gff.html
# seqname source feature start end score strand frame attribute
MN908947 GenBank gene 266 13468 . + . gene_name=ORF1a
MN908947 GenBank gene 13468 21555 . + . gene_name=ORF1b
MN908947 GenBank gene 25393 26220 . + . gene_name=ORF3a
MN908947 GenBank gene 21563 25384 . + . gene_name=S
MN908947 GenBank gene 26245 26472 . + . gene_name=E
MN908947 GenBank gene 26523 27191 . + . gene_name=M
MN908947 GenBank gene 27202 27387 . + . gene_name=ORF6
MN908947 GenBank gene 27394 27759 . + . gene_name=ORF7a
MN908947 GenBank gene 27756 27887 . + . gene_name=ORF7b
MN908947 GenBank gene 27894 28259 . + . gene_name=ORF8
MN908947 GenBank gene 28274 29533 . + . gene_name=N
MN908947 GenBank gene 28284 28577 . + . gene_name=ORF9b
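The gene map above gives 1-based, inclusive `start` and `end` coordinates on the MN908947 reference. The following Python sketch shows one way to parse it and look up which gene(s) and codon a nucleotide position falls in. The function names are hypothetical. The parser reads only `gene` rows, takes the name from the `gene_name` attribute, and splits on whitespace so that it accepts the rows as displayed here (the GFF3 specification itself requires tab-separated columns). Codon numbers are a plain offset from the gene start, so the sketch does not model features such as the ribosomal frameshift between ORF1a and ORF1b.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # 1-based, inclusive (GFF convention)
    end: int  # 1-based, inclusive
    strand: str


def parse_gene_map(gff_text):
    """Parse the `gene` rows of a GFF3 gene map into {gene_name: Gene}."""
    genes = {}
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ## directives and # comments
        # Split into at most 9 fields on whitespace; GFF3 mandates tabs,
        # but this also accepts the space-separated rendering above.
        cols = line.split(None, 8)
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if kv)
        name = attrs["gene_name"]
        genes[name] = Gene(name, int(cols[3]), int(cols[4]), cols[6])
    return genes


def genes_at(position, genes):
    """Return sorted (gene_name, codon_number) pairs for every gene that
    covers the 1-based nucleotide `position`."""
    hits = []
    for gene in genes.values():
        if gene.start <= position <= gene.end:
            if gene.strand == "-":
                codon = (gene.end - position) // 3 + 1  # count from the 3' end
            else:
                codon = (position - gene.start) // 3 + 1
            hits.append((gene.name, codon))
    return sorted(hits)


# Gene rows copied from genome_annotation.gff3 above.
GENE_MAP = """\
##gff-version 3
##sequence-region MN908947 1 29903
MN908947 GenBank gene 266 13468 . + . gene_name=ORF1a
MN908947 GenBank gene 13468 21555 . + . gene_name=ORF1b
MN908947 GenBank gene 25393 26220 . + . gene_name=ORF3a
MN908947 GenBank gene 21563 25384 . + . gene_name=S
MN908947 GenBank gene 26245 26472 . + . gene_name=E
MN908947 GenBank gene 26523 27191 . + . gene_name=M
MN908947 GenBank gene 27202 27387 . + . gene_name=ORF6
MN908947 GenBank gene 27394 27759 . + . gene_name=ORF7a
MN908947 GenBank gene 27756 27887 . + . gene_name=ORF7b
MN908947 GenBank gene 27894 28259 . + . gene_name=ORF8
MN908947 GenBank gene 28274 29533 . + . gene_name=N
MN908947 GenBank gene 28284 28577 . + . gene_name=ORF9b
"""

genes = parse_gene_map(GENE_MAP)
print(genes_at(23403, genes))  # [('S', 614)]
print(genes_at(28300, genes))  # [('N', 9), ('ORF9b', 6)]
```

Because some genes overlap (ORF1a and ORF1b share position 13468, and ORF9b lies inside N), the lookup returns every covering gene rather than stopping at the first match.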