Merge pull request #111 from nextstrain/update-rsv
rsv: update datasets with new consortium nomenclature
rneher authored Nov 26, 2023
2 parents eadfae2 + f0a6f2d commit d7dfde3
Showing 25 changed files with 10,264 additions and 374,840 deletions.
6 changes: 2 additions & 4 deletions data/nextstrain/rsv/a/EPI_ISL_412866/CHANGELOG.md
@@ -1,7 +1,5 @@
## Unreleased

Initial release for Nextclade v3!
**first release of v3 dataset.**

This dataset is converted from the corresponding older dataset for Nextclade v2. You can find old versions of datasets here: https://github.com/nextstrain/nextclade_data/tree/2023-08-17--15-51-24--UTC/data/datasets

Read more about Nextclade datasets in the documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
Updated consortium nomenclature.
26 changes: 16 additions & 10 deletions data/nextstrain/rsv/a/EPI_ISL_412866/README.md
@@ -1,14 +1,20 @@
# Nextclade dataset for "RSV-A" based on reference "hRSV/A/England/397/2017" (rsv_a/EPI_ISL_412866)
# RSV-A dataset with reference genome A/England/397/2017

| Key | Value |
| ---------------------- | --------------------------------------------------------------------------------------------------------------------|
| authors | [Richard Neher](https://neherlab.org), Laura Urbanska, [Nextstrain](https://nextstrain.org) |
| data source | GenBank + other authorized sequences |
| workflow | [github.com/nextstrain/rsv/nextclade](https://github.com/nextstrain/rsv/nextclade) |
| nextclade dataset path | nextstrain/rsv/a/EPI_ISL_412866 |
| reference | EPI_ISL_412866 |
| clade definitions | [github.com/rsv-lineages/lineage-designation-A](https://github.com/rsv-lineages/lineage-designation-A) |

## Dataset attributes
## Scope of this dataset
This dataset for RSV-A uses the reference sequence A/England/397/2017, which is available in GISAID under accession number EPI_ISL_412866. An almost identical sequence (slightly longer, 12 mutations, no gaps or indels) is available in NCBI as [LR699737](https://www.ncbi.nlm.nih.gov/nuccore/LR699737).
This sequence carries the duplication in the G protein that is shared by all currently circulating variants.
The reference tree covers the diversity of RSV-A since the first sequenced samples.

| attribute | value | value friendly |
| -------------------- | -------------------- | ---------------------------------------- |
| name | rsv_a | RSV-A |
| reference | EPI_ISL_412866 | hRSV/A/England/397/2017 |


## What is Nextclade dataset

Read more about Nextclade datasets in Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
## Nomenclature
The dataset follows the consortium nomenclature established in 2023, which uses a combination of letters and numbers to designate lineages hierarchically.
Definitions of individual lineages are available on GitHub in the repository [rsv-lineages/lineage-designation-A](https://github.com/rsv-lineages/lineage-designation-A).
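The README describes the consortium lineage names only as hierarchical combinations of letters and numbers. As a minimal sketch, assuming names are dot-separated with one hierarchy level per component (the names used below, such as `A.D.1.5`, are hypothetical; the authoritative definitions live in rsv-lineages/lineage-designation-A), ancestry can be derived from the name alone:

```python
def lineage_ancestors(name: str) -> list[str]:
    """Return the ancestors of a dot-separated lineage name, root first.

    Assumes each dot-separated component adds one level of the hierarchy.
    """
    parts = name.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def is_descendant(name: str, ancestor: str) -> bool:
    """True if `name` is `ancestor` itself or any lineage below it."""
    # Compare against "ancestor." so that e.g. "A.D.10" is not mistaken
    # for a child of "A.D.1".
    return name == ancestor or name.startswith(ancestor + ".")


if __name__ == "__main__":
    # Hypothetical lineage names, used only for illustration.
    print(lineage_ancestors("A.D.1.5"))      # ['A', 'A.D', 'A.D.1']
    print(is_descendant("A.D.1.5", "A.D"))   # True
    print(is_descendant("A.D.10", "A.D.1"))  # False
```

Matching on the prefix plus a trailing dot, rather than a bare `startswith`, keeps numerically adjacent siblings apart.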
26 changes: 14 additions & 12 deletions data/nextstrain/rsv/a/EPI_ISL_412866/genome_annotation.gff3
@@ -1,14 +1,16 @@
##gff-version 3
##sequence-region EPI_ISL_412866 1 15225
EPI_ISL_412866 feature source 1 15225 . . . mol_type=viral cRNA;organism=Human respiratory syncytial virus A
EPI_ISL_412866 feature gene 70 489 . . 0 codon_start=1;gene_name=NS1;product=nonstructural protein 1;protein_id=VVF34488.1
EPI_ISL_412866 feature gene 599 973 . . 0 codon_start=1;gene_name=NS2;product=nonstructural protein 2;protein_id=VVF34491.1
EPI_ISL_412866 feature gene 1111 2286 . . 0 codon_start=1;gene_name=N;product=nucleoprotein;protein_id=VVF34494.1
EPI_ISL_412866 feature gene 2318 3043 . . 0 codon_start=1;gene_name=P;product=phosphoprotein;protein_id=VVF34497.1
EPI_ISL_412866 feature gene 3226 3996 . . 0 codon_start=1;gene_name=M;product=matrix protein;protein_id=VVF34500.1
EPI_ISL_412866 feature gene 4266 4460 . . 0 codon_start=1;gene_name=SH;product=small hydrophobic protein;protein_id=VVF34503.1
EPI_ISL_412866 feature gene 4652 5617 . . 0 codon_start=1;gene_name=G;product=attachment glycoprotein;protein_id=VVF34506.1
EPI_ISL_412866 feature gene 5697 7421 . . 0 codon_start=1;gene_name=F;product=fusion glycoprotein;protein_id=VVF34509.1
EPI_ISL_412866 feature gene 7640 8224 . . 0 codon_start=1;gene_name=M2-1;product=M2-1 protein;protein_id=VVF34512.1
EPI_ISL_412866 feature gene 8193 8465 . . 0 codon_start=1;gene_name=M2-2;product=M2-2 protein;protein_id=VVF34515.1
EPI_ISL_412866 feature gene 8523 15029 . . 0 codon_start=1;gene_name=L;product=polymerase protein;protein_id=VVF34518.1
EPI_ISL_412866 annotation remark 1 15225 . . . molecule_type=cRNA;organism=Human orthopneumovirus;taxonomy=Viruses,Riboviria,Orthornavirae,Negarnaviricota,Haploviricotina,Monjiviricetes,Mononegavirales,Pneumoviridae,Orthopneumovirus,Orthopneumovirus hominis
EPI_ISL_412866 feature source 1 15225 . . . mol_type=viral cRNA;organism=Human orthopneumovirus
EPI_ISL_412866 feature 5'UTR 1 15 . . . citation=%5B1%5D;function=Leader region 5%27UTR
EPI_ISL_412866 feature CDS 70 489 . . 0 codon_start=1;db_xref=GeneID:37607636;gene=NS1;gene_name=NS1;Name=NS1;product=nonstructural protein 1;protein_id=YP_009518850.1;translation=MGSNSLSMIKVRLQNLFDNDEVALLKITCYTDKLIQLTNALAKAVIHTIKLNGIVFVHVITSSDICPNNNIVVKSNFTTMPVLQNGGYIWEMMELTHCSQPNGLIDDNCEIKFSKKLSDSTMTNYMNQLSELLGFDLNP%2A
EPI_ISL_412866 feature CDS 599 973 . . 0 codon_start=1;db_xref=GeneID:37607637;gene=NS2;gene_name=NS2;Name=NS2;product=nonstructural protein 2;protein_id=YP_009518851.1;translation=MDTTHNDTTPQRLMITDMRPLSLETIITSLTRDIITHKFIYLINHECIVRKLDERQATFTFLVNYEMKLLHKVGSTKYKKYTEYNTKYGTFPMPIFINHDGFLECIGIKPTKHTPIIYKYDLNP%2A
EPI_ISL_412866 feature CDS 1111 2286 . . 0 codon_start=1;db_xref=GeneID:37607638;gene=N;gene_name=N;Name=N;product=nucleoprotein;protein_id=YP_009518852.1;translation=MALSKVKLNDTLNKDQLLSSSKYTIQRSTGDSIDTPNYDVQKHINKLCGMLLITEDANHKFTGLIGMLYAMSRLGREDTIKILKDAGYHVKANGVDVTTHRQDINGKEMKFEVLTLASLTTEIQINIEIESRKSYKKMLKEMGEVAPEYRHDSPDCGMIILCIAALVITKLAAGDRSGLTAVIRRANNVLKNEMKRYKGLLPKDIANSFYEVFEKYPHFIDVFVHFGIAQSSTRGGSRVEGIFAGLFMNAYGAGQVMLRWGVLAKSVKNIMLGHASVQAEMEQVVEVYEYAQKLGGEAGFYHILNNPKASLLSLTQFPHFSSVVLGNAAGLGIMGEYRGTPRNQDLYDAAKVYAEQLKENGVINYSVLDLTAEELEAIKHQLNPKDNDVEL%2A
EPI_ISL_412866 feature CDS 2318 3043 . . 0 codon_start=1;db_xref=GeneID:37607639;gene=P;gene_name=P;Name=P;product=phosphoprotein;protein_id=YP_009518853.1;translation=MEKFAPEFHGEDANNRATKFLESIKGKFTSPKDPKKKDSIISVNSIDIEVTKESLITSNSTIINPINETDDTVGNKPNYQRKPLVSFKEDPTPSDNPFSKLYKETIETFDNNEEESSYSYEEINDQTNDNITARLDRIDEKLSEILGMLHTLVVASAGPTSARDGIRDAMVGLREEMIEKIRTEALMTNDRLEAMARLRNEESEKMAKDTSDEVSLNPTSEKLNNLLEGNDSDNDLSLEDF%2A
EPI_ISL_412866 feature CDS 3226 3996 . . 0 codon_start=1;db_xref=GeneID:37607640;gene=M;gene_name=M;Name=M;product=matrix protein;protein_id=YP_009518854.1;translation=METYVNKLHEGSTYTAAVQYNVLEKDDDPASLTIWVPMFQSSMPADLLIKELANVNILVKQISTPKGPSLRVMINSRSAVLAQMPSKFTICANVSLDERSKLAYDVTTPCEIKACSLTCLKSKNMLTTVKDLTMKTLNPTHDIIALCEFENIVTSKKVIIPTYLRSISVRNKDLNTLENITTTEFKNAITNAKIIPYSGLLLVITVTDNKGAFKYIKPQSQFIVDLGAYLEKESIYYVTTNWKHTATRFAIKPMED%2A
EPI_ISL_412866 feature CDS 4266 4460 . . 0 codon_start=1;db_xref=GeneID:37607641;gene=SH;gene_name=SH;Name=SH;product=small hydrophobic protein;protein_id=YP_009518855.1;translation=MENTSITIEFSSKFWPYFTLIHMITTIISLIIIISIMIAILNKLCEYNVFHNKTFELPRARVNT%2A
EPI_ISL_412866 feature CDS 4652 5617 . . 0 codon_start=1;db_xref=GeneID:37607642;gene=G;gene_name=G;Name=G;product=attachment glycoprotein;protein_id=YP_009518856.1;translation=MSKTKDQRTAKTLERTWDTLNHLLFISSCLYKLNLKSIAQITLSILAMIISTSLIIAAIIFIASANHKVTPTTAIIQDATNQIKNTTPTHLTQNPQLGISLSNLSGTTSQSTTILASTTPSAESTPQSTTVKIINTTTTQILPSKPTTKQRQNKPQNKPNNDFHFEVFNFVPCSICSNNPTCWAICKRIPNKKPGKKTTTKPTKKPTLKTTKKDPKPQTTKPKGVLTTKPTGKPTINTTKTNSRTTLLTSNTKGNPEHTSQKETIHSTTSEGYPSPSQVYTTSDQEETLHSTTSEGYPSPSQVYTTSEYLSQSLSSSNTTK%2A
EPI_ISL_412866 feature CDS 5697 7421 . . 0 codon_start=1;db_xref=GeneID:37607643;gene=F;gene_name=F;Name=F;product=fusion glycoprotein;protein_id=YP_009518857.1;translation=MELPILKTNAITTILAAVTLCFASSQNITEEFYQSTCSAVSKGYLSALRTGWYTSVITIELSNIKENKCNGTDAKVKLIKQELDKYKNAVTELQLLMQSTPAANSRARRELPRFMNYTLNNTKNTNVTLSKKRKRRFLGFLLGVGSAIASGIAVSKVLHLEGEVNKIKSALLSTNKAVVSLSNGVSVLTSKVLDLKNYIDKQLLPIVNKQSCSISNIETVIEFQQKNNRLLEITREFSVNAGVTTPVSTYMLTNSELLSLINDMPITNDQKKLMSSNVQIVRQQSYSIMSIIKEEVLAYVVQLPLYGVIDTPCWKLHTSPLCTTNTKEGSNICLTRTDRGWYCDNAGSVSFFPQAETCKVQSNRVFCDTMNSLTLPSEVNLCNIDIFNPKYDCKIMTSKTDVSSSVITSLGAIVSCYGKTKCTASNKNRGIIKTFSNGCDYVSNKGVDTVSVGNTLYYVNKQEGKSLYVKGEPIINFYDPLVFPSDEFDASISQVNEKINQSLAFIRKSDELLHNVNAGKSTTNIMITTIIIVIIVILLALIAVGLLLYCKARSTPVTLSKDQLSGINNIAFSN%2A
EPI_ISL_412866 feature CDS 7640 8224 . . 0 codon_start=1;db_xref=GeneID:37607644;gene=M2;gene_name=M2-1;Name=M2-1;note=ORF 1%2C matrix protein 2;product=M2-1 protein;protein_id=YP_009518858.1;translation=MSRRNPCKFEIRGHCLNGKRCHFSHNYFEWPPHALLVRQNFMLNRILKSMDKSIDTLSEISGAAELDRTEEYALGVVGVLESYIGSINNITKQSACVAMSKLLTELNSDDIKKLRDNEEPNSPKVRVYNTVISYIESNRKNNKQTIHLLKRLPADVLKKTIKNTLDIHKSITINNSKESTVSDTNDHAKNNDTT%2A
EPI_ISL_412866 feature CDS 8193 8465 . . 0 codon_start=1;db_xref=GeneID:37607644;gene=M2;gene_name=M2-2;Name=M2-2;note=ORF 2%2C RNA processivity factor;product=M2-2 protein;protein_id=YP_009518859.1;translation=TTMPKIMILPDKYPCSINSILITSNYRVTMYNQKNTLYINQNNQNSHIYPPDQPFNEIHWTSQDLIDATQNFLQHLGITDDIYTIYILVS%2A
EPI_ISL_412866 feature CDS 8532 15029 . . 0 codon_start=1;db_xref=GeneID:37607645;gene=L;gene_name=L;Name=L;note=RNA dependant RNA polymerase%3B RdRp;product=polymerase protein;protein_id=YP_009518860.1;translation=MDPIISGNSANVYLTDSYLKGVISFSECNALGSYIFNGPYLKNDYTNLISRQNPLIEHINLKKLNITQSLISKYHKGEIKIEEPTYFQSLLMTYKSMTSSEQTTTTNLLKKIIRRAIEISDVKVYAILNKLGLKEKDKIKSNNGQDEDNSVITTIIKDDILLAVKDNQSHPKADKNQSTKQKDTIKTTLLKKLMCSMQHPPSWLIHWFNLYTKLNSILTQYRSSEVKNHGFILIDNHTLSGFQFILNQYGCIVYHRELKRITVTTYNQFLTWKDISLSRLNVCLITWISNCLNTLNKSLGLRCGFNNVILTQLFLYGDCILKLFHNEGFYIIKEVEGFIMSLILNITEEDQFRKRFYNSMLNNITDAANKAQKNLLSRVCHTLLDKTISDNIINGRWIILLSKFLKLIKLAGDNNLNNLSELYFLFRIFGHPMVDERQAMDAVKVNCNETKFYLLSSLSMLRGAFIYRIIKGFVNNYNRWPTLRNAIVLPLRWLTYYKLNTYPSLLELTERDLIVLSGLRFYREFRLPKKVDLEMIINDKAISPPKNLIWTSFPRNYMPSHIQNYIEHEKLKFSDSDKSRRVLEYYLRDNKFNECDLYNCVVNQSYLNNPNHVVSLTGKERELSVGRMFAMQPGMFRQVQILAEKMIAENILQFFPESLTRYGDLELQKILELKAGISNKSNRYNDNYNNYISKCSIITDLSKFNQAFRYETSCICSDVLDELHGVQSLFSWLHLTIPHVTIICTYRHAPPYIKDHIVDLNNVDEQSGLYRYHMGGIEGWCQKLWTIEAISLLDLISLKGKFSITALINGDNQSIDISKPVRLMEGQTHAQADYLLALNSLKLLYKEYAGIGHKLKGTETYISRDMQFMSKTIQHNGVYYPASIKKVLRVGPWINTILDDFKVSLESIGSLTQELEYRGESLLCSLIFRNVWLYNQIALQLKNHALCNNKLYLDILKVLKHLKTFFNLDNIDTALTLYMNLPMLFGGGDPNLLYRSFYRRTPDFLTEAIVHSVFILSYYTNHDLKDKLQDLSDDRLNKFLTCIITFDKNPNAEFVTLMRDPQALGSERQAKITSEINRLAVTEVLSTAPNKIFSKSAQHYTTTEIDLNDIMQNIEPTYPHGLRVVYESLPFYKAEKIVNLISGTKSITNILEKTSAIDLTDIDRATEMMRKNITLLIRILPLDCNRDKREILSMENLSITELSKYVRERSWSLSNIVGVTSPSIMYTMDIKYTTSTIASGIIIEKYNVNSLTRGERGPTKPWVGSSTQEKKTMPVYNRQVLTKKQRDQIDLLAKLDWVYASIDNKDEFMEELSIGTLGLTYEKAKKLFPQYLSVNYLHRLTVSSRPCEFPASIPAYRTTNYHFDTSPINRILTEKYGDEDIDIVFQNCISFGLSLMSVVEQFTNVCPNRIILIPKLNEIHLMKPPIFTGDVDIHKLKLVIQKQHMFLPDKISLTQYVELFLSNKTLKSGSNVNSNLILAHKISDYFHNTYILSTNLAGHWILIIQLMKDSKGIFEKDWGEGYITDHMFINLKVFFNAYKTYLLCFHKGYGRAKLECDMNTSDLLCVLELIDSSYWKSMSKVFLEQKVIKYILSQDASLHRVKGCHSFKLWFLKRLNVAEFTVCPWVVNIDYHPTHMKAILTYIDLVRMGLINIDRIYIKNKHKFNDEFYTSNLFYINYNFSDNTHLLTKHIRIANSELESNYNKLYHPTPETLENILTNPVKNNEKKTLSGYCIGKNVDSIMLPSLSNKKLIKSSTMIRTNYSRQDLYNLFPTVVIDKIIDHSGN
TAKSNQLYTTTSHQISLVHNSTSLYCMLPWHHINRFNFVFSSTGCKISIEYILKDLKIKDPNCIAFIGEGAGNLLLRTVVELHPDIRYIYRSLKDCNDHSLPIEFLRLYNGHINIDYGENLTIPATDATNNIHWSYLHIKFAEPISLFVCDAELPVTVNWSKIIIEWSKHVRKCKYCSSVNKCTLIVKYHAQDDIDFKLDNITILKTYVCLGSKLKGSEVYLVLTIGPANVFPVFNVVQNAKLILSRTKNFIMPKKADKESIDANIKSLIPFLCYPITKKGINTALSKLKSVVSGDILSYSIAGRNEVFSNKLINHKHMNILKWFNHVLNFRSTELNYNHLYMVESTYPHLSELLNSLTTNELKKLIKITGSLLYNFYNE%2A
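The new annotation replaces the `gene` rows with `CDS` rows that carry RefSeq protein IDs and translations, and moves the start of L from 8523 to 8532. A quick sanity check on such a file is that every CDS spans whole codons and that any overlap is intentional, as with M2-1 and M2-2. The sketch below is a hypothetical helper, not part of Nextclade; it assumes the real file is tab-separated as the GFF3 specification requires (the columns in the diff above were flattened to spaces) and that each CDS row carries a `Name` attribute.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Cds:
    name: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int  # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def parse_cds(gff_text: str) -> list[Cds]:
    """Collect CDS rows from tab-separated GFF3 text, named by their Name attribute."""
    cdses = []
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and ## directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # only CDS rows are checked; 5'UTR and other rows are ignored
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        cdses.append(Cds(attrs.get("Name", "?"), int(cols[3]), int(cols[4])))
    return cdses


def check_cds(cdses: list[Cds]) -> list[str]:
    """Flag CDS that are not whole codons and every pair of overlapping CDS."""
    problems = []
    for c in cdses:
        if c.length % 3 != 0:
            problems.append(f"{c.name}: length {c.length} is not a multiple of 3")
    for a, b in combinations(sorted(cdses, key=lambda cds: cds.start), 2):
        overlap = min(a.end, b.end) - max(a.start, b.start) + 1
        if overlap > 0:
            problems.append(f"{a.name} and {b.name} overlap by {overlap} nt")
    return problems


# Three CDS rows rebuilt from the coordinates in the diff above (attributes trimmed).
EXAMPLE_GFF = "\n".join(
    "\t".join(["EPI_ISL_412866", "feature", "CDS", str(s), str(e), ".", ".", "0", f"Name={n}"])
    for n, s, e in [("M2-1", 7640, 8224), ("M2-2", 8193, 8465), ("L", 8532, 15029)]
)

if __name__ == "__main__":
    print(check_cds(parse_cds(EXAMPLE_GFF)))  # ['M2-1 and M2-2 overlap by 32 nt']
```

Comparing all pairs is quadratic, which is harmless for a genome with about a dozen CDS and also catches a long CDS that overlaps several others.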
77 changes: 36 additions & 41 deletions data/nextstrain/rsv/a/EPI_ISL_412866/pathogen.json
@@ -1,13 +1,11 @@
{
"schemaVersion": "3.0.0",
"alignmentParams": {
"excessBandwidth": 30,
"excessBandwidth": 9,
"terminalBandwidth": 100,
"allowedMismatches": 4,
"windowSize": 15,
"minMatchLength": 30,
"minSeedCover": 0.05,
"gapAlignmentSide": "left",
"kmerDistance": 10
"minSeedCover": 0.1
},
"compatibility": {
"cli": "3.0.0-alpha.0",
@@ -23,73 +21,70 @@
"reference": "reference.fasta",
"treeJson": "tree.json"
},
"geneOrderPreference": [
"F",
"G",
"L"
],
"qc": {
"frameShifts": {
"enabled": true
"privateMutations": {
"enabled": true,
"typical": 50,
"cutoff": 150,
"weightLabeledSubstitutions": 2,
"weightReversionSubstitutions": 1,
"weightUnlabeledSubstitutions": 1
},
"missingData": {
"enabled": false,
"missingDataThreshold": 2000,
"scoreBias": 500
},
"snpClusters": {
"enabled": false,
"windowSize": 100,
"clusterCutOff": 10,
"scoreWeight": 50
},
"mixedSites": {
"enabled": true,
"mixedSitesThreshold": 8
},
"privateMutations": {
"cutoff": 150,
"enabled": true,
"typical": 50,
"weightLabeledSubstitutions": 2,
"weightReversionSubstitutions": 1,
"weightUnlabeledSubstitutions": 1
},
"snpClusters": {
"clusterCutOff": 10,
"enabled": false,
"scoreWeight": 50,
"windowSize": 100
"frameShifts": {
"enabled": true
},
"stopCodons": {
"enabled": true
"enabled": true,
"ignoredStopCodons": []
}
},
"schemaVersion": "3.0.0",
"version": {
"tag": "unreleased"
},
"attributes": {
"name": "Respiratory Syncytial Virus A",
"reference name": "hRSV/A/England/397/2017",
"reference accession": "EPI_ISL_412866"
},
"geneOrderPreference": [
"F",
"G",
"L"
],
"maintenance": {
"website": [
"https://nextstrain.org",
"https://clades.nextstrain.org"
],
"documentation": [
"https://github.com/nextstrain/nextclade_data",
"https://docs.nextstrain.org/projects/nextclade"
"https://github.com/nextstrain/rsv"
],
"source code": [
"https://github.com/nextstrain/nextclade_data",
"https://github.com/neherlab/nextclade_data_workflows"
"https://github.com/nextstrain/rsv"
],
"issues": [
"https://github.com/nextstrain/nextclade_data",
"https://github.com/nextstrain/nextclade_data/issues"
"https://github.com/nextstrain/rsv/issues"
],
"organizations": [
"Nextstrain"
],
"authors": [
"Nextstrain team <https://nextstrain.org>"
]
},
"attributes": {
"name": "RSV-A",
"reference accession": "EPI_ISL_412866",
"reference name": "hRSV/A/England/397/2017"
},
"version": {
"tag": "unreleased"
}
}
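The `qc` block configures Nextclade's per-sequence quality rules. To show how the private-mutation weights and the mixed-site threshold interact, here is a minimal sketch that assumes three things: a private-mutation score equal to the weighted excess over `typical`, scaled by `cutoff`; a mixed-site score equal to 100 times the number of ambiguous IUPAC bases (other than N) divided by `mixedSitesThreshold`; and score bins of good (below 30), mediocre (below 100) and bad. These formulas are assumptions for illustration; the Nextclade QC documentation is authoritative, and the real private-mutation rule may also count other event types, such as deletions.

```python
# Values copied from the "qc" section of pathogen.json above.
PRIVATE_MUTATIONS = {
    "typical": 50,
    "cutoff": 150,
    "weightLabeledSubstitutions": 2,
    "weightReversionSubstitutions": 1,
    "weightUnlabeledSubstitutions": 1,
}
MIXED_SITES_THRESHOLD = 8

# IUPAC ambiguity codes excluding N (assumed to be what counts as "mixed").
AMBIGUOUS = set("RYKMSWBDHV")


def private_mutation_score(labeled: int, reversions: int, unlabeled: int) -> float:
    """Assumed scoring: weighted excess over `typical`, scaled so an excess of `cutoff` gives 100."""
    cfg = PRIVATE_MUTATIONS
    weighted = (
        cfg["weightLabeledSubstitutions"] * labeled
        + cfg["weightReversionSubstitutions"] * reversions
        + cfg["weightUnlabeledSubstitutions"] * unlabeled
    )
    return max(0.0, (weighted - cfg["typical"]) * 100 / cfg["cutoff"])


def mixed_sites_score(sequence: str) -> float:
    """Assumed scoring: ambiguous sites relative to the threshold, times 100."""
    n_mixed = sum(1 for base in sequence.upper() if base in AMBIGUOUS)
    return 100 * n_mixed / MIXED_SITES_THRESHOLD


def status(score: float) -> str:
    """Assumed Nextclade-style bins: <30 good, <100 mediocre, otherwise bad."""
    if score < 30:
        return "good"
    if score < 100:
        return "mediocre"
    return "bad"


if __name__ == "__main__":
    s = private_mutation_score(labeled=10, reversions=5, unlabeled=40)
    print(s, status(s))  # 10.0 good
    m = mixed_sites_score("ACGTRYN-")
    print(m, status(m))  # 25.0 good
```

Under these assumptions, 10 labeled, 5 reversion and 40 unlabeled private substitutions give a weighted total of 65 and a score of 10, while 200 unlabeled ones reach exactly 100; the doubled weight on labeled substitutions makes mutations known from other clades count more heavily.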