Merge pull request #20 from nextstrain/add-N450-tree
Make tree for 450bp of the N gene ("N450")
Showing 13 changed files with 222 additions and 49 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
{ | ||
  "title": "Real-time tracking of measles virus evolution", | ||
  "maintainers": [ | ||
    {"name": "Kim Andrews", "url": "https://bedford.io/team/kim-andrews/"}, | ||
    {"name": "the Nextstrain team", "url": "https://nextstrain.org/team"} | ||
  ], | ||
  "build_url": "https://github.com/nextstrain/measles", | ||
  "colorings": [ | ||
    { | ||
      "key": "gt", | ||
      "title": "Genotype", | ||
      "type": "categorical" | ||
    }, | ||
    { | ||
      "key": "num_date", | ||
      "title": "Date", | ||
      "type": "continuous" | ||
    }, | ||
    { | ||
      "key": "country", | ||
      "title": "Country", | ||
      "type": "categorical" | ||
    }, | ||
    { | ||
      "key": "region", | ||
      "title": "Region", | ||
      "type": "categorical" | ||
    } | ||
  ], | ||
  "geo_resolutions": [ | ||
    "country", | ||
    "region" | ||
  ], | ||
  "display_defaults": { | ||
    "map_triplicate": true | ||
  }, | ||
  "filters": [ | ||
    "country", | ||
    "region", | ||
    "author" | ||
  ], | ||
  "metadata_columns": [ | ||
    "author" | ||
  ] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
>lcl|NC_001498.1_cds_NP_056918.1_1 [gene=N] [locus_tag=MeVgp1] [db_xref=GeneID:1489804] [protein=nucleocapsid protein] [protein_id=NP_056918.1] [location=1233..1682] [gbkey=CDS] | ||
GTCAGTTCCACATTGGCATCCGAACTCGGTATCACTGCCGAGGATGCAAGGCTTGTTTCAGAGAT | ||
TGCAATGCATACTACTGAGGACAGGATCAGTAGAGCGGTCGGACCCAGACAAGCCCAAGTGTCATTTCTA | ||
CACGGTGATCAAAGTGAGAATGAGCTACCAGGATTGGGGGGCAAGGAAGATAGGAGGGTCAAACAGGGTC | ||
GGGGAGAAGCCAGGGAGAGCTACAGAGAAACCGGGTCCAGCAGAGCAAGTGATGCGAGAGCTGCCCATCC | ||
TCCAACCAGCATGCCCCTAGACATTGACACTGCATCGGAGTCAGGCCAAGATCCGCAGGACAGTCGAAGG | ||
TCAGCTGACGCCCTGCTCAGGCTGCAAGCCATGGCAGGAATCTTGGAAGAACAAGGCTCAGACACGGACA | ||
CCCCTAGGGTATACAATGACAGAGATCTTCTAGAC |
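The FASTA header above gives the N450 window as `location=1233..1682`, which should correspond to exactly 450 bases. A minimal sketch of how one might sanity-check a reference file like this one follows; the helper names `location_length` and `fasta_lengths` are hypothetical and not part of the repository.

```python
import re


def location_length(location: str) -> int:
    """Length of an inclusive, 1-based 'start..end' location string.

    Partial-feature markers ('<' and '>') as used in GenBank are tolerated.
    """
    match = re.fullmatch(r"<?(\d+)\.\.>?(\d+)", location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    start, end = (int(group) for group in match.groups())
    return end - start + 1


def fasta_lengths(text: str) -> dict[str, int]:
    """Map each record ID (first word of the header) to its sequence length."""
    lengths: dict[str, int] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record ID.
            current = (line[1:].split() or [""])[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


# The coordinates from the header span 450 positions.
print(location_length("1233..1682"))
```

Running `fasta_lengths` over the text of the reference file and comparing the result with `location_length("1233..1682")` would confirm that the extracted region and the stated coordinates agree.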
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
LOCUS NC_001498 450 bp cRNA linear VRL 13-AUG-2018 | ||
DEFINITION Measles virus, complete genome. | ||
ACCESSION NC_001498 REGION: 1233..1682 | ||
VERSION NC_001498.1 | ||
DBLINK Project: 15025 | ||
BioProject: PRJNA485481 | ||
KEYWORDS RefSeq. | ||
SOURCE Measles morbillivirus | ||
ORGANISM Measles morbillivirus | ||
Viruses; Riboviria; Orthornavirae; Negarnaviricota; | ||
Haploviricotina; Monjiviricetes; Mononegavirales; Paramyxoviridae; | ||
Orthoparamyxovirinae; Morbillivirus; Morbillivirus hominis. | ||
REFERENCE 1 (sites) | ||
AUTHORS Rima,B.K. and Duprex,W.P. | ||
TITLE The measles virus replication cycle | ||
JOURNAL Curr. Top. Microbiol. Immunol. 329, 77-102 (2009) | ||
PUBMED 19198563 | ||
REFERENCE 2 | ||
AUTHORS Takeuchi,K., Miyajima,N., Kobune,F. and Tashiro,M. | ||
TITLE Comparative nucleotide sequence analyses of the entire genomes of | ||
B95a cell-isolated and vero cell-isolated measles viruses from the | ||
same patient | ||
JOURNAL Virus Genes 20 (3), 253-257 (2000) | ||
PUBMED 10949953 | ||
REFERENCE 3 (bases 1 to 450) | ||
CONSRTM NCBI Genome Project | ||
TITLE Direct Submission | ||
JOURNAL Submitted (01-AUG-2000) National Center for Biotechnology | ||
Information, NIH, Bethesda, MD 20894, USA | ||
REFERENCE 4 (bases 1 to 450) | ||
AUTHORS Takeuchi,K., Tanabayashi,K. and Tashiro,M. | ||
TITLE Direct Submission | ||
JOURNAL Submitted (10-JUL-1998) Kaoru Takeuchi, National Institute of | ||
Infectious Diseases, Viral Disease and Vaccine Contorol; 4-7-1 | ||
Gakuen, Musashi-murayama, Tokyo 208-0011, Japan | ||
(E-mail:[email protected], Tel:81-42-561-0771(ex.530), | ||
Fax:81-42-567-5631) | ||
COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The | ||
reference sequence was derived from AB016162. | ||
Sequence updated (21-Jul-1998) | ||
Sequence updated (11-Dec-1998). | ||
COMPLETENESS: full length. | ||
FEATURES Location/Qualifiers | ||
source 1..450 | ||
/organism="Measles morbillivirus" | ||
/mol_type="viral cRNA" | ||
/strain="Ichinose-B95a" | ||
/db_xref="taxon:11234" | ||
CDS <1..>450 | ||
/gene="N" | ||
/codon_start=1 | ||
/product="nucleocapsid protein" | ||
/protein_id="NP_056918.1" | ||
/db_xref="GeneID:1489804" | ||
/translation="VSSTLASELGITAEDAR | ||
LVSEIAMHTTEDRISRAVGPRQAQVSFLHGDQSENELPGLGGKEDRRVKQGRGEARES | ||
YRETGSSRASDARAAHPPTSMPLDIDTASESGQDPQDSRRSADALLRLQAMAGILEEQ | ||
GSDTDTPRVYNDRDLLD" | ||
ORIGIN | ||
1 gtcagttcca cattggcatc cgaactcggt atcactgccg aggatgcaag gcttgtttca | ||
61 gagattgcaa tgcatactac tgaggacagg atcagtagag cggtcggacc cagacaagcc | ||
121 caagtgtcat ttctacacgg tgatcaaagt gagaatgagc taccaggatt ggggggcaag | ||
181 gaagatagga gggtcaaaca gggtcgggga gaagccaggg agagctacag agaaaccggg | ||
241 tccagcagag caagtgatgc gagagctgcc catcctccaa ccagcatgcc cctagacatt | ||
301 gacactgcat cggagtcagg ccaagatccg caggacagtc gaaggtcagc tgacgccctg | ||
361 ctcaggctgc aagccatggc aggaatcttg gaagaacaag gctcagacac ggacacccct | ||
421 agggtataca atgacagaga tcttctagac | ||
// | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
""" | ||
This part of the workflow prepares sequences for constructing the phylogenetic tree for 450bp of the N gene. | ||
See Augur's usage docs for these commands for more details. | ||
""" | ||
|
||
rule align_and_extract_N450: | ||
    input: | ||
        sequences = "data/sequences.fasta", | ||
        reference = config["files"]["reference_N450_fasta"] | ||
    output: | ||
        sequences = "results/N450/sequences.fasta" | ||
    params: | ||
        min_length = config['filter_N450']['min_length'] | ||
    shell: | ||
        """ | ||
        nextclade run \ | ||
            -j 1 \ | ||
            --input-ref {input.reference} \ | ||
            --output-fasta {output.sequences} \ | ||
            --min-seed-cover 0.01 \ | ||
            --min-length {params.min_length} \ | ||
            --silent \ | ||
            {input.sequences} | ||
        """ | ||
rule filter_N450: | ||
    """ | ||
    Filtering to | ||
      - at most {params.subsample_max_sequences} sequence(s) in total, grouped by {params.group_by!s} | ||
      - excluding strains in {input.exclude} | ||
      - minimum sequence length of {params.min_length} | ||
      - excluding strains collected before {params.min_date} | ||
    """ | ||
    input: | ||
        sequences = "results/N450/sequences.fasta", | ||
        metadata = "data/metadata.tsv", | ||
        exclude = config["files"]["exclude"] | ||
    output: | ||
        sequences = "results/N450/aligned.fasta" | ||
    params: | ||
        group_by = config['filter_N450']['group_by'], | ||
        subsample_max_sequences = config["filter_N450"]["subsample_max_sequences"], | ||
        min_date = config["filter_N450"]["min_date"], | ||
        min_length = config['filter_N450']['min_length'], | ||
        strain_id = config["strain_id_field"] | ||
    shell: | ||
        """ | ||
        augur filter \ | ||
            --sequences {input.sequences} \ | ||
            --metadata {input.metadata} \ | ||
            --metadata-id-columns {params.strain_id} \ | ||
            --exclude {input.exclude} \ | ||
            --output {output.sequences} \ | ||
            --group-by {params.group_by} \ | ||
            --subsample-max-sequences {params.subsample_max_sequences} \ | ||
            --min-date {params.min_date} \ | ||
            --min-length {params.min_length} | ||
        """ |
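The two rules above read several keys from the workflow `config`. The sketch below shows one way to check that a config dictionary provides all of them before running the workflow. The key names are taken from the rules; the helper `missing_config_keys` and every value in `example_config` are hypothetical placeholders, not the repository's actual settings.

```python
# Config paths referenced by align_and_extract_N450 and filter_N450.
REQUIRED_KEYS = [
    ("files", "reference_N450_fasta"),
    ("files", "exclude"),
    ("filter_N450", "group_by"),
    ("filter_N450", "subsample_max_sequences"),
    ("filter_N450", "min_date"),
    ("filter_N450", "min_length"),
    ("strain_id_field",),
]


def missing_config_keys(config: dict) -> list[str]:
    """Return dotted paths of required keys that are absent from config."""
    missing = []
    for path in REQUIRED_KEYS:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


# Hypothetical values for illustration only; the real config may differ.
example_config = {
    "strain_id_field": "accession",
    "files": {
        "reference_N450_fasta": "path/to/reference_N450.fasta",
        "exclude": "path/to/exclude.txt",
    },
    "filter_N450": {
        "group_by": "country year",
        "subsample_max_sequences": 3000,
        "min_date": 1950,
        "min_length": 400,
    },
}

print(missing_config_keys(example_config))
```

A check of this kind could sit at the top of a Snakefile so that a missing key surfaces as one clear message rather than a `KeyError` raised while the rules are being parsed.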