Jro mpxv global tree #160
Tested successfully on 10 individual SARS-CoV-2 genome assemblies. The outputs are as expected and the JSON files can be successfully loaded into Auspice.
@@ -19,7 +19,7 @@ task augur_refine {
 String date_inference = "marginal" # assign internal nodes to their marginally most likley dates (joint, marginal)
 String? branch_length_inference # branch length mode of treetime to use (auto, joint, marginal, input; default: auto)
 String? coalescent # coalescent time scale in units of inverse clock rate (float), optimize as scalar ("opt") or skyline (skyline)
-Int clock_filter_iqd = 4 # remove tips that deviate more than n_iqd interquartile ranges from the root-to-tip vs time regression
+Int? clock_filter_iqd # remove tips that deviate more than n_iqd interquartile ranges from the root-to-tip vs time regression
@jrotieno curious to know the rationale for removing this default iqd? Without it, tips with very poorly inferred dates appear to be kept in the tree.
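For readers unfamiliar with the parameter under discussion, here is a rough Python sketch of an interquartile-distance clock filter as the code comment in the diff describes it: fit a root-to-tip divergence vs. sampling date regression, then flag tips whose residual exceeds `n_iqd` interquartile distances. The function name `clock_filter` and its arguments are hypothetical, and the treatment of `None` as "no filtering" is an assumption based on the comment above. This is a simplified illustration, not TreeTime's or Augur's actual code.

```python
import statistics


def clock_filter(dates, divergences, n_iqd=4):
    """Return indices of tips whose regression residual exceeds n_iqd * IQD.

    Hypothetical helper for illustration only. Passing n_iqd=None is
    assumed to disable filtering, mirroring an unset optional input.
    """
    if n_iqd is None:
        return []
    mean_t = statistics.fmean(dates)
    mean_d = statistics.fmean(divergences)
    # Ordinary least-squares fit of divergence against sampling date.
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, divergences))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    residuals = [d - (rate * t + intercept) for t, d in zip(dates, divergences)]
    # Interquartile distance (75th minus 25th percentile) of the residuals.
    q1, _, q3 = statistics.quantiles(residuals, n=4, method="inclusive")
    iqd = q3 - q1
    return [i for i, r in enumerate(residuals) if abs(r) > n_iqd * iqd]


# Toy data: 21 tips on a clock of 0.001 subs/site/year with small
# alternating noise, plus one tip (index 10) far off the regression line.
dates = [2000.0 + i for i in range(21)]
divergences = [0.001 * i + (0.0005 if i % 2 == 0 else -0.0005) for i in range(21)]
divergences[10] += 0.02

flagged_with_default = clock_filter(dates, divergences, n_iqd=4)
flagged_without_filter = clock_filter(dates, divergences, n_iqd=None)
```

With the former default of 4 the divergent tip is flagged; with the value unset, under the assumption stated above, nothing is removed.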
🛠️ Changes Being Made
Adds a new workflow, Samples_to_Ref_Tree_PHB:
https://github.com/theiagen/public_health_bioinformatics/blob/jro_mpxv_global_tree/workflows/phylogenetics/wf_nextclade_addToRefTree.wdl
and a new nextclade task:
https://github.com/theiagen/public_health_bioinformatics/blob/jro_mpxv_global_tree/tasks/taxon_id/task_nextclade.wdl
🧠 Context and Rationale
The workflow allows the user to perform phylogenetic placement of a set of samples onto a global reference tree; the global tree could be from nextclade data using the default tree.json, or a user-supplied auspice tree JSON.
📋 Workflow/Task Steps
Take a sample or set of samples, and place them onto a reference tree using nextclade. A nextclade dataset reference tree is used by default, unless a reference tree is supplied by the user.
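The default-versus-user-supplied tree choice described above can be sketched in a few lines of Python. The organism names are copied from the Inputs section of this description. The helper name `choose_reference_tree` and the `DATASET_DEFAULT_TREE` constant are hypothetical, and this is not the workflow's WDL logic, only an illustration of the intended behaviour.

```python
from typing import Optional

# Organism values copied from the "Inputs" section of this PR description.
NEXTCLADE_ORGANISMS = {
    "sars-cov-2", "flu_h1n1pdm_ha", "flu_h1n1pdm_na", "flu_h3n2_ha",
    "flu_h3n2_na", "flu_vic_ha", "flu_vic_na", "flu_yam_ha",
    "hMPXV", "hMPXV_B1", "MPXV", "rsv_a", "rsv_b",
}

# Label for the tree bundled with a Nextclade dataset (assumed file name).
DATASET_DEFAULT_TREE = "tree.json"


def choose_reference_tree(organism: str,
                          reference_tree_json: Optional[str] = None) -> str:
    """Return the tree that placement would target (hypothetical helper)."""
    if organism not in NEXTCLADE_ORGANISMS:
        raise ValueError(f"unsupported organism: {organism!r}")
    # A user-supplied Auspice JSON takes precedence over the dataset default.
    return reference_tree_json if reference_tree_json else DATASET_DEFAULT_TREE


default_choice = choose_reference_tree("MPXV")
custom_choice = choose_reference_tree("rsv_a", "my_auspice_tree.json")
```

Validating `organism` up front keeps a misspelled dataset name from surfacing later as a failed dataset download.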
Inputs
Required inputs:
assembly_fastas; A FASTA file with the query sequence(s) to be placed onto the global tree
organism; The Nextclade dataset name. The options are "sars-cov-2", "flu_h1n1pdm_ha", "flu_h1n1pdm_na", "flu_h3n2_ha", "flu_h3n2_na", "flu_vic_ha", "flu_vic_na", "flu_yam_ha", "hMPXV", "hMPXV_B1", "MPXV", "rsv_a" and "rsv_b"
Optional inputs:
docker; Nextclade docker image
dataset_reference; Nextclade dataset reference sequence
dataset_tag; Nextclade dataset tag
gene_annotations_gff; A genome annotations file for codon-aware alignment, gene translation and calling of amino acid mutations
pcr_primers_csv; A file with a list of PCR primers, used to detect changes in PCR primer regions
qc_config_json; A file with a set of parameters and thresholds used to configure the QC checks
reference_tree_json; A phylogenetic reference tree file which serves as the target for phylogenetic placement
root_sequence_fasta; A sequence which serves as the reference for alignment and the analysis
virus_properties; A configuration file that directs mutation labels
Outputs
treeUpdate_auspice_json; Output phylogenetic tree with user placed samples
treeUpdate_nextclade_docker; nextclade docker image used
treeUpdate_nextclade_json; JSON file with the results of the nextclade analysis
treeUpdate_nextclade_tsv; Tab-delimited file with nextclade results
treeUpdate_nextclade_version; nextclade version
samples_to_ref_tree_version; PHB version
samples_to_ref_tree_analysis_date; Analysis date
🧪 Testing
Locally
Terra
The workflow has been tested on various nextclade pathogen datasets:
SARS-CoV-2; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/f9061e22-3246-459b-8617-9101f684e909
MPXV; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/d00d317f-0abe-4734-a769-9e447ee6beeb
RSV A; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/bcfcd389-2c91-4238-a083-2848d6baddd1
RSV B; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/4e22378b-3693-4eac-b693-806696220052
Influenza A H1N1pdm HA; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/e658c701-8676-4abe-b6ce-b2776086a5d2
Influenza A H1N1pdm NA; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/7786879b-ef9c-41cd-af75-c7cdf3110749
Influenza A H3N2 HA; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/7224d387-a107-4142-9e23-dd9891423d6c
Influenza A H3N2 NA; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/cb1c9e47-9c44-47cf-8a9b-5a58246aad05
Influenza B Victoria HA; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/2cfde037-1d6c-4f55-9e7b-ee083a39acfc
Influenza B Victoria NA; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/0f4d6b13-37ef-41f2-a2aa-ea8c97aebe83
Influenza B Yamagata HA; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/ae29b073-224b-4871-b30a-9b0b2db2d915
🔬 Quality checks
Pull Request (PR) checklist: