Jro mpxv global tree #160

Merged
merged 37 commits into from
Sep 21, 2023

@jrotieno jrotieno commented Aug 21, 2023

🛠️ Changes Being Made

Adds a new workflow, Samples_to_Ref_Tree_PHB:
https://github.com/theiagen/public_health_bioinformatics/blob/jro_mpxv_global_tree/workflows/phylogenetics/wf_nextclade_addToRefTree.wdl

and a new nextclade task:
https://github.com/theiagen/public_health_bioinformatics/blob/jro_mpxv_global_tree/tasks/taxon_id/task_nextclade.wdl

🧠 Context and Rationale

The workflow lets the user perform phylogenetic placement of a set of samples onto a global reference tree. The global tree can come from a Nextclade dataset (its default tree.json) or from a user-supplied Auspice tree JSON.

📋 Workflow/Task Steps

Take a sample or set of samples, and place them onto a reference tree using nextclade. A nextclade dataset reference tree is used by default, unless a reference tree is supplied by the user.

Inputs

Required inputs:

  1. assembly_fastas; A FASTA file with the query sequence(s) to be placed onto the global tree
  2. organism; Must match a Nextclade dataset name, so the options are "sars-cov-2", "flu_h1n1pdm_ha", "flu_h1n1pdm_na", "flu_h3n2_ha", "flu_h3n2_na", "flu_vic_ha", "flu_vic_na", "flu_yam_ha", "hMPXV", "hMPXV_B1", "MPXV", "rsv_a" and "rsv_b"

Optional inputs:
docker; nextclade docker image
dataset_reference; nextclade dataset reference sequence
dataset_tag; nextclade dataset tag
gene_annotations_gff; A genome annotation file used for codon-aware alignment, gene translation, and calling of amino acid mutations
pcr_primers_csv; A file with a list of PCR primers used to detect changes in PCR primer regions
qc_config_json; A file with a set of parameters and thresholds used to configure the QC checks
reference_tree_json; A phylogenetic reference tree file which serves as a target for phylogenetic placement
root_sequence_fasta; A sequence which serves as a reference for alignment and the analysis
virus_properties; A configuration file that directs mutation labels
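
As a rough illustration of how the inputs above could map onto Nextclade calls, here is a minimal Python sketch. It is not the WDL task from this PR. The flag names assume the Nextclade v2 CLI (`nextclade dataset get` and `nextclade run`) and should be checked against `--help` for the version in the docker image; the function names, the dataset directory, and the output file names are hypothetical.

```python
from typing import List, Optional

# Organism names as listed in the PR description (they mirror Nextclade dataset names).
SUPPORTED_ORGANISMS = {
    "sars-cov-2", "flu_h1n1pdm_ha", "flu_h1n1pdm_na", "flu_h3n2_ha",
    "flu_h3n2_na", "flu_vic_ha", "flu_vic_na", "flu_yam_ha",
    "hMPXV", "hMPXV_B1", "MPXV", "rsv_a", "rsv_b",
}


def build_dataset_get(organism: str,
                      dataset_dir: str = "nextclade_dataset_dir",  # hypothetical path
                      dataset_reference: Optional[str] = None,
                      dataset_tag: Optional[str] = None) -> List[str]:
    """Build a `nextclade dataset get` command (assumed v2 flags)."""
    if organism not in SUPPORTED_ORGANISMS:
        raise ValueError(f"unsupported organism: {organism!r}")
    cmd = ["nextclade", "dataset", "get", "--name", organism, "--output-dir", dataset_dir]
    if dataset_reference is not None:
        cmd += ["--reference", dataset_reference]
    if dataset_tag is not None:
        cmd += ["--tag", dataset_tag]
    return cmd


def build_nextclade_run(assembly_fasta: str,
                        organism: str,
                        dataset_dir: str = "nextclade_dataset_dir",  # hypothetical path
                        reference_tree_json: Optional[str] = None,
                        root_sequence_fasta: Optional[str] = None,
                        gene_annotations_gff: Optional[str] = None,
                        qc_config_json: Optional[str] = None,
                        pcr_primers_csv: Optional[str] = None,
                        virus_properties: Optional[str] = None) -> List[str]:
    """Build a `nextclade run` command; optional files override the dataset defaults."""
    if organism not in SUPPORTED_ORGANISMS:
        raise ValueError(f"unsupported organism: {organism!r}")
    cmd = ["nextclade", "run", "--input-dataset", dataset_dir]
    overrides = {
        "--input-tree": reference_tree_json,        # user-supplied Auspice tree JSON
        "--input-root-seq": root_sequence_fasta,
        "--input-gene-map": gene_annotations_gff,
        "--input-qc-config": qc_config_json,
        "--input-pcr-primers": pcr_primers_csv,
        "--input-virus-properties": virus_properties,
    }
    for flag, value in overrides.items():
        if value is not None:  # only pass a flag when the user supplied that file
            cmd += [flag, value]
    # Output file names below are illustrative, not the workflow's actual names.
    cmd += ["--output-tree", "nextclade.auspice.json",
            "--output-json", "nextclade.json",
            "--output-tsv", "nextclade.tsv",
            assembly_fasta]
    return cmd
```

When `reference_tree_json` is left unset, no `--input-tree` flag is emitted and the tree bundled with the dataset is used, which matches the default behavior described above.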

Outputs

treeUpdate_auspice_json; Output phylogenetic tree with the user-placed samples
treeUpdate_nextclade_docker; nextclade docker image used
treeUpdate_nextclade_json; JSON file with the results of the nextclade analysis
treeUpdate_nextclade_tsv; Tab-delimited file with nextclade results
treeUpdate_nextclade_version; nextclade version
samples_to_ref_tree_version; PHB version
samples_to_ref_tree_analysis_date; Analysis date
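
For a quick look at the tab-delimited output, a minimal Python sketch along these lines could be used. It assumes the standard Nextclade TSV columns `seqName`, `clade`, and `qc.overallStatus`; the function name and the two example rows are made up for illustration and are not results from this PR.

```python
import csv
import io


def summarize_nextclade_tsv(handle):
    """Map each sequence name to its (clade, overall QC status) from a Nextclade TSV."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["seqName"]: (row.get("clade", ""), row.get("qc.overallStatus", ""))
        for row in reader
    }


# Made-up example content standing in for treeUpdate_nextclade_tsv.
example_tsv = (
    "seqName\tclade\tqc.overallStatus\n"
    "sample_1\tIIb\tgood\n"
    "sample_2\tIIb\tmediocre\n"
)
summary = summarize_nextclade_tsv(io.StringIO(example_tsv))
for name, (clade, qc_status) in summary.items():
    print(name, clade, qc_status)
```

With a real run, pass `open(path, newline="")` on the TSV file instead of the in-memory example.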

🧪 Testing

Locally

Terra

The workflow has been tested on various nextclade pathogen datasets:
SARS-CoV-2; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/f9061e22-3246-459b-8617-9101f684e909
MPXV; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/d00d317f-0abe-4734-a769-9e447ee6beeb
RSV A; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/bcfcd389-2c91-4238-a083-2848d6baddd1
RSV B; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/4e22378b-3693-4eac-b693-806696220052
Influenza A H1N1pdm HA; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/e658c701-8676-4abe-b6ce-b2776086a5d2
Influenza A H1N1pdm NA; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/7786879b-ef9c-41cd-af75-c7cdf3110749
Influenza A H3N2 HA; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/7224d387-a107-4142-9e23-dd9891423d6c
Influenza A H3N2 NA; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/cb1c9e47-9c44-47cf-8a9b-5a58246aad05
Influenza B Victoria HA; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/2cfde037-1d6c-4f55-9e7b-ee083a39acfc
Influenza B Victoria NA; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/0f4d6b13-37ef-41f2-a2aa-ea8c97aebe83
Influenza B Yamagata HA; https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/ae29b073-224b-4871-b30a-9b0b2db2d915

🔬 Quality checks

Pull Request (PR) checklist:

  • Include a description of what is in this pull request in this message.
  • The workflow/task has been tested locally and on Terra
  • The CI/CD has been adjusted and tests are passing
  • Everything follows the style guide

@emily-smith1 emily-smith1 left a comment

Tested successfully on 10 individual SARS-CoV-2 genome assemblies. The outputs are as expected and the JSON files can be successfully loaded into Auspice.

@jrotieno jrotieno merged commit 1f567b3 into main Sep 21, 2023
19 checks passed
@jrotieno jrotieno deleted the jro_mpxv_global_tree branch September 26, 2023 13:17
@@ -19,7 +19,7 @@ task augur_refine {
String date_inference = "marginal" # assign internal nodes to their marginally most likley dates (joint, marginal)
String? branch_length_inference # branch length mode of treetime to use (auto, joint, marginal, input; default: auto)
String? coalescent # coalescent time scale in units of inverse clock rate (float), optimize as scalar ("opt") or skyline (skyline)
- Int clock_filter_iqd = 4 # remove tips that deviate more than n_iqd interquartile ranges from the root-to-tip vs time regression
+ Int? clock_filter_iqd # remove tips that deviate more than n_iqd interquartile ranges from the root-to-tip vs time regression

@jrotieno, I'm curious about the rationale for removing this default IQD. It seems to allow the inclusion of very poorly inferred dates.
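
For readers following the question, here is a simplified Python sketch of what a clock filter of this kind does, assuming the usual definition: fit a root-to-tip divergence vs. date regression, then drop tips whose residual exceeds `n_iqd` interquartile distances. It is not TreeTime's or augur's actual implementation, and the function name and example values are hypothetical. Passing `n_iqd=None` mimics the new optional input being left unset, so no tips are filtered.

```python
from statistics import quantiles
from typing import List, Optional


def clock_filter(dates: List[float],
                 divergences: List[float],
                 n_iqd: Optional[float] = None) -> List[int]:
    """Return indices of tips kept after an interquartile-distance clock filter."""
    n = len(dates)
    if n_iqd is None:
        # No threshold supplied: nothing is filtered.
        return list(range(n))
    # Ordinary least-squares fit of divergence against sampling date.
    mean_t = sum(dates) / n
    mean_d = sum(divergences) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, divergences))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    residuals = [d - (intercept + slope * t) for t, d in zip(dates, divergences)]
    # Interquartile distance of the residuals.
    q1, _, q3 = quantiles(residuals, n=4, method="inclusive")
    iqd = q3 - q1
    return [i for i, r in enumerate(residuals) if abs(r) <= n_iqd * iqd]


# Hypothetical data: nine clock-like tips plus one tip with far too much divergence.
example_dates = [2020.0 + i for i in range(10)]
example_divergences = [0.001 * i for i in range(9)] + [0.1]
kept_unfiltered = clock_filter(example_dates, example_divergences)          # no threshold
kept_filtered = clock_filter(example_dates, example_divergences, n_iqd=2)   # strict threshold
```

The outlier also pulls the regression line toward itself, so whether a given tip is removed depends on the threshold chosen.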

4 participants