TBProfiler_tNGS_PHB: Introduction of tNGS workflow for TB #272

Merged
merged 59 commits into from
Apr 15, 2024

@sage-wright sage-wright commented Dec 15, 2023

This PR closes #276 by introducing the TBProfiler_tNGS_PHB workflow, designed for Illumina PE tNGS data.

🗑️ This dev branch should NOT be deleted after merging to main.

🧠 Aim, Context and Functionality

tNGS is being used to analyze Mycobacterium tuberculosis data for clinical use. Targeted sequencing requires a different analysis approach than WGS, so TheiaProk workflows cannot be used: they are designed to produce an assembled genome. Because tNGS data is fragmented and amplicon-based, assembling it is not appropriate.

TBProfiler_tNGS_PHB is our solution: a workflow that performs minimal QC and runs TBProfiler and tbp-parser by default.

The minimal QC performed is as follows:

  • trimmomatic is run with the workflow parameter bases_to_crop (default=30), which removes 30 bp from the start of each read and every base past the (average_read_length - 30 bp) position, in the hope of removing primers and other sequencing artifacts.
    • primers cannot be trimmed by sequence because the method used to generate this data relies on proprietary primers
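
As a rough illustration of the trimming described above, the sketch below applies it to a single read with plain string slicing. It is not the workflow's code: `trim_read` is a hypothetical helper, and it assumes one reading of the description, namely that the bases kept are those from position bases_to_crop up to (average_read_length - bases_to_crop).

```python
def trim_read(seq, average_read_length, bases_to_crop=30):
    """Keep the bases between bases_to_crop and (average_read_length - bases_to_crop).

    Hypothetical helper for illustration only; the workflow delegates this to trimmomatic.
    """
    end = average_read_length - bases_to_crop
    if end <= bases_to_crop:
        raise ValueError("reads are too short for this bases_to_crop value")
    return seq[bases_to_crop:end]


# A 150 bp toy read: 30 leading bases, 90 bases of interest, 30 trailing bases.
read = "A" * 30 + "C" * 90 + "G" * 30
trimmed = trim_read(read, average_read_length=150)
print(len(trimmed), set(trimmed))
```

Under this reading, a read shorter than the average loses fewer bases from its 3' end, because the cut position is fixed by the average rather than by the read's own length.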

clockwork is not currently implemented because of issues encountered during its implementation that proved difficult to resolve.

🛠️ Impacted Workflows/Tasks & Changes Being Made

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version: No

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: No

📋 Workflow/Task Step Changes

🔄 Data Processing

Docker/software or software versions changed:
tbp-parser has been updated to v1.3.0, which adds tNGS compatibility by including the tNGS primer region BED file.

Databases or database versions changed:
No database changes.

Data processing/commands changed:
A new input parameter, trimmomatic_base_crop, has been added to the trimmomatic_pe task. If provided, this Integer variable triggers calculation of the average read length and adds two parameters to the trimmomatic command: HEADCROP and CROP.

  • HEADCROP:<int> indicates the number of bases to remove from the START of the read
  • CROP:<int> indicates the FINAL LENGTH of the read that will be kept from the start of the read; any bases after this length will be removed.

The CROP value is determined dynamically from the average read length: trimmomatic_base_crop is subtracted from the average read length. HEADCROP is set equal to trimmomatic_base_crop.
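
The calculation above can be sketched as follows. This is a minimal illustration, not the task's actual implementation: `average_read_length` and `crop_parameters` are hypothetical names, the input is assumed to be an uncompressed FASTQ with four lines per record, and truncating the average to an integer is an assumption of this sketch. The order in which the two steps are passed to trimmomatic is not stated in this description, so the sketch only derives the values.

```python
import io


def average_read_length(fastq_handle):
    """Return the mean sequence length of an uncompressed 4-line-per-record FASTQ stream."""
    total_bases = 0
    n_reads = 0
    for line_number, line in enumerate(fastq_handle):
        # In a 4-line FASTQ record the sequence is the second line.
        if line_number % 4 == 1:
            total_bases += len(line.strip())
            n_reads += 1
    if n_reads == 0:
        raise ValueError("no reads found")
    return total_bases / n_reads


def crop_parameters(avg_len, trimmomatic_base_crop):
    """Derive HEADCROP and CROP as described in the text.

    Truncating the average to an integer is an assumption of this sketch.
    """
    crop = int(avg_len) - trimmomatic_base_crop
    if crop <= 0:
        raise ValueError("trimmomatic_base_crop is too large for these reads")
    return {"HEADCROP": trimmomatic_base_crop, "CROP": crop}


# Two toy reads of 150 bp and 148 bp, so the average is 149 bp.
records = ["@r1", "A" * 150, "+", "I" * 150, "@r2", "C" * 148, "+", "I" * 148]
fastq = io.StringIO("\n".join(records) + "\n")

params = crop_parameters(average_read_length(fastq), trimmomatic_base_crop=30)
steps = [f"{name}:{value}" for name, value in params.items()]
print(steps)
```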

No other analysis changes have been made to TBProfiler or tbp-parser, other than the updated tbp-parser version, which is described in the tbp-parser repository.

File processing changed:
No file processing changes.

Compute resources changed:
No compute resources changes.

➡️ Inputs

All inputs are new because this is a new workflow.

New required inputs:

  • File read1
  • File read2
  • String samplename

New optional inputs for tbp_parser task:

  • Int coverage_threshold
  • Int cpu
  • Int disk_size
  • String docker
  • Int memory
  • Int min_depth
  • String operator
  • String sequencing_method
  • Boolean tbp_parser_debug

New optional inputs for tbprofiler task:

  • Int cov_frac_threshold
  • Int cpu
  • Int disk_size
  • String mapper
  • Float min_af
  • Float min_af_pred
  • Int min_depth
  • Boolean ont_data
  • File tbprofiler_custom_db
  • String tbprofiler_docker_image
  • Boolean tbprofiler_run_custom_db
  • String variant_caller
  • String variant_calling_params

New optional inputs for tbprofiler_tngs workflow:

  • Int bases_to_crop

New optional inputs for trimmomatic_pe task:

  • Int disk_size
  • String docker
  • Int threads
  • String trimmomatic_args
  • Int trimmomatic_minlen
  • Int trimmomatic_quality_trim_score
  • Int trimmomatic_window_size

New optional inputs for version_capture task:

  • String docker
  • String timezone
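
For command-line testing, an inputs file covering the required inputs and the workflow-level optional input could be generated along these lines. This is a sketch under stated assumptions: `build_inputs` is a hypothetical helper, the workflow name is assumed to be tbprofiler_tngs as in the heading above, and the file paths are placeholders. Task-level optional inputs would also need the call name in the key, which is not shown here.

```python
import json


def build_inputs(read1, read2, samplename, bases_to_crop=None,
                 workflow_name="tbprofiler_tngs"):
    """Assemble a WDL inputs mapping with '<workflow>.<input>' keys."""
    inputs = {
        f"{workflow_name}.read1": read1,
        f"{workflow_name}.read2": read2,
        f"{workflow_name}.samplename": samplename,
    }
    # Optional inputs are only written when set, so the workflow default applies otherwise.
    if bases_to_crop is not None:
        inputs[f"{workflow_name}.bases_to_crop"] = bases_to_crop
    return inputs


inputs = build_inputs(
    read1="sample01_R1.fastq.gz",  # hypothetical path
    read2="sample01_R2.fastq.gz",  # hypothetical path
    samplename="sample01",
    bases_to_crop=25,
)
print(json.dumps(inputs, indent=2))
```

Leaving bases_to_crop unset keeps the key out of the file entirely, so the default of 30 described earlier would apply.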

⬅️ Outputs

All outputs are new because this is a new workflow.

New outputs (in alphabetical order):

  • tbp_parser_average_genome_depth
  • tbp_parser_coverage_report
  • tbp_parser_docker
  • tbp_parser_genome_percent_coverage
  • tbp_parser_laboratorian_report_csv
  • tbp_parser_lims_report_csv
  • tbp_parser_looker_report_csv
  • tbp_parser_version
  • tbprofiler_dr_type
  • tbprofiler_main_lineage
  • tbprofiler_median_coverage
  • tbprofiler_num_dr_variants
  • tbprofiler_num_other_variants
  • tbprofiler_output_alignment_bai
  • tbprofiler_output_alignment_bam
  • tbprofiler_pct_reads_mapped
  • tbprofiler_report_csv
  • tbprofiler_report_json
  • tbprofiler_report_tsv
  • tbprofiler_resistance_genes
  • tbprofiler_sub_lineage
  • tbprofiler_tngs_wf_analysis_date
  • tbprofiler_tngs_wf_version
  • tbprofiler_version
  • trimmomatic_docker
  • trimmomatic_read1_trimmed
  • trimmomatic_read2_trimmed
  • trimmomatic_stats
  • trimmomatic_version

🧪 Testing

Test Dataset

  • This workflow only works on Mycobacterium tuberculosis

Command-line Testing with MiniWDL or Cromwell (optional)

Terra Testing

Suggested Scenarios for Reviewer to Test

Theiagen Version Release Testing (optional)

🔬 Final Developer Checklist

  • The workflow/task has been tested locally and results, including file contents, are as anticipated: Yes/No
  • The workflow/task has been tested on Terra and results, including file contents, are as anticipated: Yes/No
  • The CI/CD has been adjusted and tests are passing: Yes/No
  • Code changes follow the style guide: Yes/No

🎯 Reviewer Checklist

  • All impacted workflows/tasks have been tested on Terra with a different dataset than used for development
  • All reviewer-suggested scenarios have been tested and any additional
  • All changed results have been confirmed to be accurate
  • All workflows/tasks impacted by change/s have been tested using a standard validation dataset to ensure no unintended change of functionality
  • All code adheres to the style guide
  • MD5 sums have been updated
  • The PR author has addressed all comments

🗂️ Associated Documentation (to be completed by Theiagen developer)

  • Relevant documentation on the Public Health Resources "PHB Main" has been updated
  • Workflow diagrams have been updated to reflect changes

sage-wright and others added 25 commits October 17, 2023 18:38
* updated VCF output file renaming in kSNP3 task (#207)

* updated VCF output file renaming in kSNP3 task; also added 1 new File output and change the output names to be more descriptive

* ksnp3 task:changed VCF file names to be predictable; split 2 ksnp3 options to 2 lines for readability; added new string output "ksnp3_vcf_ref_samplename" to capture sample within cluster to use for snp calling

* added new string output to ksnp3 workflow "ksnp3_vcf_ref_samplename"

* reduce unnecessary logging in MIDAS task (#210)

* made untar/decompression of midas database quiet since it produces 41k lines of output. also made the 2 mv commands verbose (but it's only 2 lines!)

* update CI

* expose tbprofiler parameters as inputs in merlin

* input spelling

---------

Co-authored-by: Curtis Kapsak <[email protected]>
@sage-wright sage-wright changed the title [TBProfiler_tNGS_PHB] Introduction of tNGS workflow for TB TBProfiler_tNGS_PHB: Introduction of tNGS workflow for TB Dec 15, 2023
@sage-wright sage-wright changed the base branch from smw-tb-2023-10-25-dev to main December 15, 2023 18:54
@sage-wright sage-wright linked an issue Dec 19, 2023 that may be closed by this pull request
@sage-wright sage-wright marked this pull request as ready for review April 15, 2024 13:56
@sage-wright sage-wright requested a review from cimendes April 15, 2024 13:56
@cimendes cimendes merged commit 9f6ff94 into main Apr 15, 2024
14 checks passed
Development

Successfully merging this pull request may close these issues.

TBProfiler tNGS workflow
3 participants