TBProfiler_tNGS_PHB: Introduction of tNGS workflow for TB #272

Merged: 59 commits, Apr 15, 2024
Commits
afca582
output tbprofiler vcf
sage-wright Oct 17, 2023
0d030e2
update default docker
sage-wright Oct 17, 2023
0251379
fix path
sage-wright Oct 17, 2023
f520261
add sample id to the beginning of the coverage report
sage-wright Oct 25, 2023
ba8d60d
update default docker
sage-wright Oct 25, 2023
c6fac1e
Merge branch 'smw-tb-vcf-dev' into smw-tb-2023-10-25-dev
sage-wright Oct 25, 2023
efafe25
Enable TBProfiler parameter changes (#246)
frankambrosio3 Nov 13, 2023
23008ad
Merge branch 'main' into smw-tb-2023-10-25-dev
sage-wright Nov 13, 2023
a6fc36c
update md5sums
sage-wright Nov 15, 2023
fe5b8a2
caller_options tbprofiler
frankambrosio3 Nov 28, 2023
cd18ce1
caller_options merlin magic
frankambrosio3 Nov 28, 2023
2f2f2ea
--calling_params tbprofiler
frankambrosio3 Nov 28, 2023
e36494c
calling_params tbprofiler
frankambrosio3 Nov 29, 2023
742acaf
quotes around params tbprofiler
frankambrosio3 Nov 29, 2023
3235eb1
added quotes around calling params tbprofiler
frankambrosio3 Nov 29, 2023
af529a7
"-C 1 -F 0.0" tbprof
frankambrosio3 Nov 29, 2023
2dcd5e2
removed caller options
frankambrosio3 Nov 29, 2023
a8ab72a
hardcoded tbprofiler freebayes params
frankambrosio3 Nov 29, 2023
066d643
re-optionalize
sage-wright Nov 30, 2023
7c29fe8
update md5sums
sage-wright Nov 30, 2023
ee34147
draft tbprofiler tngs
sage-wright Dec 11, 2023
1fbe0c1
add versioning
sage-wright Dec 11, 2023
b6343a2
add to dockstore
sage-wright Dec 11, 2023
fb56eb3
commenting out clockwork to try and fix bugs?
sage-wright Dec 13, 2023
7dc63b1
Merge branch 'smw-tb-2023-10-25-dev' into smw-tngs-tbprofiler-dev
sage-wright Dec 15, 2023
b875d1b
chop 30 bp from both sides
sage-wright Jan 11, 2024
bea2573
update workflow to use trimmomatic chop
sage-wright Jan 11, 2024
dcb11a5
Merge branch 'main' into smw-tngs-tbprofiler-dev
sage-wright Jan 11, 2024
b570ebb
remove whitespace cruft
sage-wright Jan 11, 2024
836ac6b
merge into regular trimmomatic task
sage-wright Jan 11, 2024
1707c79
change naming
sage-wright Jan 11, 2024
3638674
Merge branch 'main' into smw-tngs-tbprofiler-dev
sage-wright Jan 11, 2024
ecd058f
prevent widespread failures
sage-wright Jan 11, 2024
7dd015e
Merge branch 'main' into smw-tngs-tbprofiler-dev
sage-wright Jan 25, 2024
3a1b7d9
update to latest version of tbp_parser and enable tngs bed file
sage-wright Jan 25, 2024
4a6b06f
update md5sum
sage-wright Jan 29, 2024
4a11abd
tngs updates
sage-wright Feb 6, 2024
9378e2d
update docker
sage-wright Feb 21, 2024
0fd735c
update docker
sage-wright Mar 4, 2024
5774cc7
Merge branch 'main' into smw-tngs-tbprofiler-dev
sage-wright Mar 4, 2024
4f68fa1
udpate paths
sage-wright Mar 4, 2024
72469bd
update docker
sage-wright Mar 8, 2024
f1d0729
enable rrs & rrl frequency changeing
sage-wright Mar 14, 2024
fe59dd1
update docker
sage-wright Mar 19, 2024
f7e74fa
Merge branch 'main' into smw-tngs-tbprofiler-dev
sage-wright Mar 20, 2024
d3ca39f
update md5sum
sage-wright Mar 21, 2024
49ae578
Merge branch 'smw-tngs-tbprofiler-dev' of https://github.com/theiagen…
sage-wright Mar 21, 2024
37279e6
write all tbprofiler outputs
sage-wright Mar 21, 2024
999220f
remove comment cruft
sage-wright Mar 21, 2024
fcc535f
remove database for tbprofiler; broken
sage-wright Mar 21, 2024
5a7bc2b
fix issue
sage-wright Mar 21, 2024
d5b0183
update version and allow for user-modifable expert rule regions bed file
sage-wright Mar 25, 2024
049f183
add rpob & etha freq modification params
sage-wright Mar 29, 2024
95bdeae
add new parameters for modification
sage-wright Apr 1, 2024
2e2f05a
optionalize
sage-wright Apr 1, 2024
5da8036
v1.3.9
sage-wright Apr 4, 2024
8851b46
new version!!!!!!1
sage-wright Apr 4, 2024
52ce070
Merge branch 'main' into smw-tngs-tbprofiler-dev
sage-wright Apr 15, 2024
1533712
update md5sums
sage-wright Apr 15, 2024
5 changes: 5 additions & 0 deletions .dockstore.yml
@@ -258,6 +258,11 @@ workflows:
  primaryDescriptorPath: /workflows/utilities/wf_czgenepi_prep.wdl
  testParameterFiles:
  - /tests/inputs/empty.json
+ - name: TBProfiler_tNGS_PHB
+ subclass: WDL
+ primaryDescriptorPath: /workflows/standalone_modules/wf_tbprofiler_tngs.wdl
+ testParameterFiles:
+ - /tests/inputs/empty.json
  - name: Rename_FASTQ_PHB
  subclass: WDL
  primaryDescriptorPath: /workflows/utilities/file_handling/wf_rename_fastq_files.wdl
19 changes: 18 additions & 1 deletion tasks/quality_control/read_filtering/task_trimmomatic.wdl
@@ -9,6 +9,7 @@ task trimmomatic_pe {
  Int trimmomatic_min_length = 75
  Int trimmomatic_window_size = 4
  Int trimmomatic_quality_trim_score = 30
+ Int? trimmomatic_base_crop
  Int cpu = 4
  String trimmomatic_args = "-phred33"
  Int disk_size = 100
@@ -19,13 +20,29 @@ task trimmomatic_pe {
  date | tee DATE
  trimmomatic -version > VERSION && sed -i -e 's/^/Trimmomatic /' VERSION

+ CROPPING_VAR=""
+ # if trimmomatic base chop is defined (-n means not empty), determine average readlength of the input reads
+ if [ -n "~{trimmomatic_base_crop}" ]; then
+ # determine the average read length of the input reads
+ read_length_r1=$(zcat ~{read1} | awk '{if(NR%4==2) {bases+=length($0)} } END {print bases/(NR/4)}')
+ read_length_r2=$(zcat ~{read2} | awk '{if(NR%4==2) {bases+=length($0)} } END {print bases/(NR/4)}')
+
+ # take the average of the two read lengths without using bc and remove the end base chop
+ avg_readlength=$(python3 -c "print(int(((${read_length_r1} + ${read_length_r2}) / 2) - ~{trimmomatic_base_crop}))")
+
+ # HEADCROP: number of bases to remove from the start of the read
+ # CROP: number of bases to KEEP, from the start of the read
+ CROPPING_VAR="HEADCROP:~{trimmomatic_base_crop} CROP:$avg_readlength"
+ fi
+
  trimmomatic PE \
  ~{trimmomatic_args} \
  -threads ~{cpu} \
  ~{read1} ~{read2} \
  -baseout ~{samplename}.fastq.gz \
  SLIDINGWINDOW:~{trimmomatic_window_size}:~{trimmomatic_quality_trim_score} \
- MINLEN:~{trimmomatic_min_length} &> ~{samplename}.trim.stats.txt
+ MINLEN:~{trimmomatic_min_length} &> ~{samplename}.trim.stats.txt \
+ "${CROPPING_VAR}"

  >>>
  output {
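
The cropping arithmetic added in this hunk can be tried outside the task. The sketch below is an illustration only, not the task's code: the function names mean_read_length, cropping_steps, and count_args are made up for this example, it reads uncompressed FASTQ from stdin instead of calling zcat, and it uses awk in place of the python3 -c call. It also counts how many arguments the shell passes for a quoted versus an unquoted variable, which is relevant to the "${CROPPING_VAR}" line in the hunk.

```shell
# Mean read length of a FASTQ stream on stdin.
# Sequence lines are line 2 of every 4-line record.
mean_read_length() {
  awk 'NR % 4 == 2 { bases += length($0); reads++ }
       END { if (reads) print bases / reads; else print 0 }'
}

# Build the cropping steps the same way the hunk does:
# CROP target = int(mean(len_r1, len_r2) - crop).
# Arguments: <crop> <mean length R1> <mean length R2>
cropping_steps() (
  crop=$1
  keep=$(awk -v a="$2" -v b="$3" -v c="$crop" \
    'BEGIN { printf "%d", (a + b) / 2 - c }')
  echo "HEADCROP:${crop} CROP:${keep}"
)

# Print how many arguments the function received.
count_args() {
  echo "$#"
}
```

For two 150 bp read sets and a crop of 30, `cropping_steps 30 150 150` prints `HEADCROP:30 CROP:120`. Trimmomatic applies steps in the order given, so the CROP length is measured on reads that HEADCROP has already shortened. With `steps="HEADCROP:30 CROP:120"`, `count_args "$steps"` reports one argument and `count_args $steps` reports two; an empty quoted variable still contributes one empty argument, while an empty unquoted one contributes none.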
1 change: 1 addition & 0 deletions tasks/species_typing/mycobacterium/task_clockwork.wdl
@@ -35,6 +35,7 @@ task clockwork_decon_reads {
  output {
  File clockwork_cleaned_read1 = "clockwork_cleaned_~{samplename}_R1.fastq.gz"
  File clockwork_cleaned_read2 = "clockwork_cleaned_~{samplename}_R2.fastq.gz"
+ String clockwork_version = read_string("VERSION")
  }
  runtime {
  docker: docker
36 changes: 30 additions & 6 deletions tasks/species_typing/mycobacterium/task_tbp_parser.wdl
@@ -9,11 +9,24 @@ task tbp_parser {

  String? sequencing_method
  String? operator
- Int min_depth = 10
- Int coverage_threshold = 100
+ Int? min_depth # default 10
+ Int? coverage_threshold # default 100 (--min_percent_coverage)
+ File? coverage_regions_bed
+ Float? min_frequency # default 0.1
+ Int? min_read_support # default 10
+
  Boolean tbp_parser_debug = false

- String docker = "us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.1.8"
+ Boolean tngs_data = false
+ Float? rrs_frequency # default 0.1
+ Int? rrs_read_support # default 10
+ Float? rrl_frequency # default 0.1
+ Int? rrl_read_support # default 10
+ Float? rpob449_frequency # default 0.1
+ Float? etha237_frequency # default 0.1
+ File? expert_rule_regions_bed
+
+ String docker = "us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.4.0"
  Int disk_size = 100
  Int memory = 4
  Int cpu = 1
@@ -27,9 +40,20 @@ task tbp_parser {
  ~{"--sequencing_method " + sequencing_method} \
  ~{"--operator " + operator} \
  ~{"--min_depth " + min_depth} \
- ~{"--coverage_threshold " + coverage_threshold} \
+ ~{"--min_percent_coverage " + coverage_threshold} \
+ ~{"--coverage_regions " + coverage_regions_bed} \
+ ~{"--min_frequency " + min_frequency} \
+ ~{"--min_read_support " + min_read_support} \
+ ~{"--tngs_expert_regions " + expert_rule_regions_bed} \
+ ~{"--rrs_frequency " + rrs_frequency} \
+ ~{"--rrs_read_support " + rrs_read_support} \
+ ~{"--rrl_frequency " + rrl_frequency} \
+ ~{"--rrl_read_support " + rrl_read_support} \
+ ~{"--rpob449_frequency " + rpob449_frequency} \
+ ~{"--etha237_frequency " + etha237_frequency} \
  --output_prefix ~{samplename} \
- ~{true="--debug" false="--verbose" tbp_parser_debug}
+ ~{true="--debug" false="--verbose" tbp_parser_debug} \
+ ~{true="--tngs" false="" tngs_data}

  # set default genome percent coverage and average depth to 0 to prevent failures
  echo 0.0 > GENOME_PC
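
In WDL 1.0, a placeholder such as ~{"--min_depth " + min_depth} evaluates to an empty string when the optional input is undefined, so the flag is simply left off the command line and the tool's own default applies. The sketch below imitates that behaviour in plain shell for three of the options in this hunk. It is an analogy, not code from the task: optional_flag and build_tbp_parser_flags are names invented for this example, and an empty string stands in for an undefined WDL input.

```shell
# Print "<flag> <value> " when the value is non-empty, nothing otherwise.
# Arguments: <flag> <value>
optional_flag() {
  if [ -n "$2" ]; then
    printf '%s %s ' "$1" "$2"
  fi
}

# Assemble a flag string from optional values.
# Arguments: <min_depth or ""> <min_frequency or ""> <tngs_data: true|false>
build_tbp_parser_flags() (
  flags=""
  flags="${flags}$(optional_flag --min_depth "$1")"
  flags="${flags}$(optional_flag --min_frequency "$2")"
  if [ "$3" = "true" ]; then
    flags="${flags}--tngs "
  fi
  # Drop the single trailing space left by the last flag, if any.
  printf '%s\n' "${flags% }"
)
```

Calling `build_tbp_parser_flags "" "" false` prints an empty line, and `build_tbp_parser_flags 10 "" true` prints `--min_depth 10 --tngs`.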
@@ -43,7 +67,7 @@ task tbp_parser {
  samtools depth -J ~{tbprofiler_bam} | awk -F "\t" '{sum+=$3} END { print sum/NR }' | tee AVG_DEPTH

  # add sample id to the beginning of the coverage report
- awk '{print "~{samplename},"$0}' ~{samplename}.percent_gene_coverage.csv > tmp.csv && mv -f tmp.csv ~{samplename}.percent_gene_coverage.csv
+ awk '{s=(NR==1)?"Sample_accession_number,":"~{samplename},"; $0=s$0}1' ~{samplename}.percent_gene_coverage.csv > tmp.csv && mv -f tmp.csv ~{samplename}.percent_gene_coverage.csv
  >>>
  output {
  File tbp_parser_looker_report_csv = "~{samplename}.looker_report.csv"
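
The awk change in this hunk gives the new first column a header instead of prefixing the header row with the sample name. A minimal sketch of the same idea follows. The function name prepend_sample_column is invented for this example, the sample name is passed with awk -v rather than interpolated by WDL, and the CSV columns in the usage note are hypothetical, not the real report layout.

```shell
# Prefix each CSV row on stdin with a sample column:
# the header row gets the column name, every other row gets the sample name.
# Arguments: <samplename>
prepend_sample_column() {
  awk -v sample="$1" '{
    prefix = (NR == 1) ? "Sample_accession_number," : sample ","
    print prefix $0
  }'
}
```

Feeding it the two lines `Gene,Percent_Coverage` and `rpoB,100.0` with the sample name `sample01` yields `Sample_accession_number,Gene,Percent_Coverage` followed by `sample01,rpoB,100.0`.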
1 change: 0 additions & 1 deletion tasks/species_typing/mycobacterium/task_tbprofiler.wdl
@@ -1,7 +1,6 @@
  version 1.0

  task tbprofiler {
- # Inputs
  input {
  File read1
  File? read2
8 changes: 4 additions & 4 deletions tests/workflows/theiaprok/test_wf_theiaprok_illumina_pe.yml
@@ -466,7 +466,7 @@
  - path: miniwdl_run/call-read_QC_trim/call-fastq_scan_raw/work/_miniwdl_inputs/0/SRR2838702_R1.fastq.gz
  - path: miniwdl_run/call-read_QC_trim/call-fastq_scan_raw/work/_miniwdl_inputs/0/SRR2838702_R2.fastq.gz
  - path: miniwdl_run/call-read_QC_trim/call-trimmomatic_pe/command
- md5sum: f1b81c5b16649da363f2cc86eae75398
+ md5sum: cc137a029d5143592b40edf01d53735f
  - path: miniwdl_run/call-read_QC_trim/call-trimmomatic_pe/inputs.json
  contains: ["read", "fastq", "test", "trimmomatic_min_length"]
  - path: miniwdl_run/call-read_QC_trim/call-trimmomatic_pe/outputs.json
@@ -619,7 +619,7 @@
  - path: miniwdl_run/wdl/tasks/species_typing/escherichia_shigella/task_sonneityping.wdl
  md5sum: 8571571f99487448218469c15b803626
  - path: miniwdl_run/wdl/tasks/species_typing/mycobacterium/task_tbprofiler.wdl
- md5sum: 860ffa77063196e04c539f5dadd23b85
+ md5sum: a7ab31fd0bc2695ceeaaecd40a4663d5
  - path: miniwdl_run/wdl/tasks/species_typing/multi/task_ts_mlst.wdl
  md5sum: c84e37ca77732b2ffa6a32629a4b6d6f
  - path: miniwdl_run/wdl/tasks/task_versioning.wdl
@@ -633,9 +633,9 @@
  - path: miniwdl_run/wdl/tasks/utilities/data_export/task_broad_terra_tools.wdl
  md5sum: ea141ba65f2948ae2abed7ca791e872b
  - path: miniwdl_run/wdl/workflows/theiaprok/wf_theiaprok_illumina_pe.wdl
- md5sum: c28f17110081694ac36a7695fdebea76
+ md5sum: 4da30e83b782ab63fd9a96d1c77f5f61
  - path: miniwdl_run/wdl/workflows/utilities/wf_merlin_magic.wdl
- md5sum: cfb407b32bc9436a0f12e29dc2e3b5a1
+ md5sum: b9aa69647ff3a4661621f531104034aa
  - path: miniwdl_run/wdl/workflows/utilities/wf_read_QC_trim_pe.wdl
  contains: ["version", "QC", "output"]
  - path: miniwdl_run/workflow.log
4 changes: 2 additions & 2 deletions tests/workflows/theiaprok/test_wf_theiaprok_illumina_se.yml
@@ -582,7 +582,7 @@
  - path: miniwdl_run/wdl/tasks/species_typing/escherichia_shigella/task_sonneityping.wdl
  md5sum: 8571571f99487448218469c15b803626
  - path: miniwdl_run/wdl/tasks/species_typing/mycobacterium/task_tbprofiler.wdl
- md5sum: 860ffa77063196e04c539f5dadd23b85
+ md5sum: a7ab31fd0bc2695ceeaaecd40a4663d5
  - path: miniwdl_run/wdl/tasks/species_typing/multi/task_ts_mlst.wdl
  md5sum: c84e37ca77732b2ffa6a32629a4b6d6f
  - path: miniwdl_run/wdl/tasks/task_versioning.wdl
@@ -598,7 +598,7 @@
  - path: miniwdl_run/wdl/workflows/theiaprok/wf_theiaprok_illumina_se.wdl
  md5sum: f920448f13018432be074e3ef06ea4ee
  - path: miniwdl_run/wdl/workflows/utilities/wf_merlin_magic.wdl
- md5sum: cfb407b32bc9436a0f12e29dc2e3b5a1
+ md5sum: b9aa69647ff3a4661621f531104034aa
  - path: miniwdl_run/wdl/workflows/utilities/wf_read_QC_trim_se.wdl
  md5sum: 7e8df4190a823760467f388bde689bd4
  - path: miniwdl_run/workflow.log
89 changes: 89 additions & 0 deletions workflows/standalone_modules/wf_tbprofiler_tngs.wdl
@@ -0,0 +1,89 @@
+ version 1.0
+
+ import "../../tasks/quality_control/read_filtering/task_trimmomatic.wdl" as trimmomatic_task
+ import "../../tasks/species_typing/mycobacterium/task_tbprofiler.wdl" as tbprofiler_task
+ import "../../tasks/species_typing/mycobacterium/task_tbp_parser.wdl" as tbp_parser_task
+ import "../../tasks/task_versioning.wdl" as versioning
+
+ workflow tbprofiler_tngs {
+ meta {
+ description: "Runs trimmomatic QC, tbprofiler, and tbp-parser on tNGS TB data"
+ }
+ input {
+ File read1
+ File read2
+ String samplename
+ Int bases_to_crop = 30
+ }
+ call versioning.version_capture {
+ input:
+ }
+ call trimmomatic_task.trimmomatic_pe {
+ input:
+ read1 = read1,
+ read2 = read2,
+ samplename = samplename,
+ trimmomatic_base_crop = bases_to_crop
+ }
+ # call clockwork_task.clockwork_decon_reads {
+ # input:
+ # read1 = trimmomatic_pe.read1_trimmed,
+ # read2 = trimmomatic_pe.read2_trimmed,
+ # samplename = samplename
+ # }
+ call tbprofiler_task.tbprofiler {
+ input:
+ # read1 = clockwork_decon_reads.clockwork_cleaned_read1,
+ # read2 = clockwork_decon_reads.clockwork_cleaned_read2,
+ read1 = trimmomatic_pe.read1_trimmed,
+ read2 = trimmomatic_pe.read2_trimmed,
+ samplename = samplename
+ }
+ call tbp_parser_task.tbp_parser {
+ input:
+ tbprofiler_json = tbprofiler.tbprofiler_output_json,
+ tbprofiler_bam = tbprofiler.tbprofiler_output_bam,
+ tbprofiler_bai = tbprofiler.tbprofiler_output_bai,
+ samplename = samplename,
+ tngs_data = true
+ }
+ output {
+ # trimmomatic outputs
+ File trimmomatic_read1_trimmed = trimmomatic_pe.read1_trimmed
+ File trimmomatic_read2_trimmed = trimmomatic_pe.read2_trimmed
+ File trimmomatic_stats = trimmomatic_pe.trimmomatic_stats
+ String trimmomatic_version = trimmomatic_pe.version
+ String trimmomatic_docker = trimmomatic_pe.trimmomatic_docker
+ # clockwork outputs
+ # File clockwork_cleaned_read1 = clockwork_decon_reads.clockwork_cleaned_read1
+ # File clockwork_cleaned_read2 = clockwork_decon_reads.clockwork_cleaned_read2
+ # String clockwork_version = clockwork_decon_reads.clockwork_version
+ # tbprofiler outputs
+ File tbprofiler_report_csv = tbprofiler.tbprofiler_output_csv
+ File tbprofiler_report_tsv = tbprofiler.tbprofiler_output_tsv
+ File tbprofiler_report_json = tbprofiler.tbprofiler_output_json
+ File tbprofiler_output_alignment_bam = tbprofiler.tbprofiler_output_bam
+ File tbprofiler_output_alignment_bai = tbprofiler.tbprofiler_output_bai
+ String tbprofiler_version = tbprofiler.version
+ String tbprofiler_main_lineage = tbprofiler.tbprofiler_main_lineage
+ String tbprofiler_sub_lineage = tbprofiler.tbprofiler_sub_lineage
+ String tbprofiler_dr_type = tbprofiler.tbprofiler_dr_type
+ String tbprofiler_num_dr_variants = tbprofiler.tbprofiler_num_dr_variants
+ String tbprofiler_num_other_variants = tbprofiler.tbprofiler_num_other_variants
+ String tbprofiler_resistance_genes = tbprofiler.tbprofiler_resistance_genes
+ Int tbprofiler_median_coverage = tbprofiler.tbprofiler_median_coverage
+ Float tbprofiler_pct_reads_mapped = tbprofiler.tbprofiler_pct_reads_mapped
+ # tbp_parser outputs
+ File tbp_parser_looker_report_csv = tbp_parser.tbp_parser_looker_report_csv
+ File tbp_parser_laboratorian_report_csv = tbp_parser.tbp_parser_laboratorian_report_csv
+ File tbp_parser_lims_report_csv = tbp_parser.tbp_parser_lims_report_csv
+ File tbp_parser_coverage_report = tbp_parser.tbp_parser_coverage_report
+ Float tbp_parser_genome_percent_coverage = tbp_parser.tbp_parser_genome_percent_coverage
+ Float tbp_parser_average_genome_depth = tbp_parser.tbp_parser_average_genome_depth
+ String tbp_parser_version = tbp_parser.tbp_parser_version
+ String tbp_parser_docker = tbp_parser.tbp_parser_docker
+ # version capture outputs
+ String tbprofiler_tngs_wf_analysis_date = version_capture.date
+ String tbprofiler_tngs_wf_version = version_capture.phb_version
+ }
+ }
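
The new workflow takes four inputs (read1, read2, samplename, and bases_to_crop with a default of 30). As a hedged illustration of how those map onto a WDL inputs file, the sketch below writes the JSON with keys of the form workflow name, dot, input name. The helper write_tngs_inputs and the file names in the usage note are hypothetical, and the values are not escaped, so this only suits simple names and paths.

```shell
# Print an inputs JSON for the tbprofiler_tngs workflow.
# Arguments: <samplename> <read1 path> <read2 path> [bases_to_crop, default 30]
write_tngs_inputs() (
  samplename=$1
  read1=$2
  read2=$3
  crop=${4:-30}
  cat <<EOF
{
  "tbprofiler_tngs.samplename": "${samplename}",
  "tbprofiler_tngs.read1": "${read1}",
  "tbprofiler_tngs.read2": "${read2}",
  "tbprofiler_tngs.bases_to_crop": ${crop}
}
EOF
)
```

For example, `write_tngs_inputs sample01 sample01_R1.fastq.gz sample01_R2.fastq.gz > inputs.json` produces a file that could then be passed to a runner, for instance `miniwdl run workflows/standalone_modules/wf_tbprofiler_tngs.wdl -i inputs.json`, since miniwdl is the runner the test paths in this PR refer to.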
4 changes: 4 additions & 0 deletions workflows/theiaprok/wf_theiaprok_illumina_pe.wdl
@@ -870,6 +870,10 @@ workflow theiaprok_illumina_pe {
  String? tbprofiler_sub_lineage = merlin_magic.tbprofiler_sub_lineage
  String? tbprofiler_dr_type = merlin_magic.tbprofiler_dr_type
  String? tbprofiler_resistance_genes = merlin_magic.tbprofiler_resistance_genes
+ Int? tbprofiler_median_coverage = merlin_magic.tbprofiler_median_coverage
+ Float? tbprofiler_pct_reads_mapped = merlin_magic.tbprofiler_pct_reads_mapped
+ String? tbp_parser_version = merlin_magic.tbp_parser_version
+ String? tbp_parser_docker = merlin_magic.tbp_parser_docker
  File? tbp_parser_lims_report_csv = merlin_magic.tbp_parser_lims_report_csv
  File? tbp_parser_looker_report_csv = merlin_magic.tbp_parser_looker_report_csv
  File? tbp_parser_laboratorian_report_csv = merlin_magic.tbp_parser_laboratorian_report_csv
4 changes: 4 additions & 0 deletions workflows/theiaprok/wf_theiaprok_ont.wdl
@@ -760,6 +760,10 @@ workflow theiaprok_ont {
  String? tbprofiler_sub_lineage = merlin_magic.tbprofiler_sub_lineage
  String? tbprofiler_dr_type = merlin_magic.tbprofiler_dr_type
  String? tbprofiler_resistance_genes = merlin_magic.tbprofiler_resistance_genes
+ Int? tbprofiler_median_coverage = merlin_magic.tbprofiler_median_coverage
+ Float? tbprofiler_pct_reads_mapped = merlin_magic.tbprofiler_pct_reads_mapped
+ String? tbp_parser_version = merlin_magic.tbp_parser_version
+ String? tbp_parser_docker = merlin_magic.tbp_parser_docker
  File? tbp_parser_lims_report_csv = merlin_magic.tbp_parser_lims_report_csv
  File? tbp_parser_looker_report_csv = merlin_magic.tbp_parser_looker_report_csv
  File? tbp_parser_laboratorian_report_csv = merlin_magic.tbp_parser_laboratorian_report_csv
2 changes: 1 addition & 1 deletion workflows/utilities/wf_merlin_magic.wdl
@@ -70,7 +70,7 @@ workflow merlin_magic {
  String? tbprofiler_variant_caller
  String? tbprofiler_variant_calling_params
  Boolean tbprofiler_run_custom_db = false
- File tbprofiler_custom_db = "gs://theiagen-public-files/terra/theiaprok-files/tbdb_varpipe_combined_nodups.tar.gz"
+ File? tbprofiler_custom_db
  Boolean tbprofiler_additional_outputs = false
  String tbp_parser_output_seq_method_type = "WGS"
  String? tbp_parser_operator