Commit
Merge branch 'main' into smw-pipefail-dev
sage-wright authored Nov 8, 2024
2 parents 7894c64 + 2fd9f75 commit 46a27f9
Showing 24 changed files with 131 additions and 57 deletions.
4 changes: 4 additions & 0 deletions docs/workflows/genomic_characterization/freyja.md
@@ -327,12 +327,16 @@ The main output file used in subsequent Freyja workflows is found under the `fre
| bwa_version | String | Version of BWA used to map read data to the reference genome | PE, SE |
| fastp_html_report | File | The HTML report made with fastp | PE, SE |
| fastp_version | String | Version of fastp software used | PE, SE |
+ | fastq_scan_clean1_json | File | JSON file output from `fastq-scan` containing summary stats about clean forward read quality and length | PE, SE |
+ | fastq_scan_clean2_json | File | JSON file output from `fastq-scan` containing summary stats about clean reverse read quality and length | PE |
| fastq_scan_num_reads_clean_pairs | String | Number of clean read pairs | PE |
| fastq_scan_num_reads_clean1 | Int | Number of clean forward reads | PE, SE |
| fastq_scan_num_reads_clean2 | Int | Number of clean reverse reads | PE |
| fastq_scan_num_reads_raw_pairs | String | Number of raw read pairs | PE |
| fastq_scan_num_reads_raw1 | Int | Number of raw forward reads | PE, SE |
| fastq_scan_num_reads_raw2 | Int | Number of raw reverse reads | PE |
+ | fastq_scan_raw1_json | File | JSON file output from `fastq-scan` containing summary stats about raw forward read quality and length | PE, SE |
+ | fastq_scan_raw2_json | File | JSON file output from `fastq-scan` containing summary stats about raw reverse read quality and length | PE |
| fastq_scan_version | String | Version of fastq_scan used for read QC analysis | PE, SE |
| fastqc_clean1_html | File | Graphical visualization of clean forward read quality from fastqc to open in an internet browser | PE, SE |
| fastqc_clean2_html | File | Graphical visualization of clean reverse read quality from fastqc to open in an internet browser | PE |
4 changes: 4 additions & 0 deletions docs/workflows/genomic_characterization/theiacov.md
@@ -1026,6 +1026,8 @@ All TheiaCoV Workflows (not TheiaCoV_FASTA_Batch)
| est_percent_gene_coverage_tsv | File | Percent coverage for each gene in the organism being analyzed (depending on the organism input) | CL, ONT, PE, SE |
| fastp_html_report | File | HTML report for fastp | PE, SE |
| fastp_version | String | Fastp version used | PE, SE |
+ | fastq_scan_clean1_json | File | JSON file output from `fastq-scan` containing summary stats about clean forward read quality and length | PE, SE, CL |
+ | fastq_scan_clean2_json | File | JSON file output from `fastq-scan` containing summary stats about clean reverse read quality and length | PE |
| fastq_scan_num_reads_clean_pairs | String | Number of paired reads after filtering as determined by fastq_scan | PE |
| fastq_scan_num_reads_clean1 | Int | Number of forward reads after filtering as determined by fastq_scan | CL, PE, SE |
| fastq_scan_num_reads_clean2 | Int | Number of reverse reads after filtering as determined by fastq_scan | PE |
@@ -1036,6 +1038,8 @@ All TheiaCoV Workflows (not TheiaCoV_FASTA_Batch)
| fastq_scan_r1_mean_q_raw | Float | Forward read mean quality value before quality trimming and adapter removal | |
| fastq_scan_r1_mean_readlength_clean | Float | Forward read mean read length value after quality trimming and adapter removal | |
| fastq_scan_r1_mean_readlength_raw | Float | Forward read mean read length value before quality trimming and adapter removal | |
+ | fastq_scan_raw1_json | File | JSON file output from `fastq-scan` containing summary stats about raw forward read quality and length | PE, SE, CL |
+ | fastq_scan_raw2_json | File | JSON file output from `fastq-scan` containing summary stats about raw reverse read quality and length | PE |
| fastq_scan_version | String | Version of fastq_scan used for read QC analysis | CL, PE, SE |
| fastqc_clean1_html | File | Graphical visualization of clean forward read quality from fastqc to open in an internet browser | PE, SE |
| fastqc_clean2_html | File | Graphical visualization of clean reverse read quality from fastqc to open in an internet browser | PE |
4 changes: 4 additions & 0 deletions docs/workflows/genomic_characterization/theiaeuk.md
@@ -484,6 +484,10 @@ The TheiaEuk workflow automatically activates taxa-specific tasks after identifi
| cg_pipeline_report | File | TSV file of read metrics from raw reads, including average read length, number of reads, and estimated genome coverage |
| est_coverage_clean | Float | Estimated coverage calculated from clean reads and genome length |
| est_coverage_raw | Float | Estimated coverage calculated from raw reads and genome length |
+ | fastq_scan_clean1_json | File | JSON file output from `fastq-scan` containing summary stats about clean forward read quality and length |
+ | fastq_scan_clean2_json | File | JSON file output from `fastq-scan` containing summary stats about clean reverse read quality and length |
+ | fastq_scan_raw1_json | File | JSON file output from `fastq-scan` containing summary stats about raw forward read quality and length |
+ | fastq_scan_raw2_json | File | JSON file output from `fastq-scan` containing summary stats about raw reverse read quality and length |
| r1_mean_q_clean | Float | Mean quality score of clean forward reads |
| r1_mean_q_raw | Float | Mean quality score of raw forward reads |
| r2_mean_q_clean | Float | Mean quality score of clean reverse reads |
4 changes: 4 additions & 0 deletions docs/workflows/genomic_characterization/theiameta.md
@@ -295,12 +295,16 @@ The TheiaMeta_Illumina_PE workflow processes Illumina paired-end (PE) reads ge
| fastp_html_report | File | Report file for fastp in HTML format |
| fastp_version | String | Version of fastp used |
| fastq_scan_docker | String | Docker image of fastq_scan |
+ | fastq_scan_clean1_json | File | JSON file output from `fastq-scan` containing summary stats about clean forward read quality and length |
+ | fastq_scan_clean2_json | File | JSON file output from `fastq-scan` containing summary stats about clean reverse read quality and length |
| fastq_scan_num_reads_clean_pairs | String | Number of read pairs after cleaning as calculated by fastq_scan |
| fastq_scan_num_reads_clean1 | Int | Number of forward reads after cleaning as calculated by fastq_scan |
| fastq_scan_num_reads_clean2 | Int | Number of reverse reads after cleaning as calculated by fastq_scan |
| fastq_scan_num_reads_raw_pairs | String | Number of input read pairs as calculated by fastq_scan |
| fastq_scan_num_reads_raw1 | Int | Number of input forward reads as calculated by fastq_scan |
| fastq_scan_num_reads_raw2 | Int | Number of input reverse reads as calculated by fastq_scan |
+ | fastq_scan_raw1_json | File | JSON file output from `fastq-scan` containing summary stats about raw forward read quality and length |
+ | fastq_scan_raw2_json | File | JSON file output from `fastq-scan` containing summary stats about raw reverse read quality and length |
| fastq_scan_version | String | fastq_scan version |
| fastqc_clean1_html | File | Graphical visualization of clean forward read quality from fastqc to open in an internet browser |
| fastqc_clean2_html | File | Graphical visualization of clean reverse read quality from fastqc to open in an internet browser |
4 changes: 4 additions & 0 deletions docs/workflows/genomic_characterization/theiaprok.md
@@ -1731,12 +1731,16 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after
| est_coverage_raw | Float | Estimated coverage calculated from raw reads and genome length | ONT, PE, SE |
| fastp_html_report | File | The HTML report made with fastp | PE, SE |
| fastp_version | String | Version of fastp software used | PE, SE |
+ | fastq_scan_clean1_json | File | JSON file output from `fastq-scan` containing summary stats about clean forward read quality and length | PE, SE |
+ | fastq_scan_clean2_json | File | JSON file output from `fastq-scan` containing summary stats about clean reverse read quality and length | PE |
| fastq_scan_num_reads_clean_pairs | String | Number of read pairs after cleaning as calculated by fastq_scan | PE |
| fastq_scan_num_reads_clean1 | Int | Number of forward reads after cleaning as calculated by fastq_scan | PE, SE |
| fastq_scan_num_reads_clean2 | Int | Number of reverse reads after cleaning as calculated by fastq_scan | PE |
| fastq_scan_num_reads_raw_pairs | String | Number of input read pairs calculated by fastq_scan | PE |
| fastq_scan_num_reads_raw1 | Int | Number of input forward reads calculated by fastq_scan | PE, SE |
| fastq_scan_num_reads_raw2 | Int | Number of input reverse reads calculated by fastq_scan | PE |
+ | fastq_scan_raw1_json | File | JSON file output from `fastq-scan` containing summary stats about raw forward read quality and length | PE, SE |
+ | fastq_scan_raw2_json | File | JSON file output from `fastq-scan` containing summary stats about raw reverse read quality and length | PE |
| fastq_scan_version | String | Version of fastq-scan software used | PE, SE |
| fastqc_clean1_html | File | Graphical visualization of clean forward read quality from fastqc to open in an internet browser | PE, SE |
| fastqc_clean2_html | File | Graphical visualization of clean reverse read quality from fastqc to open in an internet browser | PE |
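The `fastq_scan_*_json` rows added to these output tables point at the JSON reports written by `fastq-scan`. A minimal sketch of pulling the read count out of one of them by hand, assuming `jq` is installed: the `.qc_stats.read_total` filter is the one `task_fastq_scan.wdl` uses, while the one-field report below is a made-up stand-in for real `fastq-scan` output, which carries many more fields.

```shell
#!/usr/bin/env bash
# Sketch: read the total read count from a fastq-scan JSON report, using the
# same jq filter as task_fastq_scan.wdl. Assumes jq is on PATH.
set -euo pipefail

# Hypothetical stand-in for a real <read_name>_fastq-scan.json report;
# real reports contain many more fields than this one.
report=$(mktemp)
printf '{"qc_stats": {"read_total": 1234}}\n' > "$report"

# -r prints the raw value without JSON quoting
read_total=$(jq -r '.qc_stats.read_total' "$report")
echo "read_total: ${read_total}"   # read_total: 1234
```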
65 changes: 41 additions & 24 deletions tasks/quality_control/basic_statistics/task_fastq_scan.wdl
@@ -6,14 +6,16 @@ task fastq_scan_pe {
File read2
String read1_name = basename(basename(basename(read1, ".gz"), ".fastq"), ".fq")
String read2_name = basename(basename(basename(read2, ".gz"), ".fastq"), ".fq")
- Int disk_size = 100
- String docker = "quay.io/biocontainers/fastq-scan:0.4.4--h7d875b9_1"
+ Int disk_size = 50
+ String docker = "us-docker.pkg.dev/general-theiagen/biocontainers/fastq-scan:1.0.1--h4ac6f70_3"
Int memory = 2
- Int cpu = 2
+ Int cpu = 1
}
command <<<
- # capture date and version
- date | tee DATE
+ # exit task in case anything fails in one-liners or variables are unset
+ set -euo pipefail
+
+ # capture version
fastq-scan -v | tee VERSION

# set cat command based on compression
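The branch name (`smw-pipefail-dev`) presumably refers to the `set -euo pipefail` line that replaces the `date | tee DATE` capture. In bash, `-e` stops on the first failing command, `-u` treats unset variables as errors, and `-o pipefail` makes a pipeline fail if any stage fails; by default a pipeline's exit status is only that of its last command. A small demonstration of what that means for pipelines such as `eval "${cat_reads} ~{read1}" | fastq-scan | tee ...`, where `false` is an illustrative stand-in for a failing reader, not something taken from the task:

```shell
#!/usr/bin/env bash
# Sketch: effect of pipefail (and -e) on a pipeline whose first command fails.
# `false` stands in for a failing upstream step; `cat` always succeeds.

without_pipefail=0
bash -c 'false | cat' || without_pipefail=$?   # status of last command (cat)

with_pipefail=0
bash -c 'set -o pipefail; false | cat' || with_pipefail=$?   # status of false

# With -e as well, the shell exits at the failed pipeline, so the final
# echo never runs and nothing is captured.
reached=$(bash -c 'set -eo pipefail; false | cat; echo reached') || true

echo "without pipefail: ${without_pipefail}"   # without pipefail: 0
echo "with pipefail: ${with_pipefail}"         # with pipefail: 1
echo "after failure: '${reached}'"             # after failure: ''
```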
@@ -24,11 +26,21 @@ task fastq_scan_pe {
fi

# capture forward read stats
+ echo "DEBUG: running fastq-scan on $(basename ~{read1})"
eval "${cat_reads} ~{read1}" | fastq-scan | tee ~{read1_name}_fastq-scan.json
- cat ~{read1_name}_fastq-scan.json | jq .qc_stats.read_total | tee READ1_SEQS
+ # using simple redirect so STDOUT is not confusing
+ jq .qc_stats.read_total ~{read1_name}_fastq-scan.json > READ1_SEQS
+ echo "DEBUG: number of reads in $(basename ~{read1}): $(cat READ1_SEQS)"
read1_seqs=$(cat READ1_SEQS)
echo

# capture reverse read stats
+ echo "DEBUG: running fastq-scan on $(basename ~{read2})"
eval "${cat_reads} ~{read2}" | fastq-scan | tee ~{read2_name}_fastq-scan.json
- cat ~{read2_name}_fastq-scan.json | jq .qc_stats.read_total | tee READ2_SEQS

+ # using simple redirect so STDOUT is not confusing
+ jq .qc_stats.read_total ~{read2_name}_fastq-scan.json > READ2_SEQS
+ echo "DEBUG: number of reads in $(basename ~{read2}): $(cat READ2_SEQS)"
read2_seqs=$(cat READ2_SEQS)

# capture number of read pairs
@@ -37,26 +49,27 @@ task fastq_scan_pe {
else
read_pairs="Uneven pairs: R1=${read1_seqs}, R2=${read2_seqs}"
fi
-
- echo $read_pairs | tee READ_PAIRS
+
+ # use simple redirect so STDOUT is not confusing
+ echo "$read_pairs" > READ_PAIRS
+ echo "DEBUG: number of read pairs: $(cat READ_PAIRS)"
>>>
output {
- File read1_fastq_scan_report = "~{read1_name}_fastq-scan.json"
- File read2_fastq_scan_report = "~{read2_name}_fastq-scan.json"
+ File read1_fastq_scan_json = "~{read1_name}_fastq-scan.json"
+ File read2_fastq_scan_json = "~{read2_name}_fastq-scan.json"
Int read1_seq = read_int("READ1_SEQS")
Int read2_seq = read_int("READ2_SEQS")
String read_pairs = read_string("READ_PAIRS")
String version = read_string("VERSION")
- String pipeline_date = read_string("DATE")
String fastq_scan_docker = docker
}
runtime {
docker: docker
memory: memory + " GB"
cpu: cpu
disks: "local-disk " + disk_size + " SSD"
- disk: disk_size + " GB" # TES
- preemptible: 0
+ disk: disk_size + " GB"
+ preemptible: 1
maxRetries: 3
}
}
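`read_pairs` is declared as a `String` output rather than an `Int`, which is required because the mismatch branch stores a text message instead of a number. A sketch of that comparison as a standalone function: the `Uneven pairs: R1=..., R2=...` format is copied from the task, but the matching branch is collapsed in this diff, so storing the shared count there is an assumption, and `summarize_pairs` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch of the read-pair bookkeeping in fastq_scan_pe. The mismatch message
# matches the task; the matching branch (not shown in the diff) is ASSUMED
# to store the shared count.
set -euo pipefail

summarize_pairs() {  # hypothetical helper: summarize_pairs R1_COUNT R2_COUNT
  local read1_seqs=$1 read2_seqs=$2 read_pairs
  if [ "${read1_seqs}" -eq "${read2_seqs}" ]; then
    read_pairs="${read1_seqs}"
  else
    read_pairs="Uneven pairs: R1=${read1_seqs}, R2=${read2_seqs}"
  fi
  echo "$read_pairs"
}

summarize_pairs 1000 1000   # 1000
summarize_pairs 1000 998    # Uneven pairs: R1=1000, R2=998
```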
@@ -65,14 +78,16 @@ task fastq_scan_se {
input {
File read1
String read1_name = basename(basename(basename(read1, ".gz"), ".fastq"), ".fq")
- Int disk_size = 100
+ Int disk_size = 50
Int memory = 2
- Int cpu = 2
- String docker = "quay.io/biocontainers/fastq-scan:0.4.4--h7d875b9_1"
+ Int cpu = 1
+ String docker = "us-docker.pkg.dev/general-theiagen/biocontainers/fastq-scan:1.0.1--h4ac6f70_3"
}
command <<<
- # capture date and version
- date | tee DATE
+ # exit task in case anything fails in one-liners or variables are unset
+ set -euo pipefail
+
+ # capture version
fastq-scan -v | tee VERSION

# set cat command based on compression
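`read1_name` strips the directory and then the `.gz`, `.fastq`, and `.fq` suffixes, innermost `basename()` call first, which is how `clearlabs_R1_dehosted.fastq.gz` ends up as `clearlabs_R1_dehosted_fastq-scan.json` in the Clearlabs test. A bash sketch of the same naming rule, plus a cat/zcat choice in the spirit of the `# set cat command based on compression` step; that step's body is collapsed in this diff, so the `.gz` test is an assumption, and both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: bash analogue of the WDL expression
#   basename(basename(basename(read1, ".gz"), ".fastq"), ".fq")
# and of choosing a reader by compression (the .gz test is an assumption).
set -euo pipefail

read_name() {  # hypothetical helper
  local name
  name=$(basename "$1")  # drop any directory part
  name=${name%.gz}       # innermost basename() call: strip .gz
  name=${name%.fastq}    # then .fastq
  name=${name%.fq}       # then .fq
  echo "$name"
}

pick_reader() {  # hypothetical helper
  if [[ "$1" == *.gz ]]; then echo "zcat"; else echo "cat"; fi
}

read_name /inputs/clearlabs_R1_dehosted.fastq.gz    # clearlabs_R1_dehosted
read_name reads/sample.fq                           # sample
pick_reader /inputs/clearlabs_R1_dehosted.fastq.gz  # zcat
pick_reader reads/sample.fq                         # cat
```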
@@ -83,23 +98,25 @@ task fastq_scan_se {
fi

# capture forward read stats
+ echo "DEBUG: running fastq-scan on $(basename ~{read1})"
eval "${cat_reads} ~{read1}" | fastq-scan | tee ~{read1_name}_fastq-scan.json
- cat ~{read1_name}_fastq-scan.json | jq .qc_stats.read_total | tee READ1_SEQS
+ # using simple redirect so STDOUT is not confusing
+ jq .qc_stats.read_total ~{read1_name}_fastq-scan.json > READ1_SEQS
+ echo "DEBUG: number of reads in $(basename ~{read1}): $(cat READ1_SEQS)"
>>>
output {
- File fastq_scan_report = "~{read1_name}_fastq-scan.json"
+ File fastq_scan_json = "~{read1_name}_fastq-scan.json"
Int read1_seq = read_int("READ1_SEQS")
String version = read_string("VERSION")
- String pipeline_date = read_string("DATE")
String fastq_scan_docker = docker
}
runtime {
docker: docker
memory: memory + " GB"
cpu: cpu
disks: "local-disk " + disk_size + " SSD"
- disk: disk_size + " GB" # TES
- preemptible: 0
+ disk: disk_size + " GB"
+ preemptible: 1
maxRetries: 3
}
}
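Several `tee` calls in both tasks became plain redirects, with comments saying this keeps STDOUT from being confusing. The difference, sketched with `echo 1234` as an illustrative stand-in for the `jq .qc_stats.read_total ...` call: `tee` copies the bare value to both the file and stdout, while `>` writes only to the file, leaving stdout for the labelled `DEBUG:` lines the tasks now print.

```shell
#!/usr/bin/env bash
# Sketch: `tee` vs a plain redirect when saving a value to a file.
# `echo 1234` stands in for `jq .qc_stats.read_total <report>`.
set -euo pipefail
workdir=$(mktemp -d)

# Old style: value goes to READ1_SEQS *and* to stdout, with no label.
old_stdout=$(echo 1234 | tee "$workdir/READ1_SEQS_old")

# New style: value goes only to the file; stdout gets a labelled line.
new_stdout=$(
  echo 1234 > "$workdir/READ1_SEQS_new"
  echo "DEBUG: number of reads: $(cat "$workdir/READ1_SEQS_new")"
)

echo "old stdout: ${old_stdout}"   # old stdout: 1234
echo "new stdout: ${new_stdout}"   # new stdout: DEBUG: number of reads: 1234
```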
12 changes: 10 additions & 2 deletions tasks/utilities/data_export/task_broad_terra_tools.wdl
@@ -35,6 +35,10 @@ task export_taxon_tables {
Int? num_reads_raw2
String? num_reads_raw_pairs
String? fastq_scan_version
+ File? fastq_scan_raw1_json
+ File? fastq_scan_raw2_json
+ File? fastq_scan_clean1_json
+ File? fastq_scan_clean2_json
Int? num_reads_clean1
Int? num_reads_clean2
String? num_reads_clean_pairs
@@ -391,7 +395,7 @@ task export_taxon_tables {
}
command <<<
set -euo pipefail

# capture taxon and corresponding table names from input taxon_tables
taxon_array=($(cut -f1 ~{taxon_tables} | tail +2))
echo "Taxon array: ${taxon_array[*]}"
@@ -447,6 +451,10 @@ task export_taxon_tables {
"num_reads_raw2": "~{num_reads_raw2}",
"num_reads_raw_pairs": "~{num_reads_raw_pairs}",
"fastq_scan_version": "~{fastq_scan_version}",
+ "fastq_scan_raw1_json": "~{fastq_scan_raw1_json}",
+ "fastq_scan_raw2_json": "~{fastq_scan_raw2_json}",
+ "fastq_scan_clean1_json": "~{fastq_scan_clean1_json}",
+ "fastq_scan_clean2_json": "~{fastq_scan_clean2_json}",
"num_reads_clean1": "~{num_reads_clean1}",
"num_reads_clean2": "~{num_reads_clean2}",
"num_reads_clean_pairs": "~{num_reads_clean_pairs}",
@@ -779,7 +787,7 @@ task export_taxon_tables {
"agrvate_version": "~{agrvate_version}",
"agrvate_docker": "~{agrvate_docker}",
"srst2_vibrio_detailed_tsv": "~{srst2_vibrio_detailed_tsv}",
- "srst2_vibrio_version": "~{srst2_vibrio_version}",~
+ "srst2_vibrio_version": "~{srst2_vibrio_version}",
"srst2_vibrio_docker": "~{srst2_vibrio_docker}",
"srst2_vibrio_database": "~{srst2_vibrio_database}",
"srst2_vibrio_ctxA": "~{srst2_vibrio_ctxA}",
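The last hunk removes a stray `~` after a comma in the JSON template. In a WDL heredoc-style command only `~{` opens a placeholder, so a lone `~` is copied through verbatim and the rendered text is no longer valid JSON. A sketch that checks both versions with Python's standard `json.tool` module, assuming `python3` is on PATH; the field values are placeholders, not real task output.

```shell
#!/usr/bin/env bash
# Sketch: validate a rendered JSON fragment before and after the fix.
# Values are placeholders; assumes python3 is available.
set -euo pipefail

before='{"srst2_vibrio_version": "x",~
"srst2_vibrio_docker": "y"}'
after='{"srst2_vibrio_version": "x",
"srst2_vibrio_docker": "y"}'

before_status=0
printf '%s\n' "$before" | python3 -m json.tool > /dev/null 2>&1 || before_status=$?
after_status=0
printf '%s\n' "$after" | python3 -m json.tool > /dev/null 2>&1 || after_status=$?

echo "with stray ~: exit ${before_status}"   # non-zero: rejected as invalid JSON
echo "fixed line: exit ${after_status}"      # 0: accepted
```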
1 change: 0 additions & 1 deletion tests/config/environment.yml
@@ -2,7 +2,6 @@ name: pytest-env-CI
channels:
- conda-forge
- bioconda
- - defaults
dependencies:
- python >=3.7
- cromwell=86
10 changes: 4 additions & 6 deletions tests/workflows/theiacov/test_wf_theiacov_clearlabs.yml
@@ -115,17 +115,16 @@
- path: miniwdl_run/call-fastq_scan_clean_reads/inputs.json
contains: ["read1", "clearlabs"]
- path: miniwdl_run/call-fastq_scan_clean_reads/outputs.json
- contains: ["fastq_scan_se", "pipeline_date", "read1_seq"]
+ contains: ["fastq_scan_se", "read1_seq"]
- path: miniwdl_run/call-fastq_scan_clean_reads/stderr.txt
- path: miniwdl_run/call-fastq_scan_clean_reads/stderr.txt.offset
- path: miniwdl_run/call-fastq_scan_clean_reads/stdout.txt
- path: miniwdl_run/call-fastq_scan_clean_reads/task.log
contains: ["wdl", "theiacov_clearlabs", "fastq_scan_clean_reads", "done"]
- - path: miniwdl_run/call-fastq_scan_clean_reads/work/DATE
- path: miniwdl_run/call-fastq_scan_clean_reads/work/READ1_SEQS
md5sum: 097e79b36919c8377c56088363e3d8b7
- path: miniwdl_run/call-fastq_scan_clean_reads/work/VERSION
- md5sum: 8e4e9cdfbacc9021a3175ccbbbde002b
+ md5sum: a59bb42644e35c09b8fa8087156fa4c2
- path: miniwdl_run/call-fastq_scan_clean_reads/work/_miniwdl_inputs/0/clearlabs_R1_dehosted.fastq.gz
- path: miniwdl_run/call-fastq_scan_clean_reads/work/clearlabs_R1_dehosted_fastq-scan.json
md5sum: 869dd2e934c600bba35f30f08e2da7c9
@@ -134,17 +133,16 @@
- path: miniwdl_run/call-fastq_scan_raw_reads/inputs.json
contains: ["read1", "clearlabs"]
- path: miniwdl_run/call-fastq_scan_raw_reads/outputs.json
- contains: ["fastq_scan_se", "pipeline_date", "read1_seq"]
+ contains: ["fastq_scan_se", "read1_seq"]
- path: miniwdl_run/call-fastq_scan_raw_reads/stderr.txt
- path: miniwdl_run/call-fastq_scan_raw_reads/stderr.txt.offset
- path: miniwdl_run/call-fastq_scan_raw_reads/stdout.txt
- path: miniwdl_run/call-fastq_scan_raw_reads/task.log
contains: ["wdl", "theiacov_clearlabs", "fastq_scan_raw_reads", "done"]
- - path: miniwdl_run/call-fastq_scan_raw_reads/work/DATE
- path: miniwdl_run/call-fastq_scan_raw_reads/work/READ1_SEQS
md5sum: 097e79b36919c8377c56088363e3d8b7
- path: miniwdl_run/call-fastq_scan_raw_reads/work/VERSION
- md5sum: 8e4e9cdfbacc9021a3175ccbbbde002b
+ md5sum: a59bb42644e35c09b8fa8087156fa4c2
- path: miniwdl_run/call-fastq_scan_raw_reads/work/_miniwdl_inputs/0/clearlabs.fastq.gz
- path: miniwdl_run/call-fastq_scan_raw_reads/work/clearlabs_fastq-scan.json
md5sum: 869dd2e934c600bba35f30f08e2da7c9
5 changes: 2 additions & 3 deletions tests/workflows/theiacov/test_wf_theiacov_illumina_pe.yml
@@ -60,11 +60,11 @@
md5sum: d41d8cd98f00b204e9800998ecf8427e
# fastq scan raw
- path: miniwdl_run/call-read_QC_trim/call-fastq_scan_raw/command
- md5sum: 9b2cc0107f1a90972482d7b3a658d242
+ md5sum: 56bcc1ba5d2a9c94f4704fc4b8e6b7ba
- path: miniwdl_run/call-read_QC_trim/call-fastq_scan_raw/inputs.json
contains: ["read1", "read2", "illumina_pe"]
- path: miniwdl_run/call-read_QC_trim/call-fastq_scan_raw/outputs.json
- contains: ["fastq_scan_pe", "pipeline_date", "read1_seq", "read2_seq"]
+ contains: ["fastq_scan_pe", "read1_seq", "read2_seq"]
- path: miniwdl_run/call-read_QC_trim/call-fastq_scan_raw/stderr.txt
- path: miniwdl_run/call-read_QC_trim/call-fastq_scan_raw/stderr.txt.offset
- path: miniwdl_run/call-read_QC_trim/call-fastq_scan_raw/stdout.txt
@@ -74,7 +74,6 @@
md5sum: 2a77387b247176aa5fcc9aed228699c9
- path: miniwdl_run/call-read_QC_trim/call-fastq_scan_raw/work/SRR13687078_2_fastq-scan.json
md5sum: d0eebdd4e14cf0a0b371fee1338474c9
- - path: miniwdl_run/call-read_QC_trim/call-fastq_scan_raw/work/DATE
- path: miniwdl_run/call-read_QC_trim/call-fastq_scan_raw/work/READ1_SEQS
md5sum: 4e4a08422dbf7001fd09ad5126e13b44
- path: miniwdl_run/call-read_QC_trim/call-fastq_scan_raw/work/READ2_SEQS
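The test updates follow from the task changes: `pipeline_date` and the `DATE` file no longer exist, and several `md5sum` expectations changed because they checksum task artifacts such as `VERSION` (the output of `fastq-scan -v`, which changes with the container image) and the rendered `command` file. A sketch for recomputing such a value with GNU coreutils `md5sum`; `md5_of` is a hypothetical helper, and the two inputs are chosen only because their digests are well known. `d41d8cd98f00b204e9800998ecf8427e`, expected for one file in the Illumina PE test, is the MD5 of empty input.

```shell
#!/usr/bin/env bash
# Sketch: recompute an md5sum value for a pytest-workflow expectation.
# Assumes GNU coreutils md5sum (Linux).
set -euo pipefail

md5_of() {  # hypothetical helper: print only the hex digest of one file
  md5sum "$1" | cut -d ' ' -f 1
}

workdir=$(mktemp -d)
: > "$workdir/empty"            # an empty output file
printf 'abc' > "$workdir/abc"   # no trailing newline

md5_of "$workdir/empty"   # d41d8cd98f00b204e9800998ecf8427e
md5_of "$workdir/abc"     # 900150983cd24fb0d6963f7d28e17f72
```

Against a local miniwdl run this would be, for example, `md5_of miniwdl_run/call-fastq_scan_raw_reads/work/VERSION`.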