
Commit

[TheiaMeta] Midas call in read_QC_trim_pe.wdl workflow and outputs (#619)

* add midas outputs to theiameta-illumina

* conditional logic for midas to run theiameta

* update syntax for conditional statement

* update documentation outputs theiameta

* remove sec midas outputs from task wf and docs

* theiameta.md docs clearer description

* sec genus back into qc trim as needed by theiaprok

* update md5sums
fraser-combe authored Oct 7, 2024
1 parent bb3f9c2 commit 26c0ddd
Showing 4 changed files with 8 additions and 6 deletions.
7 changes: 3 additions & 4 deletions docs/workflows/genomic_characterization/theiameta.md
@@ -184,9 +184,8 @@ The TheiaMeta_Illumina_PE workflow processes Illumina paired-end (PE) reads ge

The `MIDAS` task is for the identification of reads to detect contamination with non-target taxa. This task is optional and turned off by default. It can be used by setting the `call_midas` input variable to `true`.

The MIDAS tool was originally designed for metagenomic sequencing data but has been co-opted for use with bacterial isolate WGS methods. It can be used to detect contamination present in raw sequencing data by estimating bacterial species abundance in bacterial isolate WGS data. If a secondary genus is detected above a relative frequency of 0.01 (1%), then the sample should fail QC and be investigated further for potential contamination.
The MIDAS reference database, located at **`gs://theiagen-large-public-files-rp/terra/theiaprok-files/midas/midas_db_v1.2.tar.gz`**, is provided as the default. It is possible to provide a custom database. More information is available [here](https://github.com/snayfach/MIDAS/blob/master/docs/ref_db.md).

This task is similar to those used in commercial software such as BioNumerics for estimating secondary species abundance.

??? toggle "How are the MIDAS output columns determined?"
@@ -207,8 +206,6 @@ The TheiaMeta_Illumina_PE workflow processes Illumina paired-end (PE) reads ge
- coverage: estimated genome-coverage (i.e. read-depth) of species in metagenome
- relative_abundance: estimated relative abundance of species in metagenome
The value in the `midas_primary_genus` column is derived by sorting the rows by descending "relative_abundance" and taking the genus of the top species in the "species_id" column (Salmonella). The value in the `midas_secondary_genus` column is the second-most prevalent genus in the "species_id" column (Citrobacter). The `midas_secondary_genus_abundance` column is the "relative_abundance" of the second-most prevalent genus (0.009477003), and the `midas_secondary_genus_coverage` column is its "coverage" (0.995216227).
!!! techdetails "read_QC_trim Technical Details"
| | Links |
@@ -324,6 +321,8 @@ The TheiaMeta_Illumina_PE workflow processes Illumina paired-end (PE) reads ge
| largest_contig | Int | Largest contig size |
| metaspades_docker | String | Docker image of metaspades |
| metaspades_version | String | Version of metaspades |
| midas_primary_genus | String | Primary genus detected by MIDAS |
| midas_report | File | MIDAS report TSV file |
| minimap2_docker | String | Docker image of minimap2 |
| minimap2_version | String | Version of minimap2 |
| ncbi_scrub_docker | String | Docker image for NCBI's HRRT |
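The genus derivation described in the "How are the MIDAS output columns determined?" toggle, together with the 1% contamination threshold, can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used by the MIDAS task in this repository: it assumes a tab-separated species profile with `species_id`, `coverage`, and `relative_abundance` columns, assumes the genus is the first underscore-separated token of `species_id`, and reports the abundance and coverage of the highest-ranked species from the secondary genus (rather than a per-genus sum). The function names `summarize_midas_species` and `fails_contamination_qc` are hypothetical, and the example species IDs and Salmonella values are made up; only the Citrobacter values come from the docs.

```python
import csv
import io
from typing import NamedTuple, Optional


class MidasGenusSummary(NamedTuple):
    primary_genus: str
    secondary_genus: Optional[str]
    secondary_genus_abundance: float
    secondary_genus_coverage: float


def genus_of(species_id: str) -> str:
    # Assumption: the genus is the first underscore-separated token,
    # e.g. "Citrobacter_freundii_example" -> "Citrobacter".
    return species_id.split("_")[0]


def summarize_midas_species(tsv_text: str) -> MidasGenusSummary:
    """Hypothetical helper: derive primary/secondary genus from a species profile TSV."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        raise ValueError("MIDAS species profile has no rows")
    # Sort species by relative_abundance, most abundant first.
    rows.sort(key=lambda row: float(row["relative_abundance"]), reverse=True)
    primary = genus_of(rows[0]["species_id"])
    # The first lower-ranked species from a different genus gives the secondary genus.
    for row in rows[1:]:
        genus = genus_of(row["species_id"])
        if genus != primary:
            return MidasGenusSummary(
                primary_genus=primary,
                secondary_genus=genus,
                secondary_genus_abundance=float(row["relative_abundance"]),
                secondary_genus_coverage=float(row["coverage"]),
            )
    # Only one genus detected: no secondary genus to report.
    return MidasGenusSummary(primary, None, 0.0, 0.0)


def fails_contamination_qc(summary: MidasGenusSummary, threshold: float = 0.01) -> bool:
    # A secondary genus above 0.01 (1%) relative abundance should fail QC.
    return summary.secondary_genus_abundance > threshold


# Hypothetical profile: species IDs and Salmonella values are illustrative only;
# the Citrobacter coverage/abundance are the values quoted in the docs.
EXAMPLE_TSV = (
    "species_id\tcoverage\trelative_abundance\n"
    "Citrobacter_freundii_example\t0.995216227\t0.009477003\n"
    "Salmonella_enterica_example\t98.2\t0.975\n"
    "Salmonella_bongori_example\t1.6\t0.015522997\n"
)

if __name__ == "__main__":
    summary = summarize_midas_species(EXAMPLE_TSV)
    print(summary.primary_genus, summary.secondary_genus)  # Salmonella Citrobacter
    print(fails_contamination_qc(summary))  # False
```

The second Salmonella row shows why the secondary genus is taken from the first row with a *different* genus rather than simply the second row. With the documented Citrobacter abundance of 0.009477003, the sample stays just under the 1% threshold and passes.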
@@ -561,7 +561,7 @@
- path: miniwdl_run/wdl/tasks/gene_typing/drug_resistance/task_resfinder.wdl
md5sum: 27528633723303b462d095b642649453
- path: miniwdl_run/wdl/tasks/gene_typing/variant_detection/task_snippy_variants.wdl
md5sum: 284ce680b52e7e1c1753537b344fa161
md5sum: 3b9e04569d7e856dcc649b7726b306b7
- path: miniwdl_run/wdl/tasks/quality_control/read_filtering/task_bbduk.wdl
md5sum: aec6ef024d6dff31723f44290f6b9cf5
- path: miniwdl_run/wdl/tasks/quality_control/advanced_metrics/task_busco.wdl
3 changes: 3 additions & 0 deletions workflows/theiameta/wf_theiameta_illumina_pe.wdl
@@ -232,6 +232,9 @@ workflow theiameta_illumina_pe {
String bbduk_docker = read_QC_trim.bbduk_docker
# Read QC - Read stats
Float? average_read_length = read_QC_trim.average_read_length
# MIDAS outputs
String? midas_primary_genus = read_QC_trim.midas_primary_genus
File? midas_report = read_QC_trim.midas_report
# Assembly - metaspades
File assembly_fasta = select_first([retrieve_aligned_contig_paf.final_assembly, pilon.assembly_fasta])
String metaspades_version = metaspades_pe.metaspades_version
2 changes: 1 addition & 1 deletion workflows/utilities/wf_read_QC_trim_pe.wdl
@@ -119,7 +119,7 @@ workflow read_QC_trim_pe {
read2 = bbduk.read2_clean
}
}
if ("~{workflow_series}" == "theiaprok") {
if ("~{workflow_series}" == "theiaprok" || "~{workflow_series}" == "theiameta") {
if (call_midas) {
call midas_task.midas {
input:
