Merge pull request galaxyproject#14 from bernt-matthias/topic/dada2

dada2: add dada2 workflow for paired end data
Delphine-L · Mar 11, 2024 · 3c15860
2 parents 242a145 + df10151
commit 3c15860
Showing 7 changed files with 1,140 additions and 0 deletions.
diff --git a/workflows/amplicon/dada2/.dockstore.yml b/workflows/amplicon/dada2/.dockstore.yml
@@ -0,0 +1,12 @@
+version: 1.2
+workflows:
+- name: main
+  subclass: Galaxy
+  publish: true
+  primaryDescriptorPath: /dada2_paired.ga
+  testParameterFiles:
+  - /dada2_paired-tests.yml
+  authors:
+  - name: Matthias Bernt
+    orcid: 0000-0003-3763-0797
+  - name: UFZ Leipzig
diff --git a/workflows/amplicon/dada2/CHANGELOG.md b/workflows/amplicon/dada2/CHANGELOG.md
@@ -0,0 +1,4 @@
+# Changelog
+
+## [0.1] 2024-01-09
+First release.
diff --git a/workflows/amplicon/dada2/README.md b/workflows/amplicon/dada2/README.md
@@ -0,0 +1,37 @@
+# DADA2: amplicon analysis for paired-end data
+
+## Input datasets
+
+- `Paired input data`: paired input collection in FASTQ format
+
+## Input values
+
+- `Read length forward/reverse reads`: length to which the forward/reverse reads are truncated in the filter and trim step
+- `Pool samples`: whether samples are pooled for denoising; pooling may increase sensitivity to rare variants
+- `Reference database`: reference database used for taxonomic assignment
+
+## Processing
+
+The workflow follows the steps described in the [dada2 tutorial](https://benjjneb.github.io/dada2/tutorial.html).
+
+As a first step, the input collection is sorted. This is important because the dada2 step outputs
+a collection in sorted order. If the input collection were not sorted, the samples would be
+mixed up in the mergePairs step.
+
+- `FilterAndTrim` Quality control by filtering and trimming reads
+- `QualityProfile` is called before and after the FilterAndTrim step
+- `Unzip Collection` separates forward and reverse reads (the next steps are evaluated separately on forward and reverse reads)
+- `learnErrors` learn error rates
+- `dada` denoise the reads (infer the true sample sequences)
+- `mergePairs` merge forward and reverse reads
+- `makeSequenceTable` create the sequence table
+- `removeBimeraDenovo` remove chimeric sequences
+- `assignTaxonomy` assign taxonomic information from a reference database
+
+## TODO
+
+Some possibilities to extend/improve the workflow:
+
+- output BIOM
+- use ASV1, ... in sequence table and taxonomy output, and output additional fasta
+- allow the use of a custom taxonomy / make taxonomic assignment optional
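
Note on the sorting step described in the README above: the sketch below is a minimal illustration, not part of the workflow. It assumes that the dada2 step returns samples in plain lexicographic order and that the later merge pairs collections by position; both helper functions are hypothetical. The sample identifiers are the ones used in `dada2_paired-tests.yml`.

```python
# Illustration only: why the input collection is sorted before pairing.
# Assumption: the denoised collection comes back in lexicographic order
# and collections are paired element by element (by position).

def pair_by_position(filtered_ids, denoised_ids):
    """Pair two collections element by element, as a positional merge would."""
    return list(zip(filtered_ids, denoised_ids))

def mismatches(pairs):
    """Return the pairs whose two sample identifiers differ."""
    return [(a, b) for a, b in pairs if a != b]

# Order of the samples as listed in the test job.
input_order = ["F3D0", "F3D5", "F3D145", "F3D150", "Mock"]

# Assumed order of the denoised collection (lexicographic sort).
denoised_order = sorted(input_order)

# Without sorting the input first, some samples are paired with the wrong partner.
unsorted_pairs = pair_by_position(input_order, denoised_order)

# With the input sorted up front, every sample meets its own partner.
sorted_pairs = pair_by_position(sorted(input_order), denoised_order)

print("mixed-up pairs without sorting:", mismatches(unsorted_pairs))
print("mixed-up pairs with sorting:", mismatches(sorted_pairs))
```

Under these assumptions, three of the five test samples would be paired with the wrong partner if the input were left unsorted, which is the mix-up the README warns about.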
diff --git a/workflows/amplicon/dada2/dada2_paired-tests.yml b/workflows/amplicon/dada2/dada2_paired-tests.yml
@@ -0,0 +1,86 @@
+- doc: Test outline for dada---paired
+  job:
+    Paired input data:
+      class: Collection
+      collection_type: list:paired
+      elements:
+      - class: Collection
+        type: paired
+        identifier: F3D0
+        elements:
+        - class: File
+          identifier: forward
+          location: https://zenodo.org/record/800651/files/F3D0_R1.fastq
+        - class: File
+          identifier: reverse
+          location: https://zenodo.org/record/800651/files/F3D0_R2.fastq
+      - class: Collection
+        type: paired
+        identifier: F3D5
+        elements:
+        - class: File
+          identifier: forward
+          location: https://zenodo.org/record/800651/files/F3D5_R1.fastq
+        - class: File
+          identifier: reverse
+          location: https://zenodo.org/record/800651/files/F3D5_R2.fastq
+      - class: Collection
+        type: paired
+        identifier: F3D145
+        elements:
+        - class: File
+          identifier: forward
+          location: https://zenodo.org/record/800651/files/F3D145_R1.fastq
+        - class: File
+          identifier: reverse
+          location: https://zenodo.org/record/800651/files/F3D145_R2.fastq
+      - class: Collection
+        type: paired
+        identifier: F3D150
+        elements:
+        - class: File
+          identifier: forward
+          location: https://zenodo.org/record/800651/files/F3D150_R1.fastq
+        - class: File
+          identifier: reverse
+          location: https://zenodo.org/record/800651/files/F3D150_R2.fastq
+      - class: Collection
+        type: paired
+        identifier: Mock
+        elements:
+        - class: File
+          identifier: forward
+          location: https://zenodo.org/record/800651/files/Mock_R1.fastq
+        - class: File
+          identifier: reverse
+          location: https://zenodo.org/record/800651/files/Mock_R2.fastq
+    Read length forward read: 240
+    Read length reverse read: 160
+    Pool samples: 'FALSE'
+    Cached reference database: silva_132
+  outputs:
+    Sequence Table:
+      path: test-data/Sequence Table.dada2_sequencetable
+      asserts:
+        - has_n_columns:
+            n: 6
+        - has_n_lines:
+            n: 171
+    Counts:
+      path: test-data/Counts.tabular
+      asserts:
+        - has_n_columns:
+            n: 8
+        - has_n_lines:
+            n: 6
+    Taxonomy:
+      ftype: tabular
+      sorted: true
+      asserts:
+        - has_text:
+            text: Firmicutes
+            n: 131
+        - has_n_columns:
+            n: 7
+        - has_n_lines:
+            n: 171
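
Note on the output assertions in the test file above: the sketch below approximates what `has_n_columns`, `has_n_lines`, and `has_text` with `n` check on a tabular output. It is not Galaxy's implementation; the helper names and the three-line toy table are made up for illustration, and a tab separator is assumed.

```python
# Illustration only: rough equivalents of the three assertion types
# used in the tests. Helper names are hypothetical.

def n_columns(text, sep="\t"):
    """Number of columns, taken from the first line of the table."""
    first_line = text.splitlines()[0]
    return len(first_line.split(sep))

def n_lines(text):
    """Number of lines in the output."""
    return len(text.splitlines())

def n_occurrences(text, needle):
    """How many times a piece of text appears in the output."""
    return text.count(needle)

# Toy table, not real workflow output.
toy_table = (
    "seq\tKingdom\tPhylum\n"
    "ACGT\tBacteria\tFirmicutes\n"
    "TTGA\tBacteria\tFirmicutes\n"
)

print(n_columns(toy_table))
print(n_lines(toy_table))
print(n_occurrences(toy_table, "Firmicutes"))
```

Read this way, the `Taxonomy` assertions require 7 columns, 171 lines, and exactly 131 occurrences of `Firmicutes`.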