Merge pull request galaxyproject#14 from bernt-matthias/topic/dada2
dada2: add dada2 workflow for paired end data
mvdbeek authored Mar 11, 2024
2 parents 242a145 + df10151 commit 3c15860
Showing 7 changed files with 1,140 additions and 0 deletions.
12 changes: 12 additions & 0 deletions workflows/amplicon/dada2/.dockstore.yml
@@ -0,0 +1,12 @@
version: 1.2
workflows:
  - name: main
    subclass: Galaxy
    publish: true
    primaryDescriptorPath: /dada2_paired.ga
    testParameterFiles:
      - /dada2_paired-tests.yml
    authors:
      - name: Matthias Bernt
        orcid: 0000-0003-3763-0797
      - name: UFZ Leipzig
4 changes: 4 additions & 0 deletions workflows/amplicon/dada2/CHANGELOG.md
@@ -0,0 +1,4 @@
# Changelog

## [0.1] 2024-01-09
First release.
37 changes: 37 additions & 0 deletions workflows/amplicon/dada2/README.md
@@ -0,0 +1,37 @@
# Dada2: amplicon analysis for paired end data

## Input datasets

- `Paired input data` paired input collection in FASTQ format

## Input values

- `Read length forward/reverse reads` length to which the forward/reverse reads are truncated in the filter and trim step
- `Pool samples` whether samples are pooled; pooling may increase sensitivity
- `Reference database` reference database used for taxonomic assignment

## Processing

The workflow follows the steps described in the [dada2 tutorial](https://benjjneb.github.io/dada2/tutorial.html).

As a first step, the input collection is sorted. This is important because the dada2 step outputs
a collection in sorted order. If the input collection were not sorted, the samples would be
mixed up in the mergePairs step.
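
The effect of an unsorted input can be illustrated with a minimal Python sketch. It is not part of the workflow: it assumes that elements of two collections are matched by position, and `pair_by_position` as well as the upload order are hypothetical.

```python
# Hypothetical upload order of the sample identifiers (not sorted).
input_order = ["Mock", "F3D5", "F3D0", "F3D150", "F3D145"]

# Stand-in for a step that returns one result per sample in sorted order.
results_in_sorted_order = [f"denoised({name})" for name in sorted(input_order)]


def pair_by_position(identifiers, results):
    """Pair each identifier with the result at the same position."""
    return list(zip(identifiers, results))


# Unsorted input: positions no longer refer to the same sample.
unsorted_pairs = pair_by_position(input_order, results_in_sorted_order)
mismatched = [
    name for name, result in unsorted_pairs if result != f"denoised({name})"
]

# Sorted input: every identifier lines up with its own result.
sorted_pairs = pair_by_position(sorted(input_order), results_in_sorted_order)
all_matched = all(result == f"denoised({name})" for name, result in sorted_pairs)

print("mismatched with unsorted input:", mismatched)
print("all matched with sorted input:", all_matched)
```

Sorting the input first keeps both sides in the same order, so position-based pairing stays correct.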

- `FilterAndTrim` Quality control by filtering and trimming reads
- `QualityProfile` is called before and after the FilterAndTrim step
- `Unzip Collection` separates forward and reverse reads (the next steps are evaluated separately on forward and reverse reads)
- `learnErrors` learn error rates
- `dada` denoise reads (sample inference)
- `mergePairs` merge forward and reverse reads
- `makeSequenceTable` create the sequence table
- `removeBimeraDenovo` remove chimeric sequences
- `assignTaxonomy` assign taxonomic information from a reference database

## TODO

Some possibilities to extend/improve the workflow:

- output BIOM
- use ASV1, ... in the sequence table and taxonomy output, and output an additional FASTA file
- allow use of a custom taxonomy / make it optional
86 changes: 86 additions & 0 deletions workflows/amplicon/dada2/dada2_paired-tests.yml
@@ -0,0 +1,86 @@
- doc: Test outline for dada---paired
  job:
    Paired input data:
      class: Collection
      collection_type: list:paired
      elements:
        - class: Collection
          type: paired
          identifier: F3D0
          elements:
            - class: File
              identifier: forward
              location: https://zenodo.org/record/800651/files/F3D0_R1.fastq
            - class: File
              identifier: reverse
              location: https://zenodo.org/record/800651/files/F3D0_R2.fastq
        - class: Collection
          type: paired
          identifier: F3D5
          elements:
            - class: File
              identifier: forward
              location: https://zenodo.org/record/800651/files/F3D5_R1.fastq
            - class: File
              identifier: reverse
              location: https://zenodo.org/record/800651/files/F3D5_R2.fastq
        - class: Collection
          type: paired
          identifier: F3D145
          elements:
            - class: File
              identifier: forward
              location: https://zenodo.org/record/800651/files/F3D145_R1.fastq
            - class: File
              identifier: reverse
              location: https://zenodo.org/record/800651/files/F3D145_R2.fastq
        - class: Collection
          type: paired
          identifier: F3D150
          elements:
            - class: File
              identifier: forward
              location: https://zenodo.org/record/800651/files/F3D150_R1.fastq
            - class: File
              identifier: reverse
              location: https://zenodo.org/record/800651/files/F3D150_R2.fastq
        - class: Collection
          type: paired
          identifier: Mock
          elements:
            - class: File
              identifier: forward
              location: https://zenodo.org/record/800651/files/Mock_R1.fastq
            - class: File
              identifier: reverse
              location: https://zenodo.org/record/800651/files/Mock_R2.fastq
    Read length forward read: 240
    Read length reverse read: 160
    Pool samples: 'FALSE'
    Cached reference database: silva_132
  outputs:
    Sequence Table:
      path: test-data/Sequence Table.dada2_sequencetable
      asserts:
        - has_n_columns:
            n: 6
        - has_n_lines:
            n: 171
    Counts:
      path: test-data/Counts.tabular
      asserts:
        - has_n_columns:
            n: 8
        - has_n_lines:
            n: 6
    Taxonomy:
      ftype: tabular
      sorted: true
      asserts:
        - has_text:
            text: Firmicutes
            n: 131
        - has_n_columns:
            n: 7
        - has_n_lines:
            n: 171