forked from galaxyproject/iwc
Add new ChIP-Seq WF that handles replicates and controls
Showing 7 changed files with 3,858 additions and 0 deletions.
11 changes: 11 additions & 0 deletions
workflows/epigenetics/chipseq-pe-with-replicates-controls/.dockstore.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
version: 1.2 | ||
workflows: | ||
- name: main | ||
subclass: Galaxy | ||
publish: true | ||
primaryDescriptorPath: /chipseq-pe-with-replicates-controls.ga | ||
testParameterFiles: | ||
- /chipseq-pe-with-replicates-controls-tests.yml | ||
authors: | ||
- name: Wolfgang Maier | ||
orcid: 0000-0002-9464-6640 |
5 changes: 5 additions & 0 deletions
workflows/epigenetics/chipseq-pe-with-replicates-controls/.workflowhub.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
version: '0.1' | ||
registries: | ||
- url: https://workflowhub.eu | ||
project: iwc | ||
workflow: chipseq-pe-with-replicates-controls/main |
5 changes: 5 additions & 0 deletions
workflows/epigenetics/chipseq-pe-with-replicates-controls/CHANGELOG.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Changelog | ||
|
||
## [0.1] 2024-10-22 | ||
|
||
Initial release |
55 changes: 55 additions & 0 deletions
workflows/epigenetics/chipseq-pe-with-replicates-controls/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
# Quality control, mapping and peaks identification for ChIP-Seq replicates with controls | ||
|
||
This workflow is for analyzing batches of ChIP-Seq samples with controls and replicates from paired-end sequenced reads to called peaks. | ||
|
||
It uses: | ||
- fastp for pre-processing of sequenced reads | ||
- bowtie2 for mapping | ||
- MACS2 for peak calling | ||
- deeptools for cross-sample correlation and averaging | ||
|
||
The workflow provides quality control at the level of sequenced reads, mapping results and called peaks and visualizes correlation between samples. | ||
|
||
## Input datasets | ||
|
||
- Sequencing data: this must be provided as a single list collection of paired fastq datasets of all samples. | ||
- Sample sheet: this is expected to be a 4-column tabular dataset that describes samples, their association with each other and with conditions and replicates. | ||
|
||
The first column of the file must list all samples with their names matching the element names in the Sequencing data collection. Samples can be listed in any order. | ||
The second column is used to specify the specific experimental condition that each sample represents. There is no formal restriction on this column, but values should be kept short for readable reports. | ||
The third column is used to specify the replicate that each sample belongs to. There is no formal restriction on replicate identifiers, but they should be kept as short as possible. At least two replicates are required per condition, but different conditions can have different numbers of replicates. | ||
The fourth column must provide the name of the sample that serves as the control for the sample described on each line. Different samples can be associated with the same control sample. | ||
Control samples must also be listed on their own lines just like regular samples, but must use . or - as the value of the fourth column. The value of the third column (replicate ID) is ignored for control sample lines so may also be set to . or -. | ||
|
||
Here's an example sample sheet: | ||
|
||
SRR5680995 input - - | ||
SRR5680996 H3K4me3 rep1 SRR5680995 | ||
SRR5680997 H3K27me3 rep1 SRR5680995 | ||
SRR5681007 H3K27me3 rep2 SRR5681005 | ||
SRR5681006 H3K4me3 rep2 SRR5681005 | ||
SRR5680998 CTCF rep1 SRR5680995 | ||
SRR5681008 CTCF rep2 SRR5681005 | ||
SRR5681005 input - - | ||
|
||
This declares an experimental design with three conditions - H3K4me3, H3K27me3 and CTCF - with two replicates per condition and one input control per replicate. The control sample SRR5680995 is declared as the shared control for all samples from replicate rep1, SRR5681005 as the control for all samples from replicate rep2. | ||
|
||
## Input parameters | ||
|
||
- Reference genome: set this to the reference genome of your organism of interest; used at the read mapping step | ||
- Sequencing adapter - forward (optional) | ||
- Sequencing adapter - reverse (optional) | ||
- Effective genome size: this is used by MACS2 and may be entered manually (indications are provided for heavily used genomes). | ||
- Average size of sequenced fragments: used for deeptools-based QC | ||
|
||
## Outputs: | ||
|
||
- MultiQC analysis reports: | ||
- Sample fingerprints: | ||
- Between-samples correlation plot: | ||
- Clustered heatmap of peaks across samples: | ||
- Peak regions called by MACS2: | ||
- Positions of summits of MACS2-called peaks: | ||
- Peaks per replicate: | ||
- Peaks averaged across replicates: | ||
|
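The sample sheet rules described in the README above can be checked before the workflow is launched. The following is a minimal Python sketch and not part of this commit: the function name `check_sample_sheet` and its error messages are hypothetical, and it assumes the sheet is tab-separated with no header line. It covers only the rules the README states (four columns, control lines marked with `.` or `-`, every control reference pointing at a declared control, at least two replicates per condition).

```python
import csv
import io
from collections import defaultdict

# Values the README allows in column 4 to mark a line as a control sample.
CONTROL_MARKERS = {".", "-"}


def check_sample_sheet(text):
    """Validate a 4-column tabular sample sheet (hypothetical helper).

    Returns a dict mapping each condition to its sorted replicate IDs.
    Raises ValueError if one of the README's rules is violated.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    for row in rows:
        if len(row) != 4:
            raise ValueError(f"expected 4 columns, got {len(row)}: {row}")

    # Control samples are the lines whose fourth column is "." or "-".
    controls = {row[0] for row in rows if row[3] in CONTROL_MARKERS}

    replicates = defaultdict(set)
    for sample, condition, replicate, control in rows:
        if sample in controls:
            continue  # the replicate ID is ignored on control lines
        if control not in controls:
            raise ValueError(f"{sample}: control {control!r} is not declared as a control")
        replicates[condition].add(replicate)

    for condition, reps in replicates.items():
        if len(reps) < 2:
            raise ValueError(f"condition {condition!r} has fewer than two replicates")

    return {condition: sorted(reps) for condition, reps in replicates.items()}


# The example sheet from the README, written with explicit tab separators.
EXAMPLE_SHEET = "\n".join([
    "SRR5680995\tinput\t-\t-",
    "SRR5680996\tH3K4me3\trep1\tSRR5680995",
    "SRR5680997\tH3K27me3\trep1\tSRR5680995",
    "SRR5681007\tH3K27me3\trep2\tSRR5681005",
    "SRR5681006\tH3K4me3\trep2\tSRR5681005",
    "SRR5680998\tCTCF\trep1\tSRR5680995",
    "SRR5681008\tCTCF\trep2\tSRR5681005",
    "SRR5681005\tinput\t-\t-",
])

if __name__ == "__main__":
    print(check_sample_sheet(EXAMPLE_SHEET))
```

The sketch does not check that sample names match the element names of the Sequencing data collection, since that requires access to the collection itself.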
86 changes: 86 additions & 0 deletions
...enetics/chipseq-pe-with-replicates-controls/chipseq-pe-with-replicates-controls-tests.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
- doc: Test outline for ChIPseq-PE-with-replicates-controls workflow | ||
job: | ||
Sample sheet: | ||
class: File | ||
path: test-data/test_sample_sheet.tsv | ||
filetype: tabular | ||
Sequencing data: | ||
class: Collection | ||
collection_type: list:paired | ||
elements: | ||
- class: Collection | ||
type: paired | ||
identifier: SRR5204807 | ||
elements: | ||
- identifier: forward | ||
class: File | ||
location: https://github.com/nf-core/test-datasets/raw/refs/heads/chipseq/testdata/SRR5204807_Spt5-ChIP_IP1_SacCer_ChIP-Seq_ss100k_R1.fastq.gz | ||
filetype: fastqsanger.gz | ||
- identifier: reverse | ||
class: File | ||
location: https://github.com/nf-core/test-datasets/raw/refs/heads/chipseq/testdata/SRR5204807_Spt5-ChIP_IP1_SacCer_ChIP-Seq_ss100k_R2.fastq.gz | ||
filetype: fastqsanger.gz | ||
- class: Collection | ||
type: paired | ||
identifier: SRR5204808 | ||
elements: | ||
- identifier: forward | ||
class: File | ||
location: https://github.com/nf-core/test-datasets/raw/refs/heads/chipseq/testdata/SRR5204808_Spt5-ChIP_IP2_SacCer_ChIP-Seq_ss100k_R1.fastq.gz | ||
filetype: fastqsanger.gz | ||
- identifier: reverse | ||
class: File | ||
location: https://github.com/nf-core/test-datasets/raw/refs/heads/chipseq/testdata/SRR5204808_Spt5-ChIP_IP2_SacCer_ChIP-Seq_ss100k_R2.fastq.gz | ||
filetype: fastqsanger.gz | ||
- class: Collection | ||
type: paired | ||
identifier: SRR5204809 | ||
elements: | ||
- identifier: forward | ||
class: File | ||
location: https://github.com/nf-core/test-datasets/raw/refs/heads/chipseq/testdata/SRR5204809_Spt5-ChIP_Input1_SacCer_ChIP-Seq_ss100k_R1.fastq.gz | ||
filetype: fastqsanger.gz | ||
- identifier: reverse | ||
class: File | ||
location: https://github.com/nf-core/test-datasets/raw/refs/heads/chipseq/testdata/SRR5204809_Spt5-ChIP_Input1_SacCer_ChIP-Seq_ss100k_R2.fastq.gz | ||
filetype: fastqsanger.gz | ||
- class: Collection | ||
type: paired | ||
identifier: SRR5204810 | ||
elements: | ||
- identifier: forward | ||
class: File | ||
location: https://github.com/nf-core/test-datasets/raw/refs/heads/chipseq/testdata/SRR5204810_Spt5-ChIP_Input2_SacCer_ChIP-Seq_ss100k_R1.fastq.gz | ||
filetype: fastqsanger.gz | ||
- identifier: reverse | ||
class: File | ||
location: https://github.com/nf-core/test-datasets/raw/refs/heads/chipseq/testdata/SRR5204810_Spt5-ChIP_Input2_SacCer_ChIP-Seq_ss100k_R2.fastq.gz | ||
filetype: fastqsanger.gz | ||
Reference genome: "sacCer3" | ||
Effective genome size: 12000000 | ||
Average size of sequenced fragments: 200 | ||
outputs: | ||
multiqc_stats: | ||
asserts: | ||
has_n_lines: | ||
n: 5 | ||
has_text_matching: | ||
expression: "SRR5204807_Spt5_rep1\t163.0\t0.0\t0.0\t844.+" | ||
macs2_report: | ||
element_tests: | ||
wt_H3K4me3: | ||
asserts: | ||
- that: "has_text" | ||
text: "# name = SRR5204807_Spt5_rep1" | ||
- that: "has_text" | ||
text: "# fragment size is determined as 163 bps" | ||
- that: "has_text" | ||
text: "# fragments after filtering in treatment: 86394" | ||
mapping_stats: | ||
element_tests: | ||
SRR5204807_Spt5_rep1: | ||
asserts: | ||
- that: "has_text" | ||
text: "3067 (3.14%) aligned concordantly 0 times" | ||
- that: "has_text" | ||
text: "80795 (82.60%) aligned concordantly exactly 1 time" |
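In the `multiqc_stats` assertion above, the `\t` sequences sit inside a double-quoted YAML string, so the YAML parser turns them into real tab characters before the expression is used as a regular expression. The Python sketch below illustrates what the resulting pattern accepts. It assumes the assertion behaves like Python's `re.search`, and the rows it tests are invented for illustration, not taken from real MultiQC output.

```python
import re

# The expression as it reads once YAML has converted each "\t" into a tab.
# Python's non-raw string literal performs the same conversion.
PATTERN = "SRR5204807_Spt5_rep1\t163.0\t0.0\t0.0\t844.+"


def row_matches(row):
    """Return True if the pattern is found anywhere in the row."""
    return re.search(PATTERN, row) is not None


# Hypothetical rows, made up for this sketch only.
matching_row = "SRR5204807_Spt5_rep1\t163.0\t0.0\t0.0\t84412"
other_sample = "SRR5204808_Spt5_rep2\t163.0\t0.0\t0.0\t84412"
# The dots are unescaped, so "." matches any character, not just a period.
loose_match = "SRR5204807_Spt5_rep1\t163x0\t0.0\t0.0\t84412"

if __name__ == "__main__":
    for row in (matching_row, other_sample, loose_match):
        print(repr(row), row_matches(row))
```

Because the dots are regex wildcards, the assertion is slightly looser than a literal comparison. Writing `\\.` in the YAML string would pin them to literal periods if stricter matching were wanted.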