chipseq-smk-pipeline

Snakemake-based pipeline for processing ChIP-seq and ATAC-seq datasets, from raw data QC and alignment to visualization and peak calling.

During the peak calling steps, chipseq-smk-pipeline automatically matches each signal file with a control file by name proximity.
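
The matching rule itself is not described in this README. As a rough illustration of what matching by name proximity could look like, the sketch below picks the control whose name shares the longest common prefix with the signal name. The prefix rule, the function names common_prefix_len and closest_control, and the sample names are all assumptions made for illustration, not the pipeline's actual implementation.

```shell
# Hypothetical sketch: longest-common-prefix matching, NOT the pipeline's code.

# Print the length of the common prefix of two strings.
common_prefix_len() {
    a=$1; b=$2; n=0
    while [ -n "$a" ] && [ -n "$b" ]; do
        # Compare the first characters of the remaining strings.
        [ "${a%"${a#?}"}" = "${b%"${b#?}"}" ] || break
        a=${a#?}; b=${b#?}; n=$((n + 1))
    done
    echo "$n"
}

# Usage: closest_control <signal> <control>...
# Print the control name sharing the longest prefix with the signal name.
closest_control() {
    signal=$1; shift
    best=""; best_len=-1
    for candidate in "$@"; do
        len=$(common_prefix_len "$signal" "$candidate")
        if [ "$len" -gt "$best_len" ]; then
            best=$candidate; best_len=$len
        fi
    done
    echo "$best"
}

closest_control CD14_H3K4me3_rep1 CD4_input CD14_input
```

With these made-up names the call prints CD14_input, because it shares the five-character prefix CD14_ with the signal name. Whatever rule the pipeline really applies, giving signal and control files a shared, distinctive prefix is a reasonable naming habit.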

Input

Input FASTQ files

The pipeline aligns FASTQ or gzipped FASTQ reads, as defined in config.yaml.
The reads folder is a path relative to the pipeline working directory and is defined by the fastq_dir property.
The FASTQ file extension is defined by the fastq_ext property, e.g. fq, fq.gz, fastq, or fastq.gz.
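
Before launching, it can help to check which files match a given fastq_dir and fastq_ext. The sketch below is a standalone helper, not part of the pipeline; the function name list_reads is hypothetical, and it assumes the reads sit directly in the reads folder.

```shell
# Hypothetical helper: list files in a reads folder that end with .<fastq_ext>.
# Usage: list_reads <work_dir>/<fastq_dir> <fastq_ext>
list_reads() {
    for f in "$1"/*."$2"; do
        # An unmatched glob stays literal, so keep only paths that exist.
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}
```

For example, list_reads "<work_dir>/<fastq_dir>" fastq.gz prints every file ending in .fastq.gz; an empty result suggests that fastq_dir or fastq_ext does not match the files on disk.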

Input BAM files

Use the start_with_bams=True config option to start with existing BAM files.
In this mode, the pipeline starts from the BAM files in the work_dir/bams folder.
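
If the BAM files live elsewhere, one way to put them where the pipeline expects them is to symlink them into work_dir/bams. This is a minimal sketch; the helper name link_bams and the choice of symlinks over copies are assumptions.

```shell
# Hypothetical helper: symlink existing BAM files into <work_dir>/bams.
# Usage: link_bams <dir_with_bams> <work_dir>
link_bams() {
    src=$(cd "$1" && pwd)   # absolute path, so the links stay valid
    mkdir -p "$2/bams"
    for bam in "$src"/*.bam; do
        # Skip the literal pattern left behind when no BAM file matches.
        if [ -e "$bam" ]; then
            ln -sf "$bam" "$2/bams/"
        fi
    done
}
```

After linking, add start_with_bams=True to the --config options of the launch command.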

Files

Path	Description
`config.yaml`	Default pipeline options
`trimmed`	Trimmed FASTQ files, if the `trim_reads` option is True
`bams`	BAMs with aligned reads, `MAPQ >= 30`
`bw`	BAM coverage visualization using DeepTools
`macs2`	MACS2 peaks
`sicer`	SICER peaks
`span`	SPAN peaks
`qc`	QC Reports
`multiqc`	MultiQC reports for different steps
`logs`	Shell commands logs

Requirements

The pipeline requires conda.

If conda is not installed, follow the instructions on the Conda website.
Navigate to the repository directory.

Create a Conda environment for snakemake:

$ conda env create --file environment.yaml --name snakemake

Activate the newly created environment:

$ source activate snakemake

On Ubuntu please ensure that gawk is installed:

$ sudo apt-get install gawk

Launch

Run the pipeline starting from FASTQ reads:

$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
    all [--cores <cores>] --use-conda --directory <work_dir> \
    --config fastq_dir=<fastq_dir> genome=<genome> --rerun-incomplete

By default, the pipeline does not generate coverage tracks or launch peak callers. Add bw=True, macs2=True, sicer=True, span=True to the --config options to create coverage bw files and call peaks.
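
For example, a launch with all four optional steps enabled could look like the sketch below. The run wrapper and the DRY_RUN switch are hypothetical conveniences that print the command instead of executing it; the paths, genome, and core count are placeholder values to replace.

```shell
# Placeholder values (assumptions); replace with real ones.
pipeline_dir=${pipeline_dir:-/path/to/chipseq-smk-pipeline}
work_dir=${work_dir:-/path/to/work_dir}
fastq_dir=${fastq_dir:-fastq}
genome=${genome:-hg19}
cores=${cores:-4}

# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Launch the pipeline with coverage tracks and all three peak callers enabled.
launch_all_steps() {
    run snakemake -p -s "$pipeline_dir/Snakefile" \
        all --cores "$cores" --use-conda --directory "$work_dir" \
        --config fastq_dir="$fastq_dir" genome="$genome" \
        bw=True macs2=True sicer=True span=True \
        --rerun-incomplete
}

launch_all_steps
```

Set DRY_RUN=0 to execute the command once the printed line looks right.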

To launch MACS2 in --broad mode, use the following config:

$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
    all [--cores <cores>] --use-conda --directory <work_dir> \
    --config fastq_dir=<fastq_dir> genome=<genome> \
    macs2=True macs2_mode=broad macs2_params="--broad --broad-cutoff 0.1" macs2_suffix=broad0.1 \
    --rerun-incomplete

See config.yaml for a complete list of parameters. Use --config to override the default options from the config.yaml file.

Rules

The rules DAG can be produced by appending the additional command line arguments --forceall --rulegraph | dot -Tpdf > rules.pdf to the launch command.

Computational cluster QSUB/LSF

Configure a profile named cluster for the required cluster system:

$ mkdir -p ~/.config/snakemake
$ cd ~/.config/snakemake
$ cookiecutter https://github.com/iromeo/generic.git

Example of ATAC-seq processing with qsub:

$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
    all --use-conda --directory <work_dir> \
    --profile cluster --cluster-config cluster_config.yaml --jobs 150 \
    --config fastq_dir=<fastq_dir> genome=<genome> \
    bowtie2_params="-X 2000 --dovetail" \
    macs2=True macs2_params="-q 0.05 -f BAMPE --nomodel --nolambda -B --call-summits" \
    span=True span_fragment=0 span_bg_sensitivity=1.0 span_clip=0.4 --rerun-incomplete

P.S. Use --config to override the default options from the config.yaml file.

Try with test data

Download the example fastq.gz files from the CD14_chr15_fastq folder.
These files are restricted to reads from human hg19 chr15 to reduce their size and speed up computation.

Launch chipseq-smk-pipeline:

$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
    all --use-conda --cores all --directory <work_dir> \
    --config fastq_ext=fastq.gz fastq_dir=<work_dir> genome=hg19 macs2=True sicer=True span=True \
    --rerun-incomplete

Useful links

Learn more about the Snakemake workflow management system
Developed with the SnakeCharm plugin for PyCharm IDE by JetBrains Research BioLabs
