Cut and Tag Pipeline

Background

This Cut and Tag Pipline based on

CUT&Tag Data Processing and Analysis Tutorial from Ye Zheng el. _al.

Prerequisites

conda is used to mange software used in this pipeline. For further information, please consult https://docs.anaconda.com/anaconda/install/linux-aarch64/.

Included in this repository is the conda_env_no_build.yaml which contains all packages required to install this pipeline.

Installation

Clone the repo
```
git clone 
```
Activate conda environment with RNA_Seq.yml file
```
conda env activate -f conda_env_no_build.yaml
```

You are now ready to run the pipeline!

Nextflow

Under development

Bash

Example bash script can be found in example folder

Core

use script/main/run_cut_tag.sh to run cut and tag pipeline.

Input

The script takes the following arguments:

raw_fastq1
raw_fastq2
out_dir
sample_name
script_dir
skip_trimmed
no_rose2
no_spikein
peak_caller ( seacr or macs2 )
ctrl_bedgraph
ctrl_bam

bash script/main/run_cut_tag.sh $fastq_1 $fastq_2 ${out_dir}/$sample_name $sample_name $script_dir false true $ctrl_bam $ctrl_bedgraph

Post-processing

Run QC

bash script/post_processing/run_QC.sh script/post_processing/QC $out_dir

Organize all outputs

Input

out_dir: directory where the pipeline output directory is
final_result: path where organized data folder should be

bash $script_folder/symlink_final_result.sh $out_dir $final_result

Output

Single sample output

Sample_folder
├── alignment
│   ├── bam
│   └── bed
├── fastqc
├── peak_calling
├── QC_summary
└──trimmed_reads

Organized data output

final_results
├── bam
├── cpm
├── fastqc
├── frag_legnth
├── Sample_run_frip_score_summary.txt
├── Sample_run_QC_summary.csv
├── peak
└── QC

Note: all reference files should be downloaded to ref folder

NEB Ultra II Kit is obtained from https://www.neb.com/faqs/2021/01/15/what-sequences-need-to-be-trimmed-for-nebnext-libraries-that-are-sequenced-on-an-illumina-instrument (If you use other library prep kit other than NEB, you need to obtain adapter sequence from the vendor)
Bowtie2 genome ref should be downloaded into ref folder. We use the no alternative set. https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#indexing-a-reference-genome
E. coli genome ref should be downloaded into ref folder if you use E. coli spike-in

Pooled Peak calling

Inspired by ChIP-seq ENCODE 3 pipeline, replicated BAM files are combined for peak calling using MACS2. IDR is then applied to assess the reproducibility of the pooled peak sets.

Run MACS2

The script takes the following arguments:

out_sample : output sample name with path (e.g. out_folder/Sample1_pooled)
control_bam
sample_repX_bam : bam files for reach replicates

bash script/pooled_sample_processing/pooled_run_macs2.sh $out_sample $control_bam $sample_rep1_bam $sample_repX_bam

Run IDR

The script accept only two replicates peak sets and run by using the following argument.

rep1
rep2
pool_peak
Idr_output_folder

bash script/pooled_sample_processing/run_idr.sh $rep1 $rep2 $pooled_peak $idr_output

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
config		config
debug		debug
example		example
ref		ref
script		script
.gitignore		.gitignore
README.MD		README.MD
conda_env_no_build.yml		conda_env_no_build.yml
cut_tag_v1.wdl		cut_tag_v1.wdl

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Cut and Tag Pipeline

Background

Prerequisites

Installation

Nextflow

Bash

Core

Input

Post-processing

Run QC

Organize all outputs

Input

Output

Pooled Peak calling

Run MACS2

Run IDR

About

Releases

Packages

Languages

sfpacman/cut_tag_pipeline_public

Folders and files

Latest commit

History

Repository files navigation

Cut and Tag Pipeline

Background

Prerequisites

Installation

Nextflow

Bash

Core

Input

Post-processing

Run QC

Organize all outputs

Input

Output

Pooled Peak calling

Run MACS2

Run IDR

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages