This repository contains scripts to run analyses and recreate figures from our ataqv manuscript.
Pipelines were run on a Linux server and are meant to be used with Snakemake and NextFlow (NextFlow v. 19.04.1). The general organization of the repo is as follows:
- The `src` and `bin` directories should be self-explanatory.
- The `data` directory contains our raw data and other 'generic' data such as fasta files, bwa indices, etc. (some elements are distributed in the repository itself; some are created/downloaded in the steps outlined below).
- The `tn5` directory contains the processing and analyses for the ATAC-seq experiments that kept PCR cycles constant across Tn5 concentrations.
- The `cluster_density` directory contains the processing and analyses for the ATAC-seq experiments that did NOT keep PCR cycles constant across Tn5 concentrations.
- The `public_survey` directory contains the processing and analyses for the survey of public ATAC-seq data.
These tools must be in your `$PATH`:
- `cta` (v. 0.1.2; can be downloaded from the Parker Lab GitHub)
- `ataqv` (v. 1.1.0)
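Before proceeding, it may help to verify that both tools are actually discoverable. This is a minimal sketch using only standard shell builtins; it checks presence on `$PATH`, not the versions listed above:

```
# Check that cta and ataqv resolve to executables on $PATH.
for tool in cta ataqv; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: $(command -v "$tool")"
    else
        echo "$tool: NOT FOUND in \$PATH" >&2
    fi
done
```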
- Install the conda environment: `conda env create -f environment.yml; source activate ataqv`
- Clone the repository
- Set a few environment variables (of course, you may want to add each of these `export` commands to your `.bashrc`):
```
export ATAQV_HOME='/path/to/repo'
export PICARD_JAR=/path/to/picard.jar
```
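A quick sanity check that both variables point at real locations (a minimal sketch; the paths above are placeholders you should have replaced with your own):

```
# Confirm ATAQV_HOME is an existing directory and PICARD_JAR an existing file.
[[ -d "$ATAQV_HOME" ]] && echo "ATAQV_HOME OK: $ATAQV_HOME" || echo "ATAQV_HOME is not a directory" >&2
[[ -f "$PICARD_JAR" ]] && echo "PICARD_JAR OK: $PICARD_JAR" || echo "PICARD_JAR is not a file" >&2
```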
- Fetch the necessary submodules:
```
git submodule init
git submodule update
cd ${ATAQV_HOME}/src/ATACseq-Snakemake && git checkout 0f49ddf && cd $ATAQV_HOME
```
- Prepare the `data` directory, containing all generic data (e.g., fasta files, bwa indices, etc.). When the GEO repository becomes public, this step will also download our ATAC-seq fastq files from GEO (for now, that download will obviously not happen):
```
make data
```
This will run most of the commands necessary to set up the `data/` directory sequentially; however, because the bwa indices take several hours to build, it will submit the index-creation jobs to drmr (job names beginning with BWAINDEX). You should wait for those jobs to finish before proceeding; see the sketch below for one way to monitor them.
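drmr hands jobs off to the cluster's resource manager; assuming that is SLURM (an assumption, since the text above does not say which scheduler drmr is configured for), a minimal polling loop like this can tell you when the BWAINDEX jobs are done:

```
# Poll the queue until no jobs whose names start with BWAINDEX remain.
# Assumes drmr submitted to a SLURM scheduler; adjust for your cluster.
while squeue --noheader --format='%j' --user="$USER" | grep -q '^BWAINDEX'; do
    echo "BWAINDEX jobs still queued or running; checking again in 60s..."
    sleep 60
done
echo "All BWAINDEX jobs finished."
```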
Change into `tn5` and `cluster_density` and follow the READMEs there to process the original data.
To process the public data, change into `public_survey` and follow the README there.