BaSH_seq Pipeline (Bash and Slurm on HPC)

Developed and maintained by Matthew Galbraith

This pipeline processes high-throughput sequencing data (paired-end or single-end, PE/SE) through QC, trimming/filtering, alignment, counting, etc. via sequential stages, with individual samples run in parallel via submission to a Slurm queue.
This specific version was customized to run on HDC Eureka instances (running CentOS 7) on Google Compute Engine and uses a custom Conda environment to supply most of the tools/dependencies required by each stage, with only a few problematic tools running from Singularity containers.

BaSH_seq/
├── CutRun
│   └── stageScripts  # Cut&Run-specific stage scripts + symbolic links to common stage scripts
├── RNAseq
│   └── stageScripts  # RNAseq-specific stage scripts + symbolic links to common stage scripts
├── misc
├── multiqc
├── other_scripts
├── pipeline_templates
└── stageScripts                # common stage scripts live here

Dependencies:

Bash
Slurm
Tools required by each stage (in PATH / Conda env / containers); Conda / Docker / Singularity
References and related files
MultiQC script (need to update and integrate)
R (if integrating RPKM script)

Installation and setup

Need to download required refs etc.
Need to modify paths to stage scripts, references, etc.
fastq_screen.conf
For Conda version: need to add env.yaml
For container version: modify stage scripts to use correct call to each tool or container

Pre-run steps?

FASTQ merging?
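
If a sample was sequenced across several lanes or runs, one common pre-run step is to merge the per-lane files into a single FASTQ per read before listing it in sample_locations.txt. The following is a minimal sketch, assuming gzip-compressed inputs; the function name `merge_fastq` and the file names in the comments are hypothetical and not part of BaSH_seq.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of BaSH_seq): merge several gzipped FASTQ
# files into one. Concatenated gzip members form a valid gzip file, so the
# inputs do not need to be decompressed first.
merge_fastq() {
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_fastq <out.fastq.gz> <in1.fastq.gz> [in2.fastq.gz ...]" >&2
        return 1
    fi
    local out="$1"
    shift
    cat -- "$@" > "$out"
}

# Example (hypothetical file names). Merge read 1 and read 2 separately and
# list the lanes in the same order for both, so read pairs stay in sync:
# merge_fastq SampleA_R1.fastq.gz SampleA_L001_R1.fastq.gz SampleA_L002_R1.fastq.gz
# merge_fastq SampleA_R2.fastq.gz SampleA_L001_R2.fastq.gz SampleA_L002_R2.fastq.gz
```

Keeping the lane order identical for read 1 and read 2 matters for paired-end data, since downstream aligners generally expect the two files to list reads in the same order.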

Steps to run pipeline:

In Project/, create top-level working dirs: Project/raw_date and Project/analysis_date
In Project/analysis_date/, create sample_locations.txt (field 1 = SAMPLE_NAME; field 2 = path/to/raw_fastq_file.gz; field 3 = read1 / read2 labels)
See also: SampleInfo.xlsx template
Edit the variables in the top section of the pipeline script template (SEE XXX FOR MORE DETAILS, e.g. Human vs Mouse & strandedness) and save a project-specific version in Project/analysis_date/scripts (specify the path to this version as indicated below)
From Project/analysis_date/, run analysis_setup.sh as follows:
sh path/to/analysis_setup.sh <FULL/PATH/TO/PIPELINE_SCRIPT> <START_AT_STAGE> <END_AT_STAGE>
This creates and populates Project/analysis_date/Sample_* directories with symbolic links to the FASTQ files in Project/raw_date, and writes 'submit' scripts to Project/analysis_date/scripts
Usually you will specify subsets of pipeline stages to allow for QC checks/troubleshooting, e.g. stages 1-3, 4-4, 4-11
Then, from Project/analysis_date/scripts, run the submit scripts as follows (the command can be copied from within the submit script):
sbatch submitAll_START_AT_STAGE-END_AT_STAGE.sh
If using Conda to manage required tools, first activate the appropriate env
Monitor progress using squeue (see watch alias)
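
Because analysis_setup.sh builds the Sample_* directories from sample_locations.txt, it can help to sanity-check that file before running setup. The sketch below is not part of BaSH_seq: the function name `check_sample_locations` is hypothetical, and it assumes the three fields are tab-delimited and that field 3 holds the literal labels `read1` or `read2`, neither of which is confirmed above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of BaSH_seq): sanity-check sample_locations.txt.
# Assumed layout (tab-delimited): SAMPLE_NAME, path to FASTQ file, read1/read2 label.
check_sample_locations() {
    local file="${1:-sample_locations.txt}"
    local status=0 n=0 name path label extra
    # The "|| [ -n ... ]" keeps the last line even if it lacks a trailing newline.
    while IFS=$'\t' read -r name path label extra || [ -n "$name" ]; do
        n=$((n + 1))
        # Skip completely empty lines.
        if [ -z "$name$path$label" ]; then
            continue
        fi
        if [ -z "$path" ] || [ -z "$label" ] || [ -n "$extra" ]; then
            echo "line $n: expected exactly 3 tab-separated fields" >&2
            status=1
            continue
        fi
        case "$label" in
            read1|read2) ;;
            *) echo "line $n: unexpected read label '$label'" >&2; status=1 ;;
        esac
        if [ ! -e "$path" ]; then
            echo "line $n: FASTQ file not found: $path" >&2
            status=1
        fi
    done < "$file"
    return "$status"
}

# Example, run from Project/analysis_date/ before analysis_setup.sh:
# check_sample_locations sample_locations.txt
```

The function reports every problem line on stderr and returns non-zero if any check failed, so it can gate the setup step in a wrapper script.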

Post-run steps and troubleshooting

Check pipeline logs
(if needed) Check Sample/Stage logs
Run MultiQC and inspect report
(RNAseq only) Gather counts files
Run temp FASTQ and BAM file cleanup
(if needed) Sync back to main data store
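
For the log checks above, a quick first pass is to list which log files mention a failure at all and then open only those. This is a rough sketch rather than the pipeline's own tooling: the function name `find_failed_logs`, the `*.log` file pattern, and the keyword list are all assumptions to adapt to how your logs are actually named and worded.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of BaSH_seq): list log files under a directory
# that mention common failure keywords. The *.log pattern and the keywords
# are assumptions; adjust them to match the pipeline's actual logs.
find_failed_logs() {
    local dir="${1:-.}"
    # -r recurse, -i ignore case, -l print matching file names only,
    # -E extended regex. grep exits with 1 when nothing matches, which is
    # not an error here; any other non-zero status is passed through.
    grep -rilE --include='*.log' 'error|fail|killed|out of memory' "$dir" \
        || [ "$?" -eq 1 ]
}

# Example, run from Project/analysis_date/:
# find_failed_logs .
```

A keyword scan like this can produce false positives (for example a tool that prints "0 errors"), so treat the output as a list of files to inspect rather than a definitive failure report.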

