This is a Snakemake-based 16S QIIME2 pipeline.
To install, we assume you have already installed Miniconda3 (4.7.10+) (https://docs.conda.io/en/latest/miniconda.html).
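If you are unsure which conda version you have, a check like the following can confirm that it meets the 4.7.10 minimum. This is a minimal bash sketch, not part of the repository. It assumes `conda --version` prints a line such as `conda 23.1.0` and that your `sort` supports `-V`. The names `version_at_least` and `check_conda` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of 16S_QIIME2: compare the installed conda
# against the 4.7.10 minimum. Relies on `sort -V` (GNU coreutils, recent BSD sort).

# Succeed if version $1 is greater than or equal to version $2.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print the conda version and succeed if it is at least 4.7.10.
check_conda() {
  local v
  # `conda --version` is assumed to print "conda X.Y.Z"; keep the second field.
  v=$(conda --version 2>/dev/null | awk '{print $2}')
  if [ -z "$v" ]; then
    echo "conda not found on PATH" >&2
    return 1
  fi
  if ! version_at_least "$v" 4.7.10; then
    echo "conda $v is older than 4.7.10" >&2
    return 1
  fi
  echo "conda $v"
}
```

Running `check_conda` prints the detected version on success and an error message on stderr otherwise.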
- Clone this repository:
git clone https://github.com/PennChopMicrobiomeProgram/16S_QIIME2.git
- Create a conda environment:
cd 16S_QIIME2
conda env create --name qiime2-2023.2 --file environment.yml
- To run the pipeline, activate the environment (currently based on QIIME2 2023.2) by entering:
conda activate qiime2-2023.2
- The following software also needs to be installed within the environment you created:
To run the pipeline, you need:
- Multiplexed R1/R2 read pairs (Undetermined_S0_L001_R1_001.fastq.gz, Undetermined_S0_L001_R2_001.fastq.gz)
- A QIIME2 compatible mapping file:
  - Tab delimited
  - The first two columns should be `SampleID` (or `#SampleID`) and `BarcodeSequence`
- A QIIME2 classifier (https://docs.qiime2.org/2023.2/data-resources/)
- A DADA2 training set (https://benjjneb.github.io/dada2/training.html)
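Before launching a run, a quick pre-flight check on these inputs can catch common problems early. The following is a minimal bash sketch, not part of the pipeline: the helper names `count_reads`, `check_read_pairs` and `check_mapping_file` are hypothetical, it assumes gzipped FASTQ with four lines per record, and it only checks the mapping-file header requirements listed above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight checks; these helpers are not part of 16S_QIIME2.

# Print the number of reads in a gzipped FASTQ file (assumes 4 lines per record).
count_reads() {
  local lines
  lines=$(gzip -dc "$1" | wc -l)
  echo $(( lines / 4 ))
}

# Succeed (and print the pair count) if R1 and R2 contain the same number of reads.
check_read_pairs() {
  local r1="$1" r2="$2" n1 n2
  if [ ! -s "$r1" ] || [ ! -s "$r2" ]; then
    echo "missing or empty read file: $r1 / $r2" >&2
    return 1
  fi
  n1=$(count_reads "$r1")
  n2=$(count_reads "$r2")
  if [ "$n1" -ne "$n2" ]; then
    echo "R1 has $n1 reads but R2 has $n2" >&2
    return 1
  fi
  echo "$n1 read pairs"
}

# Succeed if the mapping file header is tab delimited and starts with
# SampleID (or #SampleID) followed by BarcodeSequence.
check_mapping_file() {
  local header col1 col2
  header=$(head -n 1 "$1" | tr -d '\r')
  case "$header" in
    *"$(printf '\t')"*) ;;
    *) echo "header is not tab delimited" >&2; return 1 ;;
  esac
  col1=$(printf '%s\n' "$header" | cut -f1)
  col2=$(printf '%s\n' "$header" | cut -f2)
  case "$col1" in
    SampleID|'#SampleID') ;;
    *) echo "first column is '$col1', expected SampleID or #SampleID" >&2; return 1 ;;
  esac
  if [ "$col2" != "BarcodeSequence" ]; then
    echo "second column is '$col2', expected BarcodeSequence" >&2
    return 1
  fi
}
```

For example, `check_read_pairs Undetermined_S0_L001_R1_001.fastq.gz Undetermined_S0_L001_R2_001.fastq.gz` followed by `check_mapping_file test_mapping_file.tsv`. Counting reads decompresses both files, so it can take a while on large runs.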
- Create a project directory, e.g. `~/16S_QIIME2/test`, and put the mapping file, e.g. `test_mapping_file.tsv`, in the project directory. If you are running this on the cluster, stage the data on a scratch drive, e.g. `/scr1/username`.
- Edit `qiime2_config.yml` so that it suits your project. In particular:
  - `all: project:` path to the project directory, e.g. `~/16S_QIIME2/test`
  - `all: mux_dir:` the directory containing the multiplexed R1/R2 read pairs, e.g. `~/16S_QIIME2/test/multiplexed_fastq`
  - `all: mapping:` the name of the mapping file, e.g. `test_mapping_file.tsv`
- Edit `config.yaml` for platform-specific settings (currently formatted for SLURM on republica).
- (Optional) Edit `rules/targets/targets.rules` to comment out steps you don't need (e.g. `#TARGET_PICRUST2`).
- To run the pipeline, activate the environment by entering `conda activate qiime2-2023.2`, `cd` into `16S_QIIME2`, and execute `snakemake --profile ./`.
  - If using `sbatch`, you can simply execute the script `./run_snakemake.bash`.
  - You can also do a dry run: `./dryrun_snakemake.bash`
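The project setup and `qiime2_config.yml` steps above can be sketched in bash as follows. This is an illustration under assumptions, not a script shipped with the repository: `setup_project`, `has_match` and `check_config_values` are hypothetical helpers, `check_config_values` does not parse the YAML (you pass it the values you intend to enter under `all:`), and the `*_R1_*.fastq.gz` / `*_R2_*.fastq.gz` patterns are assumed from the `Undetermined_S0_L001_R1_001.fastq.gz` example.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the project setup steps; not part of 16S_QIIME2.

# Create the project directory (plus a multiplexed_fastq subdirectory, as in the
# example mux_dir) and copy the mapping file into it. Prints the copied path.
setup_project() {
  local project="$1" mapping_src="$2"
  mkdir -p "$project/multiplexed_fastq" || return 1
  cp "$mapping_src" "$project/" || return 1
  echo "$project/$(basename "$mapping_src")"
}

# Succeed if at least one of the (already glob-expanded) arguments exists.
has_match() {
  local f
  for f in "$@"; do
    [ -e "$f" ] && return 0
  done
  return 1
}

# Check the values intended for all: project, all: mux_dir and all: mapping.
# Reports every problem found, then fails if there was at least one.
check_config_values() {
  local project="$1" mux_dir="$2" mapping="$3" status=0
  if [ ! -d "$project" ]; then
    echo "project directory not found: $project" >&2; status=1
  fi
  # The mapping file is expected inside the project directory.
  if [ ! -f "$project/$mapping" ]; then
    echo "mapping file not found: $project/$mapping" >&2; status=1
  fi
  if [ ! -d "$mux_dir" ]; then
    echo "mux_dir not found: $mux_dir" >&2; status=1
  else
    has_match "$mux_dir"/*_R1_*.fastq.gz || { echo "no R1 file in $mux_dir" >&2; status=1; }
    has_match "$mux_dir"/*_R2_*.fastq.gz || { echo "no R2 file in $mux_dir" >&2; status=1; }
  fi
  return "$status"
}
```

For example, run `setup_project ~/16S_QIIME2/test /path/to/test_mapping_file.tsv`, copy the multiplexed reads into `~/16S_QIIME2/test/multiplexed_fastq`, then run `check_config_values ~/16S_QIIME2/test ~/16S_QIIME2/test/multiplexed_fastq test_mapping_file.tsv` before editing the config.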
Pipeline steps, with their inputs and outputs:

Demultiplexing
- Input:
  - Multiplexed R1/R2 read pairs
  - QIIME2 compatible mapping file
- Output:
  - Demultiplexed fastq(.gz) files
  - Total read count summary (tsv)
  - QIIME2 compatible manifest file (csv)
Import into QIIME2
- Input:
  - QIIME2 compatible manifest file
  - Demultiplexed fastq files
- Output:
  - QIIME2 PairedEndSequencesWithQuality artifact and corresponding visualization
  - QIIME2-generated demultiplexing stats
Denoising
- Input:
  - QIIME2 PairedEndSequencesWithQuality artifact
- Output:
  - Feature table (QIIME2 artifact, tsv)
  - Representative sequences (QIIME2 artifact, fasta)
Taxonomic classification
- Input:
  - Representative sequences
- Output:
  - Taxonomy classification table (QIIME2 artifact, tsv)
Phylogenetic tree
- Input:
  - Representative sequences
- Output:
  - Aligned sequences
  - Masked (aligned) sequences
  - Unrooted tree
  - Rooted tree
Diversity
- Input:
  - Rooted tree
- Output:
  - Various QIIME2 diversity metric artifacts
  - Faith phylogenetic diversity vector (tsv)
  - Weighted/unweighted UniFrac distance matrices (tsv)
Unassigner
- Input:
  - Representative sequences (fasta)
- Output:
  - Unassigner output (tsv) for species-level classification of representative sequences
DADA2 species assignment
- Input:
  - Representative sequences (fasta)
- Output:
  - DADA2 species assignments (tsv)
  - DADA2 raw data for loading in R (RData format)
Vsearch
- Input:
  - Representative sequences (fasta)
- Output:
  - Vsearch report (tsv) customized to be like BLAST results (see config.yml)
  - Vsearch list of representative sequences that aligned (fasta)
PICRUSt2
NB: picrust2-2021.11_0 currently does not work with QIIME2 2023.2, but these would be the inputs and outputs if it did:
- Input:
  - Feature table (QIIME2 artifact, tsv)
  - Representative sequences (QIIME2 artifact, fasta)
- Output:
  - KEGG orthologs counts (tsv)
  - Enzyme classification counts (QIIME2 artifact)
  - KEGG pathway counts (QIIME2 artifact)