The H3ABioNet 16S rRNA Microbiome Intermediate Bioinformatics course trains participants to perform 16S rRNA microbiome data analyses using a variety of bioinformatics methods and tools.
The course is intended for anyone who will be working with 16S rRNA microbiome data and would like to learn more about the bioinformatics analyses.
The course covers the following modules:
- Module 1: Introduction to the Linux command line / intro to R
- Module 2: Introduction to the microbiome and study design – why 16S?
- Module 3: Sample collection, extraction and library prep for 16S NGS analyses
- Module 4: 16S rRNA gene amplicon sequencing bioinformatics pipeline: the theory
- Module 5: 16S analysis pipeline - QC, ASV picking, taxonomic classification and alignment
- Module 6: Downstream analysis in R - using the packages phyloseq, NMF, vegan, metagenomeSeq (among others)
- Potential bonus module: Shotgun sequencing
By the end of the course participants should be able to:
- Describe the importance of the microbiome and why it should be studied
- Understand how to design 16S rRNA microbiome studies
- Apply basic syntax and operations in R
- Understand the different NGS data types (e.g. MiSeq reads) produced for a 16S rRNA microbiome study
- Evaluate the quality of NGS sequence reads and samples
- Understand the various bioinformatics tools used for 16S microbiome studies
- Understand the various 16S rRNA bioinformatics pipelines being used to study the microbiome
- Apply the H3ABioNet 16S rRNA pipeline and understand how to execute it
- Understand the use of workflow languages (Nextflow) and containerized images (Singularity) to automate analyses
- Analyze 16S rRNA microbiome data and interpret results
This repository contains the trainers' presentations, tutorial scripts, and instructions on how to set up Nextflow to automate the 16S rRNA pipeline. The trainers' presentation videos are available on H3ABioNet's YouTube channel in this playlist.
This resource assumes that you are running Linux with R and RStudio installed. Additionally, the latest versions of the following R packages should be installed:
- dada2
- DECIPHER
- phangorn
- metagenomeSeq
- vegan
- ggplot2
- NMF
- gridExtra
- dplyr
- phyloseq
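One way to install the packages listed above from the command line is sketched below. It assumes that `Rscript` is on your PATH and that the packages are installed through BiocManager, which can install both CRAN and Bioconductor packages. The `RUN_INSTALL` switch is a hypothetical name used only in this sketch so that the script prints the command by default instead of installing anything.

```shell
#!/bin/sh
# R packages named in the list above.
packages="dada2 DECIPHER phangorn metagenomeSeq vegan ggplot2 NMF gridExtra dplyr phyloseq"

# Build an R character vector, e.g. c("dada2", "DECIPHER", ...).
r_vector=""
for p in $packages; do
  r_vector="${r_vector:+$r_vector, }\"$p\""
done
r_vector="c($r_vector)"

# R code: install BiocManager if it is absent, then install the packages through it.
r_code='if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager", repos = "https://cloud.r-project.org"); BiocManager::install('"$r_vector"')'

# RUN_INSTALL is a hypothetical switch for this sketch: without it, only print.
if [ "${RUN_INSTALL:-0}" = "1" ]; then
  Rscript -e "$r_code"
else
  printf 'Dry run. Would execute:\nRscript -e %s\n' "'$r_code'"
fi
```

Save the script to a file and run it with `RUN_INSTALL=1` set in the environment to perform the installation. Installing the Bioconductor packages can take a while on a fresh R installation.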
This training offers a step-by-step dada2 tutorial on how to analyze 16S rRNA microbiome data (covered in Modules 5 and 6), followed by automation of the analysis using the Nextflow workflow language and containerized images (Module 6 Session 1), before downstream statistical analysis in R.
To run the step-by-step tutorial, you need R, RStudio, and the R packages listed above. The dada2 tutorial starts in Module 5 and continues in Module 6 on downstream analysis. The scripts and required data for the tutorials are also linked in the section below.
To run the dada2 Nextflow pipeline on a compute cluster, follow the instructions outlined in this document for software setup and for running the pipeline. This involves setting up Singularity, R and RStudio, and Nextflow. For the training at ICIPE, please follow the instructions outlined here.
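Before working through the setup instructions, it can help to see which of the tools named above are already available on the machine. The following is a minimal sketch; the command names `singularity`, `nextflow`, `R`, and `Rscript` are assumptions about how the tools are installed on your system, and `check_tools` is a helper defined only for this example. RStudio is not checked because it is usually started as a desktop or server application.

```shell
#!/bin/sh
# Record in $missing which of the given commands are not on the PATH.
check_tools() {
  missing=""
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing="${missing:+$missing }$tool"
    fi
  done
}

# Command names are assumptions; adjust them to match your installation.
check_tools singularity nextflow R Rscript

if [ -n "$missing" ]; then
  echo "Install before running the pipeline: $missing"
fi
```

On a cluster, these tools are often provided through environment modules, so a tool reported as missing may only need to be loaded rather than installed.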
Module 1: Introduction to Linux and R
- Module resource
- RStudio website
- RStudio Course Material
- Debugging code in RStudio
- Introduction to R
- Assignment operators in R
- Code styling
- R Data types and data structures
- Good practices in scientific computing
Module 4: 16S rRNA sequencing bioinformatics pipeline theory
- FASTQC analysis
- H3ABioNet 16S rRNA SOP
- Mothur
- QIIME2
- QIIME2 tutorial
- DADA2 Pipeline tutorial
- UPARSE
- IM Tornado
- FROGS
- VSEARCH
Module 5: 16S analysis pipeline
- dada2 tutorial site
- dada2 tutorial R script
- Required tutorial data: http://web.cbio.uct.ac.za/~gerrit/downloads/dog_stool_full.tgz
- RefSeq-RDP 16S database: https://zenodo.org/record/3266798/files/RefSeq-RDP16S_v3_May2018.fa.gz
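The tutorial data and the reference database listed above can be fetched from the command line. The sketch below assumes `curl` and `tar` are installed; the `fetch` helper, the `data` target directory, and the `RUN_DOWNLOAD` switch are hypothetical choices made for this example, and by default the script only prints a message.

```shell
#!/bin/sh
# URLs as given in the Module 5 list above.
DATA_URL="http://web.cbio.uct.ac.za/~gerrit/downloads/dog_stool_full.tgz"
DB_URL="https://zenodo.org/record/3266798/files/RefSeq-RDP16S_v3_May2018.fa.gz"

# Download a URL into a directory under its remote file name.
# Skips the download when a non-empty copy is already there.
fetch() {
  url=$1
  dir=$2
  file="$dir/$(basename "$url")"
  mkdir -p "$dir"
  if [ -s "$file" ]; then
    echo "already present: $file"
  else
    curl -fL -o "$file" "$url"
  fi
}

# RUN_DOWNLOAD and the "data" directory are hypothetical choices for this sketch.
if [ "${RUN_DOWNLOAD:-0}" = "1" ]; then
  fetch "$DATA_URL" data
  fetch "$DB_URL" data
  tar -xzf data/dog_stool_full.tgz -C data   # unpack the tutorial archive
else
  echo "Dry run. Set RUN_DOWNLOAD=1 to download into ./data"
fi
```

Because `fetch` skips files that already exist, the script can be re-run after an interrupted session without downloading everything again. Delete a partially downloaded file before retrying it.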
Module 6: Downstream analysis in R