This set of bash scripts implements an RNA-seq analysis workflow for high-throughput sequencing data. The scripts trim raw reads with Trimmomatic and run quality control on the trimmed reads, convert the annotation from GFF to GTF with gffread, and build a genome index for either HISAT2 or STAR. The trimmed reads are then aligned with HISAT2 or STAR, each in its own script. The resulting SAM files are converted to BAM, sorted, and indexed with samtools, and featureCounts produces gene-level read counts. Together the scripts give a structured, automated path from raw reads to gene expression counts, and their paths and parameters can be adapted to different organisms and experimental designs. Let's go through each script and the tools it uses:
- Purpose: Quality control and trimming of raw sequencing reads.
- Tools Used: `trimmomatic`: A tool for trimming and filtering Illumina sequencing data.
- Steps:
- Set up paths and directories.
- Create output directories for trimmed paired and unpaired reads.
- Iterate over raw sequencing files and perform trimming using Trimmomatic.
- Create a QC directory, run FastQC on the trimmed paired reads, and generate a MultiQC report.
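The steps above might look roughly like the following sketch. The directory names, the `_R1`/`_R2` file naming, the adapter file, and the trimming thresholds are assumptions chosen for illustration, not values taken from the original script. It also assumes that `trimmomatic`, `fastqc`, and `multiqc` are available as commands on the `PATH` (for example through a conda install).

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical layout and naming (<sample>_R1.fastq.gz / <sample>_R2.fastq.gz).
RAW_DIR="${RAW_DIR:-raw}"
PAIRED_DIR="${PAIRED_DIR:-trimmed/paired}"
UNPAIRED_DIR="${UNPAIRED_DIR:-trimmed/unpaired}"
QC_DIR="${QC_DIR:-qc}"
ADAPTERS="${ADAPTERS:-TruSeq3-PE.fa}"   # assumed adapter file
THREADS="${THREADS:-4}"

# Trim every read pair found in RAW_DIR.
trim_all() {
    mkdir -p "$PAIRED_DIR" "$UNPAIRED_DIR"
    local r1 r2 sample
    for r1 in "$RAW_DIR"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue          # skip when the glob matched nothing
        sample="$(basename "$r1" _R1.fastq.gz)"
        r2="$RAW_DIR/${sample}_R2.fastq.gz"
        # Output order: R1 paired, R1 unpaired, R2 paired, R2 unpaired.
        # The trimming steps below are common example settings, not the
        # original script's values.
        trimmomatic PE -threads "$THREADS" \
            "$r1" "$r2" \
            "$PAIRED_DIR/${sample}_R1.fastq.gz" "$UNPAIRED_DIR/${sample}_R1.fastq.gz" \
            "$PAIRED_DIR/${sample}_R2.fastq.gz" "$UNPAIRED_DIR/${sample}_R2.fastq.gz" \
            "ILLUMINACLIP:${ADAPTERS}:2:30:10" LEADING:3 TRAILING:3 \
            SLIDINGWINDOW:4:15 MINLEN:36
    done
}

# Run FastQC on the trimmed paired reads, then summarise with MultiQC.
qc_trimmed() {
    mkdir -p "$QC_DIR"
    fastqc -t "$THREADS" -o "$QC_DIR" "$PAIRED_DIR"/*.fastq.gz
    multiqc "$QC_DIR" -o "$QC_DIR"
}
```

Calling `trim_all` followed by `qc_trimmed` runs the whole step; keeping the two as separate functions makes it easy to rerun only the QC part.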
- Purpose: Convert GFF (General Feature Format) to GTF (Gene Transfer Format).
- Tools Used: `gffread`: A tool for converting GFF3 to GTF.
- Steps:
Steps:
- Set up paths and directories.
- Use `gffread` to convert the GFF file to GTF format.
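As a minimal sketch of the conversion step, assuming hypothetical input and output paths (the original script's paths are not shown here):

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical paths; adjust to the actual annotation files.
GFF="${GFF:-reference/annotation.gff}"
GTF="${GTF:-reference/annotation.gtf}"

gff_to_gtf() {
    mkdir -p "$(dirname "$GTF")"
    # -T asks gffread for GTF output (it writes GFF3 by default);
    # -o names the output file.
    gffread "$GFF" -T -o "$GTF"
}
```

Calling `gff_to_gtf` writes the GTF file that the later indexing and counting steps can use.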
- Purpose: Indexing the genome for HISAT2 alignment.
- Tools Used: `hisat2`: A fast and sensitive alignment program for mapping next-generation sequencing reads to a single reference genome or to a population of genomes.
- Steps:
Steps:
- Set up paths and directories.
- Extract exons and splice sites from the GTF file.
- Build the HISAT2 index using extracted exons and splice sites.
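The indexing steps above could be sketched as follows. The file locations and the `genome` index prefix are assumptions for illustration. The helper scripts `hisat2_extract_splice_sites.py` and `hisat2_extract_exons.py` ship with HISAT2 and are assumed to be on the `PATH`.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical paths and index prefix.
GENOME_FA="${GENOME_FA:-reference/genome.fa}"
GTF="${GTF:-reference/annotation.gtf}"
INDEX_DIR="${INDEX_DIR:-index/hisat2}"
INDEX_PREFIX="${INDEX_PREFIX:-$INDEX_DIR/genome}"
THREADS="${THREADS:-4}"

build_hisat2_index() {
    mkdir -p "$INDEX_DIR"
    # Both helper scripts print to stdout, so redirect into files.
    hisat2_extract_splice_sites.py "$GTF" > "$INDEX_DIR/genome.ss"
    hisat2_extract_exons.py "$GTF" > "$INDEX_DIR/genome.exon"
    # Build a splice-aware index from the genome FASTA plus the extracted
    # splice sites and exons.
    hisat2-build -p "$THREADS" \
        --ss "$INDEX_DIR/genome.ss" --exon "$INDEX_DIR/genome.exon" \
        "$GENOME_FA" "$INDEX_PREFIX"
}
```

Building with `--ss` and `--exon` can need considerably more memory than a plain `hisat2-build` run, which is worth keeping in mind for large genomes.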
- Purpose: Indexing the genome for STAR alignment.
- Tools Used: `STAR`: A high-performance RNA-seq aligner.
- Steps:
Steps:
- Set up paths and directories.
- Run STAR to generate the genome index.
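A minimal sketch of the index generation, under stated assumptions: the paths are hypothetical, and passing the GTF annotation with `--sjdbGTFfile` is an assumption about how the original script builds its index. `--sjdbOverhang` is usually set to the read length minus one; 100 is STAR's default.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical paths and settings.
GENOME_FA="${GENOME_FA:-reference/genome.fa}"
GTF="${GTF:-reference/annotation.gtf}"
STAR_INDEX_DIR="${STAR_INDEX_DIR:-index/star}"
SJDB_OVERHANG="${SJDB_OVERHANG:-100}"   # read length - 1 is the usual choice
THREADS="${THREADS:-4}"

build_star_index() {
    mkdir -p "$STAR_INDEX_DIR"
    STAR --runThreadN "$THREADS" \
        --runMode genomeGenerate \
        --genomeDir "$STAR_INDEX_DIR" \
        --genomeFastaFiles "$GENOME_FA" \
        --sjdbGTFfile "$GTF" \
        --sjdbOverhang "$SJDB_OVERHANG"
}
```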
- Purpose: Perform alignment of trimmed reads using HISAT2.
- Tools Used: `hisat2`: For read alignment.
- Steps:
Steps:
- Set up paths and directories.
- Iterate over trimmed reads and perform HISAT2 alignment.
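The alignment loop might be sketched like this, assuming paired-end reads named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` and a hypothetical index prefix and output directory:

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical paths; the _R1/_R2 naming is an assumption.
PAIRED_DIR="${PAIRED_DIR:-trimmed/paired}"
INDEX_PREFIX="${INDEX_PREFIX:-index/hisat2/genome}"
SAM_DIR="${SAM_DIR:-aligned/hisat2}"
THREADS="${THREADS:-4}"

align_hisat2() {
    mkdir -p "$SAM_DIR"
    local r1 r2 sample
    for r1 in "$PAIRED_DIR"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue          # skip when the glob matched nothing
        sample="$(basename "$r1" _R1.fastq.gz)"
        r2="$PAIRED_DIR/${sample}_R2.fastq.gz"
        # -x: index prefix, -1/-2: mate files, -S: SAM output.
        hisat2 -p "$THREADS" -x "$INDEX_PREFIX" \
            -1 "$r1" -2 "$r2" \
            -S "$SAM_DIR/$sample.sam"
    done
}
```

Each sample ends up as one SAM file in `SAM_DIR`, which is the input expected by the SAM-to-BAM conversion step.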
- Purpose: Perform alignment of trimmed reads using STAR.
- Tools Used: `STAR`: For read alignment.
- Steps:
Steps:
- Set up paths and directories.
- Iterate over trimmed reads and perform STAR alignment.
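A comparable sketch for STAR, again assuming gzip-compressed paired-end reads with `_R1`/`_R2` naming and hypothetical directories. SAM output is requested here so that the result fits the later SAM-to-BAM step; that choice is an assumption about the original script.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical paths; the _R1/_R2 naming is an assumption.
PAIRED_DIR="${PAIRED_DIR:-trimmed/paired}"
STAR_INDEX_DIR="${STAR_INDEX_DIR:-index/star}"
STAR_OUT_DIR="${STAR_OUT_DIR:-aligned/star}"
THREADS="${THREADS:-4}"

align_star() {
    mkdir -p "$STAR_OUT_DIR"
    local r1 r2 sample
    for r1 in "$PAIRED_DIR"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue          # skip when the glob matched nothing
        sample="$(basename "$r1" _R1.fastq.gz)"
        r2="$PAIRED_DIR/${sample}_R2.fastq.gz"
        # --readFilesCommand zcat lets STAR read gzip-compressed FASTQ.
        # Output lands in <prefix>Aligned.out.sam.
        STAR --runThreadN "$THREADS" \
            --genomeDir "$STAR_INDEX_DIR" \
            --readFilesIn "$r1" "$r2" \
            --readFilesCommand zcat \
            --outSAMtype SAM \
            --outFileNamePrefix "$STAR_OUT_DIR/${sample}_"
    done
}
```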
- Purpose: Convert SAM files to BAM files, sort them, and index the sorted BAM files.
- Tools Used: `samtools`: Utilities for interacting with and manipulating SAM/BAM/CRAM format files.
- Steps:
Steps:
- Set up paths and directories.
- Iterate over SAM files, convert to BAM, sort, and index.
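The convert, sort, and index sequence can be sketched as below. The directory names and the `.sorted.bam` suffix are assumptions for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical paths and output naming.
SAM_DIR="${SAM_DIR:-aligned/hisat2}"
BAM_DIR="${BAM_DIR:-bam}"
THREADS="${THREADS:-4}"

sam_to_sorted_bam() {
    mkdir -p "$BAM_DIR"
    local sam sample
    for sam in "$SAM_DIR"/*.sam; do
        [ -e "$sam" ] || continue         # skip when the glob matched nothing
        sample="$(basename "$sam" .sam)"
        # 1. SAM -> BAM
        samtools view -@ "$THREADS" -b -o "$BAM_DIR/$sample.bam" "$sam"
        # 2. Sort by coordinate (the samtools sort default)
        samtools sort -@ "$THREADS" -o "$BAM_DIR/$sample.sorted.bam" "$BAM_DIR/$sample.bam"
        # 3. Index the sorted BAM (writes <file>.bai next to it)
        samtools index "$BAM_DIR/$sample.sorted.bam"
    done
}
```

`samtools sort` can also read SAM directly, so the separate `view` step is kept here only to mirror the three steps listed above.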
- Purpose: Perform feature counting on the aligned reads.
- Tools Used: `featureCounts`: A tool for counting reads that map to genomic features such as genes and exons.
- Steps:
Steps:
- Set up paths and directories.
- Iterate over sorted BAM files and perform feature counting.
- Extract counts and save to a separate file.
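A sketch of the counting and extraction steps, under stated assumptions: paths and file names are hypothetical, the library is assumed to be paired-end (`-p`), and each BAM file is counted on its own so that the count sits in column 7 of the featureCounts table. Newer Subread releases may also need `--countReadPairs` to count fragments rather than individual reads.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical paths and output naming.
BAM_DIR="${BAM_DIR:-bam}"
GTF="${GTF:-reference/annotation.gtf}"
COUNT_DIR="${COUNT_DIR:-counts}"
THREADS="${THREADS:-4}"

count_features() {
    mkdir -p "$COUNT_DIR"
    local bam sample
    for bam in "$BAM_DIR"/*.sorted.bam; do
        [ -e "$bam" ] || continue         # skip when the glob matched nothing
        sample="$(basename "$bam" .sorted.bam)"
        # -p: paired-end library (an assumption), -a: GTF annotation,
        # -o: full featureCounts table for this sample.
        featureCounts -p -T "$THREADS" -a "$GTF" \
            -o "$COUNT_DIR/$sample.featureCounts.txt" "$bam"
        # Drop the leading "#" comment line and the column-header line,
        # then keep gene ID (column 1) and the count (column 7).
        grep -v '^#' "$COUNT_DIR/$sample.featureCounts.txt" \
            | tail -n +2 | cut -f 1,7 > "$COUNT_DIR/$sample.counts.txt"
    done
}
```

The resulting two-column `<sample>.counts.txt` files are convenient to merge into a single count matrix for downstream differential expression tools.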