This is a pipeline for the quantification of transposable elements (TEs) in single-cell STORM-seq samples. It can also be used with other technologies such as SMART-seq and SMART-seq2. The pipeline requires the following tools and R packages:
- featureCounts
- data.table
- dplyr
- stringr
- optparse
- ggplot2
- scales
- ggh4x
- scuttle
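These can typically be installed with conda; below is a sketch of an environment file (the channel and package names are my assumptions of the usual conda-forge/bioconda names — check the repository's own environment files):

```yaml
# illustrative conda environment -- package/channel names are assumptions
name: te_quantification
channels:
  - conda-forge
  - bioconda
dependencies:
  - snakemake
  - subread              # provides featureCounts
  - r-data.table
  - r-dplyr
  - r-stringr
  - r-optparse
  - r-ggplot2
  - r-scales
  - r-ggh4x
  - bioconductor-scuttle
```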
- Aligned BAM files are used as inputs to featureCounts with the parameters
  `-F SAF -O -B -p --fracOverlap 0.1 -M -s 0 --fraction`
  specified in the `config.yaml` file; these can be altered as required.
- The featureCounts output files for all cells are then combined into four matrices: raw counts, counts per million (CPM), raw counts for only intergenic and intronic TEs, and CPM for only intergenic and intronic TEs.
- If filtering is set to `True` in the `config.yaml` file, a quick scuttle filtering step is performed to remove low-quality cells and generate filtered count matrices.
- The CPM matrix for only intergenic and intronic TEs is used for the log-enrichment calculation of the TEs.
- An enrichment score is then calculated for each TE using a log-enrichment formula.
- If the log-enrichment heatmap plot is set to `True` in the `config.yaml` file, a heatmap generated with ggplot2 is also saved as a PDF file.
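The exact enrichment formula is given in the repository. Purely as an illustrative stand-in (an assumption, not necessarily the pipeline's definition), a common log-enrichment score for TE $t$ in cell $c$, computed from the intergenic/intronic-TE CPM matrix over $N$ cells, is

$$
E_{t,c} = \log_2\!\left(\frac{\mathrm{CPM}_{t,c} + 1}{\frac{1}{N}\sum_{c'=1}^{N} \mathrm{CPM}_{t,c'} + 1}\right),
$$

i.e. each TE's CPM in a cell relative to its mean CPM across all cells, with a pseudocount of 1.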
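The CPM conversion in the workflow above is simple to state. As a sketch (in Python for illustration — the pipeline itself is R-based), with features as rows and cells as columns:

```python
import numpy as np

def counts_per_million(counts: np.ndarray) -> np.ndarray:
    """Scale each cell (column) so its counts sum to one million.

    `counts` is a features x cells matrix of raw featureCounts values.
    A sketch of the CPM step, not the pipeline's actual code.
    """
    libsize = counts.sum(axis=0, keepdims=True)  # total counts per cell
    return counts / libsize * 1e6

cpm = counts_per_million(np.array([[2.0, 30.0],
                                   [8.0, 70.0]]))
# each column of `cpm` now sums to 1,000,000
```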
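The low-quality-cell filter is done with the R package scuttle. As an illustration of the underlying idea — outlier detection on library size, here a lower median-absolute-deviation cutoff sketched in Python; an approximation, not scuttle's exact criteria:

```python
import numpy as np

def keep_high_quality_cells(counts: np.ndarray, nmads: float = 3.0) -> np.ndarray:
    """Flag cells whose log library size is within `nmads` median absolute
    deviations of the median -- the same spirit as scuttle's isOutlier().

    Returns a boolean mask over the columns (cells) of `counts`.
    """
    log_lib = np.log1p(counts.sum(axis=0))   # log total counts per cell
    med = np.median(log_lib)
    mad = np.median(np.abs(log_lib - med))
    return log_lib >= med - nmads * mad

# one near-empty cell among four healthy ones gets flagged out
toy = np.array([[500.0, 550.0, 450.0, 525.0, 0.0],
                [500.0, 550.0, 450.0, 525.0, 1.0]])
mask = keep_high_quality_cells(toy)
```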
- Clone the [repo](https://github.com/AyushSemwal/TE_quantification_snakemake) using either:
  - HTTPS: `https://github.com/AyushSemwal/TE_quantification_snakemake.git`
  - SSH: `git@github.com:AyushSemwal/TE_quantification_snakemake.git`
- Unzip `hg38_pc_te_chrM.saf.tar.gz` and `intergenic_intronic_tes.txt.tar.gz` in the `config` folder.
- Modify the `config.yaml` file in the `config` folder to specify `aligned_bam_dir` (the directory containing the aligned BAM files), `output_dir` (the directory where all output files and sub-directories will be stored), and other parameters as required. The parameter names are intuitive, and each one is annotated with a comment.
- Populate the `samples.tsv` file so that the first column contains the cell names (or sample names, for bulk samples) and the second column contains the BAM file names. Do not add a header to this file.
- If SLURM is available, submit the job by running
`sbatch bin/workflow_sbatch.sh`
from the parent directory; otherwise run `snakemake --use-conda --cores {num_cores}`.
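As a sketch, the relevant part of `config.yaml` might look like the fragment below. Only `aligned_bam_dir` and `output_dir` are named in the steps above; the other keys and all values are assumptions for illustration:

```yaml
# illustrative config.yaml fragment -- keys other than aligned_bam_dir
# and output_dir are assumed, check the real file's comments
aligned_bam_dir: /path/to/aligned_bams   # directory with the input BAMs
output_dir: /path/to/te_quant_output     # all outputs and sub-directories land here
filtering: True                          # run the scuttle low-quality-cell filter (assumed key)
heatmap: True                            # save the log-enrichment heatmap PDF (assumed key)
```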
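And `samples.tsv` is two tab-separated columns with no header — cell name, then BAM file name (the names below are made up):

```tsv
cell_001	cell_001.bam
cell_002	cell_002.bam
```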