This project implements a simulation of single-cell RNA sequencing (scRNA-seq), accounting for some common sources of noise that complicate the analysis of the resulting data.
Create and activate the Conda environment with the necessary dependencies:
conda env create -f environment.yml
conda activate scrnasim
The workflow uses the tools available in the scRNAsim-toolz repo. Each tool is listed with the number used to refer to it in the Inputs and Outputs lists:
- Transcript sampler (#1)
- Structure generator (#2)
- Sequence extractor (#3)
- Priming site predictor (#4)
- cDNA generator (#5)
- Fragment selector (#6)
- Read sequencer (#7)
Inputs:
- Genome annotation file (gtf) (#1)
- Average gene expression values (csv: geneID,count) (#1)
- Total number of transcripts to sample (#1)
- Probability of intron inclusion (#2)
- Genome sequence file (fasta) (#3)
- Length of poly(A) tails (#3)
- Primer sequence (#4)
- Threshold for the energy of primer-mRNA interaction needed for priming (#4)
- Mean of fragment length (#6)
- SD of fragment length (#6)
- Read length (number of sequencing cycles) (#7)
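To illustrate how the first two inputs of tool #1 relate to each other, here is a minimal sketch that draws a fixed total number of transcripts with probability proportional to the expression values. It is not the Transcript sampler's actual implementation: the function name `sample_transcript_counts`, the headerless `ID,count` layout, and the proportional sampling scheme are all assumptions made for illustration.

```python
import csv
import io
import random
from collections import Counter


def sample_transcript_counts(expression_csv, n_transcripts, seed=None):
    """Draw n_transcripts IDs with probability proportional to expression.

    expression_csv: file-like object with headerless "ID,count" rows
    (assumed layout). Returns a Counter mapping each sampled ID to the
    number of times it was drawn.
    """
    ids, weights = [], []
    for row in csv.reader(expression_csv):
        if not row:
            continue  # skip blank lines
        ids.append(row[0])
        weights.append(float(row[1]))
    if not ids or sum(weights) <= 0:
        raise ValueError("expression table is empty or has no positive counts")
    rng = random.Random(seed)
    return Counter(rng.choices(ids, weights=weights, k=n_transcripts))


# Hypothetical example data; real input would be read from a file on disk.
example = io.StringIO("geneA,90\ngeneB,10\ngeneC,0\n")
counts = sample_transcript_counts(example, n_transcripts=1000, seed=42)
for gene_id, count in sorted(counts.items()):
    print(f"{gene_id},{count}")
```

The sampled counts always sum to the requested total, and IDs with zero expression are never drawn.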
Outputs:
- Representative transcripts (gtf) (#1)
- Representative transcript counts (csv: transcriptID,count) (#1)
- Sampled transcripts (gtf) (#2)
- Sampled transcript counts (csv: transcriptID,count) (#2)
- Transcript sequences (fasta) (#3)
- Annotated internal priming sites (gtf) (#4)
- Unique cDNA sequences (fasta) (#5)
- cDNA count table (csv) (#5)
- Terminal fragment sequences (fasta) (#6)
- Read sequences (fasta) (#7)
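As a quick sanity check on the final output, the sketch below parses a FASTA file and reports any read longer than the configured read length. The helper names `read_fasta` and `reads_exceeding_length` are hypothetical, and the assumption that no read should exceed the read length is ours rather than documented behaviour of the Read sequencer.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file-like object."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def reads_exceeding_length(handle, read_length):
    """Return the headers of reads longer than read_length."""
    return [h for h, seq in read_fasta(handle) if len(seq) > read_length]


# Hypothetical example data; real input would be the read FASTA file.
example = io.StringIO(">read1\nACGTACGT\n>read2\nACGTACGTACGT\n")
too_long = reads_exceeding_length(example, read_length=10)
print(too_long)
```

An empty list means every read fits within the given number of sequencing cycles.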