Single cell RNA Sequencing Analysis

Michael Kotliar edited this page Apr 18, 2022 · 53 revisions

This workflow filters, normalizes, scales, optionally integrates, and clusters single or aggregated single-cell RNA-Seq datasets.

The main functional blocks of the sc-rna-analyze-wf.cwl workflow are shown below. For the detailed workflow structure, refer to CWL Viewer.

Figure: scRNA-Seq scheme

To reproduce the analysis of single-cell RNA sequencing data described in the Surumbayeva, Kotliar et al., 2021 paper, make sure that cwltool, Docker, Git, and wget are installed. Then follow the steps below.
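
A quick way to confirm the prerequisites is to look up each tool on PATH. The sketch below is only an illustration: check_tools is a hypothetical helper name, and it assumes the tools are installed under their usual command names (cwltool, docker, git, wget). It does not check versions or whether the Docker daemon is running.

```shell
#!/bin/sh
# Hypothetical helper: print every given command that is not found on PATH.
# Returns non-zero if at least one command is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Command names assumed for the four prerequisites.
if check_tools cwltool docker git wget; then
    echo "All required tools found"
else
    echo "Install the tools listed above before continuing" >&2
fi
```

Any tool name printed by the check needs to be installed before moving on to the steps.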

  1. Create a temporary folder and clone the current repository.
    mkdir sc_rna
    cd sc_rna
    git clone https://github.com/Barski-lab/sc-seq-analysis.git
    
  2. Create a folder for input data. Download the required input files from Figshare, either with a web browser or with the commands below.
    mkdir inputs
    cd inputs
    wget -O filtered_feature_bc_matrix.tar.gz https://figshare.com/ndownloader/files/34819513
    wget -O aggregation.csv https://figshare.com/ndownloader/files/34819516
    wget -O condition.csv https://figshare.com/ndownloader/files/34819519
    wget -O mouse_cell_cycle_genes.csv https://figshare.com/ndownloader/files/34822054
    
  3. Copy the job definition file into the inputs folder.
    cp ../sc-seq-analysis/jobs/sc-rna-analyze-wf.yaml .
    
  4. Create a folder for workflow outputs and run the sc-rna-analyze-wf.cwl workflow with the sc-rna-analyze-wf.yaml job definition file.
    cd ..
    mkdir outputs
    cd outputs
    cwltool ../sc-seq-analysis/workflows/sc-rna-analyze-wf.cwl ../inputs/sc-rna-analyze-wf.yaml
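
Because the workflow run can take a while, it may be worth confirming that the inputs folder is complete before launching step 4. The sketch below assumes it is run from the sc_rna folder and uses the file names from the download and copy commands above; check_inputs is a hypothetical helper, not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: print each expected file that is missing or empty
# in the given directory. Returns non-zero if any file fails the check.
check_inputs() {
    dir=$1
    shift
    rc=0
    for name in "$@"; do
        if [ ! -s "$dir/$name" ]; then
            echo "$name"
            rc=1
        fi
    done
    return "$rc"
}

# File names as used in steps 2 and 3; run from the sc_rna folder.
if check_inputs inputs \
    filtered_feature_bc_matrix.tar.gz aggregation.csv condition.csv \
    mouse_cell_cycle_genes.csv sc-rna-analyze-wf.yaml
then
    echo "All input files are in place"
else
    echo "Some input files are missing or empty (listed above)" >&2
fi
```

An empty file usually points to an interrupted download, so repeating the corresponding wget command is a reasonable first fix.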
    

Step 1. Filters single-cell RNA-Seq datasets based on common QC metrics.

Each of the following plots is produced for both the raw and the filtered data:

  - Number of cells per dataset
  - UMI per cell density
  - Split by grouping condition UMI per cell density
  - Genes per cell density
  - Split by grouping condition genes per cell density
  - Genes vs UMI per cell correlation
  - Percentage of transcripts mapped to mitochondrial genes per cell density
  - Novelty score per cell density
  - Split by grouping condition the novelty score per cell density
  - QC metrics per cell density
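
The workflow computes these metrics internally. The sketch below only illustrates what two of them measure. It rests on two assumptions that this page does not state: a hypothetical tab-separated table with one row per cell (barcode, UMI count, gene count, mitochondrial UMI count), and the commonly used definition of the novelty score as log10(genes) / log10(UMI). The qc_metrics name and the column layout are illustrative and are not taken from the workflow.

```shell
#!/bin/sh
# Hypothetical input on stdin: TSV with a header line and the columns
#   barcode, n_umi, n_genes, n_mito_umi
# Output: barcode, percent of mitochondrial UMI, novelty score.
# The novelty score definition (log of genes over log of UMI) is an assumption.
qc_metrics() {
    awk -F '\t' 'NR > 1 && $2 > 1 && $3 > 0 {
        mito_pct = 100 * $4 / $2
        # A ratio of natural logs equals the ratio of base-10 logs.
        novelty = log($3) / log($2)
        printf "%s\t%.2f\t%.3f\n", $1, mito_pct, novelty
    }'
}

# Two made-up cells, for illustration only.
printf 'barcode\tn_umi\tn_genes\tn_mito_umi\nAAAC\t10000\t1000\t500\nAAAG\t100\t100\t50\n' | qc_metrics
```

Under this definition, a cell whose UMI come from comparatively few genes gets a lower novelty score, which is why the metric is plotted next to the UMI and gene counts.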

Step 2. Integrates multiple single-cell RNA-Seq datasets and reduces dimensionality using PCA.

Plots produced at this step:

  - Elbow plot (from cells PCA)
  - Correlation plots between QC metrics and cells PCA components
  - QC metrics on cells UMAP
  - Split by the genes per cell counts cells UMAP
  - Grouped by condition split by the genes per cell counts cells UMAP
  - Split by the UMI per cell counts cells UMAP
  - Grouped by condition split by the UMI per cell counts cells UMAP
  - Split by the percentage of transcripts mapped to mitochondrial genes cells UMAP
  - Grouped by condition split by the percentage of transcripts mapped to mitochondrial genes cells UMAP

Step 3. Clusters single-cell RNA-Seq datasets and identifies gene markers.

Plots produced at this step:

  - Clustered cells UMAP
  - Silhouette scores. Downsampled to max 500 cells per cluster
  - Grouped by cluster split by dataset cells composition plot. Downsampled
  - Grouped by dataset split by cluster cells composition plot. Downsampled
  - Split by grouping condition clustered cells UMAP
  - Grouped by cluster split by condition cells composition plot. Downsampled
  - Grouped by condition split by cluster cells composition plot. Downsampled
  - Split by cell cycle phase clustered cells UMAP
  - Grouped by cell cycle phase split by dataset cells composition plot. Downsampled
  - Grouped by cell cycle phase split by cluster cells composition plot. Downsampled
  - Log normalized scaled average gene expression per cluster
