
Single cell RNA Sequencing Analysis

Michael Kotliar edited this page Jul 6, 2022 · 53 revisions

Used for filtering, normalization, scaling, optional integration, and clustering of single or aggregated single-cell RNA-Seq datasets.

The main functional blocks of the sc-rna-analyze-wf.cwl workflow are shown below. For the detailed workflow structure, refer to CWL Viewer.

scRNA-Seq scheme


In this example, we will analyze the single-cell RNA sequencing data described in the Surumbayeva, Kotliar et al. (2021) paper. First, make sure you have cwltool, Docker, git, and wget installed, then proceed to the steps below.
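A quick way to confirm the prerequisites is a small shell check. The sketch below is not part of the repository: `missing_tools` is a hypothetical helper name, and it only tests that each command is on `PATH`, not that its version is suitable.

```shell
#!/bin/sh
# Print the name of every tool from the argument list that is not on PATH.
# (missing_tools is a hypothetical helper, not part of sc-seq-analysis.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The four tools named in the text above.
missing=$(missing_tools cwltool docker git wget)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```

If anything is reported as missing, install it before moving on to step 1.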

With the minimum required Docker configuration (4 CPUs and 20 GB of RAM), the approximate running time is up to 1 hour.
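To see what Docker currently has available, the daemon can be queried with `docker info`. The following is a minimal sketch, assuming the `NCPU` and `MemTotal` fields of `docker info` reflect the resources containers can use on your platform; `bytes_to_gib` is a hypothetical helper. The reported memory is often slightly below the configured value, so treat the comparison as approximate.

```shell
#!/bin/sh
# Convert a byte count to GiB with one decimal place.
# (bytes_to_gib is a hypothetical helper introduced for this sketch.)
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

# Ask the Docker daemon for its CPU count and total memory (in bytes).
if info=$(docker info --format '{{.NCPU}} {{.MemTotal}}' 2>/dev/null); then
    set -- $info
    echo "Docker CPUs: $1 (minimum 4)"
    echo "Docker memory: $(bytes_to_gib "$2") GiB (minimum about 20 GB)"
else
    echo "Could not query Docker; is the daemon running?"
fi
```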

  1. Create a temporary folder and clone the current repository.
    mkdir sc_rna
    cd sc_rna
    git clone https://github.com/Barski-lab/sc-seq-analysis.git
  2. Create a folder for input data. Download the required input files from Figshare, either with a web browser or with the commands below.
    mkdir inputs
    cd inputs
    wget -O filtered_feature_bc_matrix.tar.gz https://figshare.com/ndownloader/files/34819513
    wget -O aggregation.csv https://figshare.com/ndownloader/files/34819516
    wget -O condition.csv https://figshare.com/ndownloader/files/34819519
    wget -O mouse_cell_cycle_genes.csv https://figshare.com/ndownloader/files/34822054
  3. Copy the job definition file into the inputs folder.
    cp ../sc-seq-analysis/jobs/sc-rna-analyze-wf.yaml .
  4. Create a folder for workflow outputs and execute the sc-rna-analyze-wf.cwl workflow with the sc-rna-analyze-wf.yaml job definition file.
    cd ..
    mkdir outputs
    cd outputs
    cwltool ../sc-seq-analysis/workflows/sc-rna-analyze-wf.cwl ../inputs/sc-rna-analyze-wf.yaml
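Before launching the workflow in step 4, it can help to confirm that steps 2 and 3 left all five files in the inputs folder, since an interrupted download is easy to miss. The sketch below is an illustration rather than part of the repository: `check_inputs` is a hypothetical helper, and it only checks that each file exists and is non-empty, not that its content is intact.

```shell
#!/bin/sh
# Succeed only if every named file exists in the directory and is non-empty;
# report each problem file on stderr.
# (check_inputs is a hypothetical helper, not part of sc-seq-analysis.)
check_inputs() {
    dir=$1; shift
    status=0
    for name in "$@"; do
        if [ ! -s "$dir/$name" ]; then
            echo "missing or empty: $dir/$name" >&2
            status=1
        fi
    done
    return $status
}

# Run from the sc_rna folder created in step 1.
if check_inputs inputs \
    filtered_feature_bc_matrix.tar.gz aggregation.csv condition.csv \
    mouse_cell_cycle_genes.csv sc-rna-analyze-wf.yaml; then
    echo "Inputs look complete."
else
    echo "Fix the files listed above before running cwltool."
fi
```

cwltool also provides a `--validate` option that checks the workflow document without executing it, which may be a useful second pre-flight step.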

Expected outputs (some of the plots and files are omitted)

Note: because we constantly improve our tools and frequently update the Dockerfile, your outputs may differ slightly from the plots below. To reproduce exactly the same results, switch to commit 4819746.
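Switching the clone to that commit can be done with a plain `git checkout`. This is a minimal sketch, assuming the repository was cloned into `sc-seq-analysis` as in step 1 and that the commands are run from the `sc_rna` folder; `checkout_commit` is a hypothetical wrapper name. The checkout leaves the repository in a detached HEAD state.

```shell
#!/bin/sh
# Check out a given commit in a cloned repository (detached HEAD).
# (checkout_commit is a hypothetical wrapper introduced for this sketch.)
checkout_commit() {
    git -C "$1" checkout --quiet "$2"
}

# Guarded so that nothing happens if the clone is not present.
if [ -d sc-seq-analysis/.git ]; then
    checkout_commit sc-seq-analysis 4819746
    # Show the commit that is now checked out.
    git -C sc-seq-analysis log -1 --oneline
fi
```

Rerun the cwltool command from step 4 afterwards so that the workflow files from this commit are used.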

Clustering results can also be evaluated interactively in UCSC Cell Browser using RangeHTTPServer or any other simple HTTP server.

cd html_data
python3 -m RangeHTTPServer   # open http://localhost:8000/

Example of UCSC Cell Browser window.

Step 1. QC metrics and the results of low-quality cell removal.

Before low-quality cell removal | After low-quality cell removal

Step 2. Dimensionality reduction and evaluating confounding sources of variation.

Step 3. Cluster analysis and gene markers identification.

Example of the table with identified gene markers (top 10 rows)

resolution  cluster  feature  p_val  avg_log2FC  pct.1  pct.2  p_val_adj
0.5         0        Dpt      0      3.06828808  0.941  0.176  0
0.5         0        Col3a1   0      2.57320598  0.998  0.814  0
0.5         0        Lum      0      2.48541392  0.951  0.318  0
0.5         0        C4b      0      2.44687844  0.962  0.444  0
0.5         0        Fbn1     0      2.42990848  0.978  0.524  0
0.5         0        Clec3b   0      2.4144368   0.767  0.134  0
0.5         0        Col14a1  0      2.39398771  0.897  0.167  0
0.5         0        Sfrp1    0      2.37871487  0.852  0.304  0
0.5         0        Gsn      0      2.37083093  0.946  0.602  0
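A table like this is easy to narrow down from the command line. The sketch below assumes the markers are saved as a tab-separated file with the column order shown above (the actual output file name and delimiter are not stated here, so both are assumptions). `filter_markers` is a hypothetical helper, and the sample file is built from three rows of the table purely for illustration.

```shell
#!/bin/sh
# Print rows of a tab-separated markers table whose avg_log2FC (column 5)
# is at least the given threshold; the header row is always kept.
# (filter_markers is a hypothetical helper introduced for this sketch.)
filter_markers() {
    awk -F '\t' -v min="$2" 'NR == 1 || $5 + 0 >= min + 0' "$1"
}

# Hypothetical sample file built from three rows of the table above.
sample=$(mktemp)
printf 'resolution\tcluster\tfeature\tp_val\tavg_log2FC\tpct.1\tpct.2\tp_val_adj\n' > "$sample"
printf '0.5\t0\tDpt\t0\t3.06828808\t0.941\t0.176\t0\n' >> "$sample"
printf '0.5\t0\tCol3a1\t0\t2.57320598\t0.998\t0.814\t0\n' >> "$sample"
printf '0.5\t0\tGsn\t0\t2.37083093\t0.946\t0.602\t0\n' >> "$sample"

# Keeps the header, Dpt and Col3a1; Gsn falls below the 2.5 threshold.
filter_markers "$sample" 2.5
rm -f "$sample"
```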