This is an integrative, multi-assay project including individual-matched data generated from human dorsolateral prefrontal cortex (DLPFC) using spatially resolved transcriptomics with Visium (10x Genomics), single nucleus RNA-seq with Chromium (10x Genomics), bulk RNA-seq, and single-molecule fluorescent in situ hybridization (smFISH) with RNAScope (Advanced Cell Diagnostics) in combination with immunofluorescence (IF). RNAScope images were processed with HALO (Indica Labs). This dataset can be used to benchmark computational deconvolution algorithms for bulk RNA-seq data that use snRNA-seq reference data.
Experimental design overview and exploration of gene detection in different assays. A. Human postmortem dorsolateral prefrontal cortex (DLPFC) tissue blocks spanning the anterior-to-posterior axis were dissected from 10 donors, for a total of 19 tissue blocks; these blocks are a subset of the 30 tissue blocks used in a previous spatial transcriptomic study. For each block, sequential slides were cut for the different assays while maintaining the same white matter vs. gray matter orientation. B. snRNA-seq data for the 19 tissue blocks were generated as part of that same spatial transcriptomic study (Huuki-Myers et al.). In this study, bulk RNA-seq data were also generated from these blocks across two library preparations (polyA in purple or RiboZeroGold in gold) and three RNA extractions targeting different cell fractions: cytosolic (Cyto, light color), whole cell (Total, intermediate color), or nuclear (Nuc, dark color). C. tSNE plot of the reference snRNA-seq data at the broad cell type resolution. D. Scatter plot of bulk RNA-seq principal components (PCs) 1 and 2. PC1 is associated with library type and PC2 with RNA extraction method. Colors are the same as the groups in B. E. Volcano plots for the differential expression analysis between polyA and RiboZeroGold, faceted by RNA extraction method. Point colors are the same as in B. The horizontal dotted line denotes the FDR < 0.05 cutoff; the vertical dotted lines mark logFC = -1 and 1. F. Volcano plot for the differential expression analysis between Total bulk RNA-seq (point colors same as E) and snRNA-seq (blue points). Annotations are the same as in E.
We hope that this repository will be useful for your research. Please use the following BibTeX information to cite this code repository as well as the data released by this project. Thank you!
Benchmark of cellular deconvolution methods using a multi-assay reference dataset from postmortem human prefrontal cortex.
Louise A. Huuki-Myers, Kelsey D. Montgomery, Sang Ho Kwon, Sophia Cinquemani, Nicholas J. Eagles, Daianna Gonzalez-Padilla, Sean K. Maden, Joel E. Kleinman, Thomas M. Hyde, Stephanie C. Hicks, Kristen R. Maynard, Leonardo Collado-Torres.
bioRxiv 2024.02.09.579665; doi: https://doi.org/10.1101/2024.02.09.579665
@article {Huuki-Myers2024.02.09.579665,
author = {Louise A. Huuki-Myers and Kelsey D. Montgomery and Sang Ho Kwon and Sophia Cinquemani and Nicholas J. Eagles and Daianna Gonzalez-Padilla and Sean K. Maden and Joel E. Kleinman and Thomas M. Hyde and Stephanie C. Hicks and Kristen R. Maynard and Leonardo Collado-Torres},
title = {Benchmark of cellular deconvolution methods using a multi-assay reference dataset from postmortem human prefrontal cortex},
year = {2024},
doi = {10.1101/2024.02.09.579665},
publisher = {Cold Spring Harbor Laboratory},
journal = {bioRxiv}
}
Files for this project are publicly available.
As documented in the spatialDLPFC project, the snRNA-seq FASTQ files are available via the Globus endpoint 'jhpce#DLPFC_snRNAseq' as well as through the PsychENCODE Knowledge Portal at https://doi.org/10.7303/syn51032055.1 or https://www.synapse.org/#!Synapse:syn51032055/datasets/.
The bulk RNA-seq FASTQ files are available via the Globus endpoint 'jhpce#humanDeconvolutionBulkRNAseq'. They are also available at NIH BioProject under accession PRJNA1086804 and Sequence Read Archive study SRP494701.
The RNAscope images are available via the Globus endpoint 'jhpce#humanDeconvolutionRNAScope'.
These images were analyzed with the HALO software (Indica Labs). The exported HALO settings files and data CSV files are available at raw-data/HALO. The combined HALO output data are available as an R object at processed-data/03_HALO/halo_all.Rdata.
Check the spatialDLPFC project for more details on the spatially resolved transcriptomics data that were generated from these tissue blocks.
Files are organized following the structure from LieberInstitute/template_project. Scripts include the R session information with details about version numbers of the packages we used.
JHPCE location: /dcs04/lieber/lcolladotor/deconvolution_LIBD4030/Human_DLPFC_Deconvolution
- snRNA-seq: available in a file called sce_DLPFC.Rdata, located in the subdirectory DLPFC_snRNAseq/processed-data/sce/
- bulk RNA-seq: located in the subdirectory Human_DLPFC_Deconvolution/processed-data/01_SPEAQeasy/
The snRNA-seq data are available as a SingleCellExperiment object. The bulk RNA-seq data are available as a SummarizedExperiment object.
Image data for the RNAScope slides were generated by analyzing them with HALO and exporting the analysis results as .csv tables. These tables are read from the file tree located in the subdirectory raw-data/HALO/.
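As a minimal sketch of collecting those exported tables, the Python snippet below walks a directory recursively and reads every .csv file into a list of row dictionaries. It assumes each export is a plain comma-separated file with a header row; the helper name load_halo_tables and the added source_file field are hypothetical, and the project's own R scripts may read and combine these files differently.

```python
import csv
from pathlib import Path


def load_halo_tables(root):
    """Read every .csv file under `root` (recursively) into a list of row dicts.

    Hypothetical helper: each row is tagged with the path of the file it came
    from (relative to `root`) so rows from different exports stay traceable.
    """
    rows = []
    for path in sorted(Path(root).rglob("*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = str(path.relative_to(root))
                rows.append(row)
    return rows
```

For example, load_halo_tables("raw-data/HALO") would return one dictionary per exported row, with all values kept as strings exactly as they appear in the CSV files.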
Note that the RNAScope experiments were performed with two combinations of markers, called Circle and Star. These experiments are distinct in that each comprises an analysis of an independent, albeit adjacent, tissue section, and each includes a different set of molecular markers (see table below).
The deconvolution project makes use of a number of metadata attributes and variables in the results files mentioned above. This section describes the key terms and definitions of these attributes and variables for the deconvolution methods paper.
Cell type labels for snRNA-seq datasets are determined from the variable cellType_broad_hc. This variable can be accessed from the sce object in various ways, such as sce$cellType_broad_hc or sce[["cellType_broad_hc"]]. Note that the deconvolution methods paper focuses on just 6 cell types of interest, which are identified from among the cell type labels in the cellType_broad_hc variable.
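To illustrate the idea of restricting the reference to the cell types of interest, here is a small Python sketch that drops nuclei whose label is outside a chosen set and computes per-cell-type proportions from the remaining labels. The label values in CELL_TYPES_OF_INTEREST are hypothetical placeholders; the actual values should be checked against sce$cellType_broad_hc, and the project itself performs this step in R.

```python
from collections import Counter

# Hypothetical set of six labels; replace with the actual values of
# cellType_broad_hc used for the deconvolution methods paper.
CELL_TYPES_OF_INTEREST = {"Astro", "Endo", "Micro", "Oligo", "Excit", "Inhib"}


def reference_proportions(labels, keep=CELL_TYPES_OF_INTEREST):
    """Proportion of nuclei per cell type, after dropping labels not in `keep`."""
    counts = Counter(label for label in labels if label in keep)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nuclei carry a label from `keep`")
    # Sorted by cell type name for stable, readable output.
    return {cell_type: n / total for cell_type, n in sorted(counts.items())}
```

For instance, with labels ["Excit", "Excit", "Oligo", "Astro", "OPC"], the "OPC" nucleus is excluded and the result is {"Astro": 0.25, "Excit": 0.5, "Oligo": 0.25}.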
Cell type labels for RNAScope experiments are obtained from the image analysis outputs produced by the HALO software. In brief, each output contains a series of columns corresponding to the cell type markers. Since each row in these outputs corresponds to an individual detected nucleus, we simply look at which marker is positive for that nucleus to determine its cell type. Cell type proportions and abundances are then calculated from these outputs.
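The labeling rule described above can be sketched in Python as follows, using the Circle markers from the marker table as an example. This is a minimal sketch under stated assumptions, not the project's implementation: it assumes each nucleus row has one column per marker holding a 0/1 positivity flag named after the marker (real HALO exports use their own column names), and the "Other" and "Multiple" labels for nuclei with zero or several positive markers are hypothetical choices.

```python
from collections import Counter

# Hypothetical marker-column -> cell type map for the Circle experiment.
CIRCLE_MARKERS = {"CLDN5": "Endo", "GFAP": "Astro", "GAD1": "Inhib"}


def call_cell_type(nucleus, marker_to_type):
    """Label one nucleus (a row dict) from its marker-positivity flags.

    Assumption: any value > 0 counts as positive. Exactly one positive marker
    gives that marker's cell type; none gives "Other"; several give "Multiple".
    """
    positive = [m for m in marker_to_type if float(nucleus[m]) > 0]
    if not positive:
        return "Other"
    if len(positive) > 1:
        return "Multiple"
    return marker_to_type[positive[0]]


def cell_type_summary(nuclei, marker_to_type):
    """Return (abundances, proportions) as dicts keyed by cell type label."""
    counts = Counter(call_cell_type(n, marker_to_type) for n in nuclei)
    total = sum(counts.values())
    if total == 0:
        return {}, {}
    proportions = {label: n / total for label, n in counts.items()}
    return dict(counts), proportions
```

Abundances here are simple nucleus counts per label, and proportions divide those counts by the total number of nuclei in the export, including any "Other" or "Multiple" nuclei.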
Cell type labels aren't available for the bulk RNA-seq and other datasets produced for this project.
The molecular markers for the Circle and Star RNAscope experiments are as follows:
| cellType | marker | Combo | Type | LongName |
|---|---|---|---|---|
| Endo | CLDN5 | Circle | Ab | Claudin_5 |
| Astro | GFAP | Circle | Ab | GFAP |
| Inhib | GAD1 | Circle | RNA_probe | GAD1 |
| Excit | SLC17A7 | Star | RNA_probe | SLC17A7 |
| Micro | TMEM119 | Star | Ab | TMEM119 |
| Oligo | OLIG2 | Star | Ab | OLIG2 |
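Because each marker combination is imaged on its own tissue section, it can be convenient to split the marker panel by combo so that each experiment's HALO output is interpreted only with its own three markers. The Python sketch below transcribes the table above into a list of tuples and groups it into one marker-to-cell-type map per combo; the names MARKER_PANEL and markers_by_combo are hypothetical and only illustrate the data layout.

```python
# Marker panel transcribed from the table above:
# (cellType, marker, Combo, Type), where Type is "Ab" (antibody) or "RNA_probe".
MARKER_PANEL = [
    ("Endo", "CLDN5", "Circle", "Ab"),
    ("Astro", "GFAP", "Circle", "Ab"),
    ("Inhib", "GAD1", "Circle", "RNA_probe"),
    ("Excit", "SLC17A7", "Star", "RNA_probe"),
    ("Micro", "TMEM119", "Star", "Ab"),
    ("Oligo", "OLIG2", "Star", "Ab"),
]


def markers_by_combo(panel):
    """Group the panel into {combo: {marker: cellType}}.

    Each combo covers a different set of markers, so in this sketch a HALO
    export from one combo is only matched against that combo's markers.
    """
    combos = {}
    for cell_type, marker, combo, _marker_type in panel:
        combos.setdefault(combo, {})[marker] = cell_type
    return combos
```

Calling markers_by_combo(MARKER_PANEL)["Star"] gives {"SLC17A7": "Excit", "TMEM119": "Micro", "OLIG2": "Oligo"}, and the "Circle" entry covers CLDN5, GFAP, and GAD1.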
Throughout the deconvolution project we try to use standard terms to refer to key project entities. Here are some of the key terms to be aware of when using these project files and understanding analysis outputs.
We take deconvolution to be the prediction of cell type amounts in a mixed sample by leveraging data from a non-mixed sample. For the deconvolution method paper, we focus on predicting either cell abundances or proportions for each of 6 cell types in bulk RNA-seq data by leveraging a single-nucleus RNA-seq reference dataset. The deconvolution equation looks like:
Where
Strict deconvolution refers to solving for
This simply refers to steps taken to prepare
The
The
The
- NIMH (USA) grant R01 MH123183
- LIBD