Gregory Way and Casey Greene, University of Pennsylvania
The amount and type of immune cell infiltration into tumors are important determinants of disease progression and survival. Cancer subtypes can often be distinguished by their degree of immune cell infiltration, because infiltration tends to be one of the more dominant observable signatures in gene expression data. Current methods for directly observing immune cell profiles in a given population include laboriously quantifying immune cell proportions by flow cytometry or other technically challenging cell labeling techniques. Therefore, deconvolution methods are being developed to automatically extract immune cell proportions from full tumor gene expression data.
We used ssGSEA (Barbie et al. 2009) to deconvolute immune cell signatures from glioblastoma multiforme (GBM) tumors from The Cancer Genome Atlas (TCGA). Briefly, ssGSEA is a simple rank-based test that compares the empirical cumulative distribution function (eCDF) of the genes in an input gene set to the eCDF of the remaining genes.
We used the gene lists in LM22.txt, as defined by Newman et al. (2015), as input to ssGSEA.
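To make the rank-based comparison concrete, here is a minimal Python sketch of an ssGSEA-style score for a single sample. It is an illustration only, not the Bioconductor implementation this repository uses: the function name `ssgsea_score`, the toy gene names, and the weighting exponent `alpha=0.25` are all assumptions made for the example, and the score is left unnormalized.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified ssGSEA-style enrichment score for one sample.

    expression: dict mapping gene name -> expression value in this sample
    gene_set:   iterable of gene names (e.g. one cell-type signature)
    alpha:      exponent applied to rank weights (assumed value)
    """
    # Order genes from highest to lowest expression.
    genes = sorted(expression, key=expression.get, reverse=True)
    n = len(genes)
    members = set(gene_set) & set(genes)
    if not members or len(members) == n:
        raise ValueError("gene set must overlap some, but not all, genes")

    # Genes in the set are weighted by rank (top gene has rank n);
    # genes outside the set contribute equally to their own eCDF.
    weights = [(n - i) ** alpha if g in members else 0.0
               for i, g in enumerate(genes)]
    total_in = sum(weights)
    n_out = n - len(members)

    score = cdf_in = cdf_out = 0.0
    for gene, weight in zip(genes, weights):
        if gene in members:
            cdf_in += weight / total_in
        else:
            cdf_out += 1.0 / n_out
        # Accumulate the difference between the two eCDFs across all ranks.
        score += cdf_in - cdf_out
    return score


# Toy data: hypothetical gene names and expression values, not real LM22 content.
toy_sample = {"GENE_A": 6.0, "GENE_B": 5.0, "GENE_C": 4.0,
              "GENE_D": 3.0, "GENE_E": 2.0, "GENE_F": 1.0}

high_set_score = ssgsea_score(toy_sample, {"GENE_A", "GENE_B"})
low_set_score = ssgsea_score(toy_sample, {"GENE_E", "GENE_F"})
print(high_set_score > 0, low_set_score < 0)
```

A gene set concentrated among the most highly expressed genes yields a positive score, and one concentrated among the least expressed genes yields a negative score. Applying such a score to each cell-type gene list in each tumor sample gives a per-sample profile of relative immune cell signal.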
Our end-to-end analysis, from downloading data to generating publication-ready figures, is provided in this GitHub repository. We implement an automated, reproducible workflow using continuous analysis to ensure a stable compute environment and consistent results.
We use the ssGSEA implementation available on Bioconductor (Guinney and Castelo 2016).
# To reproduce the pipeline independently simply run:
bash run_pipeline.sh
For exact instructions on how to reproduce our analysis, see run_pipeline.sh.
Our in silico deconvolution of CD4+ cells, CD8+ cells, and macrophages in TCGA data closely matches immunohistochemistry estimates of the same cell types in a separate dataset. The proportions of immune cell infiltrate across subtypes correspond strongly between the two datasets.
We also observed that high macrophage infiltration was associated with worse outcomes in the TCGA dataset. This relationship was strengthened after adjusting for several covariates, including age, gender, and gene expression-based subtype.
For all code-related questions, bug reports, or feature requests, please file a GitHub issue.
All analyses were performed in R version 3.2.3, and packages were versioned with the checkpoint package (version 0.3.18) set to a snapshot date of "2016-08-16". The checkpoint package automatically downloads all specified packages at the versions that existed on that date. See install.R for more details. The versions for each package are specified in sessionInfo.txt.
We also provide a Docker image to recreate the compute environment. See the Dockerfile for more details.