Code and resources related to the analysis of regeneration in the olfactory epithelium
Below are the R scripts for analyzing the single-cell RNA-seq data from HBC stem cells of the olfactory epithelium, presented in the following manuscript:
Gadye L*, Das D*, Sanchez MA*, Street KN, Baudhuin A, Wagner A, Cole MB, Choi YG, Yosef N, Purdom E, Dudoit S, Risso D, Ngai J, Fletcher RB. Identification of activated stem cell states unique to regeneration in the olfactory epithelium. (* equal contribution)
The data are available on GEO in GSE99251 and GSE95601.
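For readers who prefer to browse the deposited files directly, the following is a minimal sketch that builds the NCBI FTP directory URL for each series. It assumes the usual GEO layout, in which a series sits under a directory named by replacing the last three digits of its accession with "nnn". The helper name geo_series_url is hypothetical, and which files the scripts expect as input is not specified here.

```shell
# Hypothetical helper: print the GEO FTP directory URL for a series accession.
# Assumes the standard GEO layout: .../geo/series/GSE<prefix>nnn/<accession>/
geo_series_url() {
  acc="$1"                 # e.g. GSE99251
  num="${acc#GSE}"         # strip the "GSE" prefix, leaving the digits
  if [ "${#num}" -le 3 ]; then
    stub="GSEnnn"          # accessions with three or fewer digits
  else
    stub="GSE${num%???}nnn"   # drop the last three digits, append "nnn"
  fi
  printf 'https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/\n' "$stub" "$acc"
}

geo_series_url GSE99251   # prints https://ftp.ncbi.nlm.nih.gov/geo/series/GSE99nnn/GSE99251/
geo_series_url GSE95601   # prints https://ftp.ncbi.nlm.nih.gov/geo/series/GSE95nnn/GSE95601/
```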
The repository currently has scripts that take ExpressionSet data as input and perform a series of computations, interspersed with visualizations. First, the data are filtered to remove poor-quality cells and less informative genes. The data are then normalized, and biological contaminants and known doublets (identified by co-expression of differentiated cell markers) are removed. Finally, the data are re-filtered and re-normalized.
After filtering and normalization, we clustered the data using clusterExperiment, then performed developmental ordering and inferred lineage trajectories and branching with slingshot. For each lineage, differentially expressed genes were identified. We used Gene Set Enrichment Analysis to infer pathways regulating cell fates and transitions.
We created a number of visualizations based on clustering, experimental condition, and developmental order. We displayed coordinated and correlated differentially expressed genes including transcription factors, as well as a set of cell cycle genes and selected regulators of cell fate transitions along each lineage. The olfactory receptors and factors associated with OR regulation were plotted along the neuronal lineage. We also presented the top enriched gene sets for each cell cluster.
In the project directory, run mkdir -p output/{clust,data,romer,viz,DE,EDA}/oeHBCregen and add the new directories to .gitignore. Place the scripts in the 'scripts' directory and the initial eSet data in the 'data' directory.
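The setup step above can be sketched as a short script. The loop is equivalent to the brace-expansion one-liner but also works in shells without brace expansion. Ignoring the whole output/ tree in .gitignore is an assumption; list the individual subdirectories instead if you prefer finer control.

```shell
# Create one output subdirectory per analysis stage
for d in clust data romer viz DE EDA; do
  mkdir -p "output/$d/oeHBCregen"
done

# Directories for the scripts and the initial eSet data
mkdir -p scripts data

# Assumption: keep all generated results out of version control.
# Append the entry only if it is not already present.
grep -qxF 'output/' .gitignore 2>/dev/null || echo 'output/' >> .gitignore
```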
oeHBC_1_filt_norm.sh performs the following analyses by calling various R scripts (given in parentheses):
- Filtering based on technical attributes (oeHBC_filtering.R)
- Normalization using SCONE to get rankings (oeHBC_norm.R)
oeHBC_2_norm.sh performs the following analyses by calling various R scripts (given in parentheses):
- Get normalized matrices for several normalizations (oeHBC_norm.R)
- Make SummarizedExperiment objects for each normalization (oeHBC_makeSE.R)
- Create final list of samples to exclude as biological contaminants (oeHBC_exclude.R)
oeHBC_3_filt_norm.sh performs the following analyses by calling various R scripts (given in parentheses):
- Filtering based on contaminants and technical attributes (oeHBC_filtering.R)
- Re-normalization after removal of contaminants (oeHBCregen_norm.R)
- Create SummarizedExperiment object for each experiment (oeHBCregen_makeSE_expt.R)
oeHBC_4_clust.sh performs the following:
- Cluster samples in each experiment (oeHBC_clust.R)
oeHBC_4b_devO_DE.sh performs the following analyses by calling various R scripts (given in parentheses):
- Developmental ordering with slingshot (oeHBCregenWT_slingshot.Rmd & oeHBCregenWTKO_slingshot.Rmd)
- Differential gene expression using limma, along each lineage (oeHBCregenWT_DE.Rmd)
oeHBC_5_GSEA.sh performs the following analyses by calling various R scripts (given in parentheses):
- Preparation of gene sets for Gene Set Enrichment Analysis (GSEA; oeHBCregenWT_GSEAprep.Rmd)
- GSEA based on cell clustering using limma romer (oeHBCregenWT_romerGSEA.R)
oeHBC_6_viz.sh performs the following analyses by calling various R scripts (given in parentheses):
- Visualizations based on cell clustering (heatmap of marker genes, tSNE plots, PCA pairs plot, cluster & experimental condition bubble plots; oeHBCregen_clusterPlots.Rmd & oeHBCdiffregen_clusterPlots.Rmd)
- Visualizations incorporating developmental ordering (3D-PCA plots, dot plots; oeHBCregen_devorderplots.Rmd)
- Transcription factor co-expression, network analysis, and visualizations (oeHBCregen_tf.Rmd)
- Plots of individual or pairs of genes in developmental order (oeHBCregen_genePlots.Rmd & oeHBCregen_genePairsPlots.Rmd)
- Volcano plots of differentially expressed genes (oeHBCregen_volcano.R)
- Olfactory Receptor (OR) gene and OR regulation associated gene expression plots (oeHBCregen_OR.R)
- Look for transcription factor motifs in the top 1000 most enriched genes in the activated HBC1 cluster relative to resting HBCs (oeHBC_findMotif.sh)
- SCONE (normalization): http://bioconductor.org/packages/release/bioc/html/scone.html
- clusterExperiment (clustering): http://bioconductor.org/packages/release/bioc/html/clusterExperiment.html
- slingshot (lineage trajectory algorithm): https://github.com/kstreet13/slingshot
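
The numbered shell scripts described above are named so that they sort into a natural run order. The sketch below runs them in that order and stops at the first failure. The function name run_pipeline and the RUNNER variable are hypothetical, and it assumes the scripts can be executed directly from the 'scripts' directory rather than submitted to a job scheduler. oeHBC_findMotif.sh is left out because its position in the sequence is not stated.

```shell
# Hypothetical driver: run the numbered pipeline scripts in order.
# Usage: run_pipeline [script_dir]   (script_dir defaults to "scripts")
run_pipeline() {
  script_dir="${1:-scripts}"
  runner="${RUNNER:-sh}"   # assumption: override with RUNNER=bash if the scripts need bash
  for step in oeHBC_1_filt_norm.sh oeHBC_2_norm.sh oeHBC_3_filt_norm.sh \
              oeHBC_4_clust.sh oeHBC_4b_devO_DE.sh oeHBC_5_GSEA.sh oeHBC_6_viz.sh; do
    # Fail early with a clear message if a step is missing
    if [ ! -f "$script_dir/$step" ]; then
      echo "missing: $script_dir/$step" >&2
      return 1
    fi
    echo "running $step"
    # Stop the pipeline as soon as one step fails
    "$runner" "$script_dir/$step" || return 1
  done
}
```

Calling run_pipeline scripts from the project directory would then execute each step in turn. Later steps read the outputs of earlier ones, which is why the sketch stops at the first error.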