Skip to content

Latest commit

 

History

History
365 lines (296 loc) · 24 KB

README.md

File metadata and controls

365 lines (296 loc) · 24 KB

Canine_Leukocyte_scRNA

DOI

NEWS: Interested in more canine scRNA data? Check out our naive osteosarcoma scRNA-seq atlas and the associated GitHub page.

This GitHub repository contains all the analysis code used in, "A single-cell RNA sequencing atlas of circulating leukocytes from healthy and osteosarcoma affected dogs."

If you use our raw/processed data, extract data using the UCSC Cell Browser portal, or use portions of our code in your analysis, please cite:

Ammons DT, Harris RA, Hopkins LS, Kurihara J, Weishaar K and Dow S (2023) A single-cell RNA sequencing atlas of circulating leukocytes from healthy and osteosarcoma affected dogs. Front. Immunol. 14:1162700. doi: 10.3389/fimmu.2023.1162700

Repository goals:

  • provide a resource to make the data generated from this project accessible
  • enable reproducible/transparent data reporting
  • provide analysis code to reproduce custom figures

If you have any questions or concerns, please submit an issue, contact the corresponding author(s), and/or contact Dylan Ammons at dylan.ammons @ colostate dot edu.

File structure:

  • 📁 input contains relevant metadata files and instructions for obtaining data associated with this study
  • 📁 analysis contains the analysis code and source file used to complete the data analysis

Supplemental data and potential uses:

  1. Browse the data
  2. Cell type annotations
  3. Reference Mapping
  4. GSEA using dataset
  5. Module scoring
  6. CIBERSORTx

1. Browse the complete annotated dataset

The proccessed dataset is avaliable for browsing via the UCSC Cell Browser portal. Using the portal you can explore feature expression throughout the dataset as well as obtain the transcriptomic signatures of each cell type though an interactive webpage.

Link to the dataset: https://canine-leukocyte-atlas.cells.ucsc.edu

Link to UCSC Cell Browser documentation: https://cellbrowser.readthedocs.io/en/master/

2. Cell type annotations with defining markers

Cell markers lists were curated using 7 healthy canine leukocyte samples. The top 50 defining features (identified using FindMarkers for each cell type were considered, with the top 24 features evaluated for specificity using violin plots and preference given to unique features only found in the top 50 of one cell type.

cellType.l1

Cell Type Marker
CD8/NK cell CCL5, GZMB, KLRB1, NCR3, GZMA, IL12RB2, TRPM3, KLRK1, KLRD1, CD96
CD4 T cell IL7R, LEF1, DOCK3, ATP10A, CD52, CSTA, ICOS, CD3E
Monocyte LYZ, MAFB, BPI, FN1, F13A1, TCF7L2, S100P, CD83, NRG1, PLXDC2, MITF
Dendritic Cell IGF1, NCAM2, FLT3, RAB3C, FCER1A, FGF12, NRXN1, TLR3
Neutrophil S100A12, CYP4F22, RGS2, CD4, ALDH1A2, SGK1, IL1R2
Granulocyte DACH1, TGM2, ADGRE2, CA8, SMPD1, IL5RA
B cell TNFRSF13C, PAX5, EBF1, BTLA, HTR1F, NRIP1, MS4A1, DLA-DRA
Plasma cell JCHAIN, TXNDC5, IRF4, TNFRSF17, DERL3, CCR10, MPP6, TNFRSF13B
DN T cell KANK1, TMEM132D, NMB, SYNJ2, GZMK, LEF1
gd T cell PDE11A, PSD3, RHEX, IL17RB, CDH4, GATA3, FAT1, ZNF683
Cycling T cell TOP2A, MKI67, RRM2, H1-5, DIAPH3, TK1, KIF11, TPX2
CD34+ Unclassified ZNF521, CD34, KIT, CD109, IGF2BP2

cellType.l2

Cell type Marker
CD8 T cell
CD8 T cell CCL5, GZMB, TRPM3, KLRK1, KLRB1, CCL4, IL2RB, KLRD1, CD96
CD8+ Memory GZMK, KLRK1, GZMB, KLRB1, KLRD1, KLRG1, CD8A, FASLG, CCR5
CD4 T cell
CD4+ Naive ATP10A, RGS10, LEF1, CTPS1, ZNF536, SATB1, CSTA, ITGA1, COL6A5
CD4+ T reg IKZF2, S100A5, CTLA4, GATA3, IL2RA, TOX, CD80, ZNF831
CD4+ TCM LEF1, CCR7, TSHZ2, LTB, CD52, CSTA, IL7R, TCF7
CD4+ TEM IL7R, RORA, PLCL1, MAF, SLC9A9, ICOS, GALNT17, PTPN13, DOCK3
NK cell
NK T cell TGFBR3, GPA33, RARRES1, IL2RB, IKZF2, KLRK1, CD96
NK cell GZMA, PI3, IL2RB, KLRF1, PTPRM, PAX4, CD96, IGSF3, STMN2, F2RL3
Monocyte/DC
DC PKIB, TCF4, FLT3, HAVCR1, SDC2, NCAM2, MRC1, IGF1, FCER1A
M-MDSC IL18, IL1B, MEFV, CPXM2, LTF, CYP4F22, STEAP4, KCNJ2, S100A12
Monocyte ARHGAP45, F13A1, CD83, FN1, LYZ, NRG1, RETN, SLAMF9, TCF7L2
Granulocyte
Eosinophil CSTB, SLCO4C1, TGM2, ADGRE2, C30H15orf48, PTPRN2, SMPD1, CA8
PMN-MDSC PGLYRP1, CAMP, MMP9, CRISP2, TCN1, MMP8, CD177, FADS1, LTF, S100A12
Neutrophil CYP4F22, S100A12, CD4, SOD2, SERPINA1, IL1R2, ALDH1A2, S100A8
Basophil SLCO4C1, IL5RA, DAPK2, DACH1, CA8, ADGRE2, EEPD1, ANKRD33B, HK2
B cell
B cell TNFRSF13C, PAX5, BANK1, EBF1, BTLA, PLEKHO1, HTR1F, NRIP1
Plasma cell MZB1, JCHAIN, LMAN1, TXNDC5, LAP3, IRF4, RARRES2, TNFRSF17
Miscellaneous
DN T cell KIAA0825, TMEM132D, NMB, SYNJ2, GZMK, MYB, KANK1, PLCL1, SLF1, CTLA4
gd T cell PDE11A, PSD3, CRLF2, IL17RB, VSTM4, RHEX, FAT1, TOX2
Other T cell RRM2, MKI67, SPC24, TK1, CENPF, TOP2A, CLSPN, NCAPG
CD34+ Unclassified CD34, NDST3, TFPI, CLEC3B, KIT, NAV3, CD109

cellType.l3 (Default)

Cell Type Marker
CD8 T cell
CD8+ Naive ITGA1, LEF1, PTGDR, IL2RB, ADGRG1, NBEA
CD8+ Effector CCL5, TRPM3, IL12RB2, GZMB, KLRB1, GZMA, NCR3, IL2RB, KLRD1, CD96
CD8+ Memory GZMK, GZMB, PI3, BTBD11, CTSW, CCR5, CCL4, KLRG1, FASLG
CD8+ gd T cell PTHLH, IGF2BP2, ABTB2, AKAP12, SOX4, CTSW, SLC16A10, PXT1, ZNRF3, SULT2B1
CD4 T cell
CD4+ Naive LEF1, CSTA, RGS10, ZNF536, CCR7, COL6A5, LTB, TNFSF8
CD4+ TCM LEF1, TSHZ2, CD52, CCR7, IL7R, CTPS1, EFHC2, CARMIL1
CD4+ TEM IL7R, SLC9A9, ICOS, MAF, CD28, SKAP1, CD40LG
CD4+ TEM, Th1-like IL7R, PTPN13, IL18R1, CD28, RCAN2, CCR9, CCR5, IL12RB2, CD52, PRUNE2
CD4+ TEM, Th2-like RNF220, ITGA2, GATA3, CCDC3, LGALS3, PTPN13, S100A2, PPEF1, CMA1
CD4+ TEM, Th17-like NTRK2, PTPN13, ADAM12, NRG2, RGS17, DNAH8, CCR6, NPAS2, RORA, LTBP1
CD4+ T reg IKZF2, CTLA4, RGS1, ICOS, IL2RA, CD28, ZNF831
CD4+, IFN signature CXCL10, IFI44, OAS1, ISG15, IFI44L, IFGGB2, CTLA4, STAT1, DDX58, XAF1
Monocyte
Monocyte, CD4- LYZ, BPI, LRMDA, MT2A, F13A1, FN1, NRG1, CCDC88A, CD83, RETN
Monocyte, CD4+ IL1B, MAFB, NFKBIA, CXCL8, FN1, BLOC1S6, CD83, S100P, BPI, NRG1
M-MDSC IL18, IL1B, LTF, MEFV, KCNJ2, CPXM2, S100A12, STEAP4, CSF3R, IL31RA
Monocyte, IFN signature RSAD2, OAS1, OAS2, DDX58, HERC6, OAS3, RTP4, EIF2AK2, IFIT2
Dendritic cell
Pre-DC FGF12, GPHA2, MTUS2, FCER1A, PLCE1, PTPRS, IGF1, NECTIN1, IL3RA, AK8
Myeloid cDC1 ZNF366, SDC2, DISC1, ECRG4, TMEM163, RIMS2, KIT, OTOF, RTKN2, RAB7B
Myeloid cDC2 PKIB, CD300H, SDC2, CD1C, NCAM2, CD86, BATF3, ZNF366, PID1, ECM1
Plasmacytoid DC COBLL1, RAB3C, IGF1, FCER1A, RYR1, PRKG1, CCND1, STYXL2, ANK1, OCIAD2
Unclassified DC PLCB4, ZNF366, KCNK13, STRIP2, SDC2, OTOF, HACD1, C5, SLC8A1, CNTLN
Granulocyte
Neutrophil S100A12, CD4, SERPINA1, SGK1, S100A8, ALDH1A2, FNDC3B, GGH, SRGN, IL1R2
PMN-MDSC CAMP, PGLYRP1, CRISP2, MMP9, MMP8, TCN1, CD177, LTF, FADS1, S100A12
Eosinophil C30H15orf48, TGM2, DACH1, PADI3, SMPD1, CA8, IL5RA
Basophil DACH1, CA8, IL5RA, DAPK2, TGFA, ANKRD33B, HK2, PRR5L
B cell
Immature B cell SYT1, PAX5, VPREB3, ERC2, TMTC2, KLHL14, F8, TEX9, TDRP, ADGRF1
Naive B cell TNFRSF13C, BANK1, HTR1F, PAX5, EBF1, BTLA, NRIP1, ADAM9
Class switched B cell TNFRSF13C, GOLM1, BANK1, BTLA, EBF1, DYNC1I1, MTMR2, PAX5
Activated B cell IGKC, CACNB2, PAX5, TNFRSF13C, IGHM, RASGRF2, AOX2, BCAR3, ADAM32
Plasma cell JCHAIN, MZB1, TXNDC5, LMAN1, FKBP11, LAP3, DERL3, CCR10, MKI67, TNFRSF13B
Miscellaneous
DN T cell KIAA0825, TMEM132D, KANK1, NMB, CTLA4, SYNJ2, BICDL1, SLF1, ID3, KIAA1549
gd T cell PARD3B, RHEX, IL17RB, CDH4, GATA3, FAT1, TOX2, ADARB1, ZNF683, TGFBR3
NK cell KLRF1, STMN2, PAX4, NCR3, F2RL3, CD96, IL2RB, IGSF3, FREM1, FASLG
Cycling T cell TOP2A, MKI67, RRM2, H1-5, DIAPH3, TK1, KIF11, TPX2, ASPM
NK T cell GPA33, TGFBR3, KLRK1, CD96, SYTL2, MOV10L1, SLA2, DSTN, RARRES1
CD34+ Unclassified TFPI, ZNF521, CD34, NDST3, GUCY1A1, HPGD, CLEC3B, KIT, CD109, DNTT

3. Using the data to complete reference mapping

Reference mapping is useful tool to facilitate the identification of cell types in single cell datasets. The approach described here uses Seurat functions to identify anchors between a query dataset (external/personal data) and the reference datasets generated in this study. The default approach describes how to use the healthy only dataset, but it will also work with the combined dataset if you load that file in as the reference.

Before running the reference mapping code, a Seurat object need to be preprocessed using SCT normalization (+/- integration) and stored as an object named seu.obj. If you wish to use the reference on log normalized data, please see the hidden code below.

#set the path to the location in which the reference file is saved
reference <- readRDS(file = "../../k9_PBMC_scRNA/analysis/output/s3/final_dataSet_HvO.rds")

#prepare the reference
reference[['integrated']] <- as(object = reference[['integrated']] , Class = "SCTAssay")
DefaultAssay(reference) <- "integrated"

#find conserved anchors with query and reference
anchors <- FindTransferAnchors(
    reference = reference,
    query = seu.obj,
    normalization.method = "SCT",
    reference.reduction = "pca",
    dims= 1:50
)

#select meta.data slot to use for label transfer -- change refdata value to use alternate labels (i.e., refdata = reference$celltype.l1)
predictions <- TransferData(anchorset = anchors, refdata = reference$celltype.l3,
    dims = 1:50)
seu.obj <- AddMetaData(seu.obj, metadata = predictions)

#generate and save the image
pi <- DimPlot(seu.obj, 
              reduction = "umap", 
              group.by = "predicted.id",
              pt.size = 0.25,
              label = T,
              label.box = T,
              shuffle = F
)
ggsave("./output/referenceMap.png", width = 7, height = 7)
If using log normalized data try this (click me!)

#set the path to the location in which the reference file is saved
reference <- readRDS(file = "../../k9_PBMC_scRNA/analysis/output/s3/final_dataSet_HvO.rds")

#find conserved anchors with query and reference
anchors <- FindTransferAnchors(
    reference = reference,
    query = seu.obj,
    reference.reduction = "pca",
    dims= 1:30
)

#select meta.data slot to use for label transfer -- change refdata value to use alternate labels (i.e., refdata = reference$celltype.l1)
predictions <- TransferData(
    anchorset = anchors,
    refdata = reference$celltype.l3,
    dims = 1:30
)
seu.obj <- AddMetaData(seu.obj, metadata = predictions)

#generate and save the image
pi <- DimPlot(seu.obj, reduction = "umap", group.by = "predicted.id", 
              pt.size = 0.25, label = T, label.box = T, shuffle = F)
ggsave("./output/referenceMap.png", width = 7, height = 7)

4. Gene set enrichment analysis

The data generated from this work have the potential to provide supporting evidence to evaluate/confirm the cell identity of sorted bulk RNA sequencing dataset. One approach to do this is to use gene set enrichment analysis (GSEA) with the terms representing the cell type identified in our dataset.

Required input: a list of gene symbols that you wish to query. In this case the genelists ate stored in a dataframe called clus.markers

These gene lists could be generated by simply using the features with the highest level of expression after normalizing your dataset, comparing the transcriptome of a cell population of interest (i.e., blood-derived macrophage) verses a reference (i.e., total PBMCs), or any other relevant approach to identify genes of interest.

Example data frame format:

> str(clus.markers)
'data.frame':   400 obs. of  2 variables:
 $ gene   : chr  "B2M" "CD74" "DLA-64" "PPBP" ...
 $ cluster: chr  "sample_1" "sample_2" "sample_3" "sample_4" ...
#read in the one of the supplemntal data files provided with the publication
geneLists <- read.csv(file = "./input/supplementalData_1.csv")

#clean the reference
datas <- geneLists[,c("cluster","gene")]
colnames(datas) <- c("gs_name", "gene_symbol")
datas <- datas %>% group_by(gs_name) %>% top_n(50) %>% dplyr::distinct(gene_symbol) %>% as.data.frame()

#run GSEA using clusterProfiler
clusters <- unique(clus.markers$cluster)
df.list <- list()
for (cluster in clusters) {
    clus_sub <- clus.markers[clus.markers$cluster == cluster, ]

    #run clusterProfiler::enricher
    enriched <- as.data.frame(clusterProfiler::enricher(gene = clus_sub$gene, TERM2GENE = datas, pvalueCutoff = 1))
    if(nrow(enriched) > 0){
        enriched$cluster <- cluster
        enriched <- head(enriched) #only takes the top 6 terms - can modify if desired
        df.list[[which(cluster == clusters)]] <- enriched
    }
}

cellCalls <- do.call(rbind, df.list)
outfile <- paste("./output/cell_classification.csv", sep = "")
write.csv(cellCalls, file = outfile)

#plot the data
plot <- ggplot(data = cellCalls, mapping = aes_string(x = 'cluster', y = 'ID')) +
    geom_point(mapping = aes_string(size = 'Count', color = -log2(cellCalls$p.adjust))) +
    theme(axis.text.x = element_blank(),
          axis.ticks = element_blank(),
          axis.line = element_blank(),
          panel.background = element_rect(fill = "white",colour = NA),
          plot.background = element_rect(fill = "white",colour = NA),
          legend.background = element_rect(fill = "transparent",colour = NA),
          legend.key = element_rect(fill = "transparent",colour = NA),
          panel.grid.major = element_line(color = "gray"), 
          panel.grid.minor = element_line(color = "gray")
          ) + 
    scale_colour_viridis(option="magma", name='-log2(padj)') +
    guides(size=guide_legend(title="Gene count", override.aes=list(fill=NA))) +
    geom_text(aes(y = as.numeric(length(unique(ID))), label = cluster), size = 3.5,vjust = -.3, angle=45, hjust = -0.1) +
    coord_cartesian(expand = TRUE, clip = "off") +
    xlab("Sample") + ylab("GSEA term")

#check path is correct
ggsave("./output/gsea_scRNA_terms.png", width = 6, height = 4)

5. Module scoring

Module scoring is a supplemental approach that can be applied to single cell datasets with the goal of providing further insights into cell identities. The approach described below uses the Seurat function AddModuleScore and the gene lists presented in Table 3 (also found in supplemental data 4) of our associated manuscript.

The concept of the AddModuleScore() function is similar to GSEA, but also distinct in many ways. Read the Seurat documentation and/or check out this webpage for more details.

#load in the reference file from supplemental data
ref.df <- read.csv("supplementalData_4.csv", header = T)

#organize the data
modulez <- split(ref.df$gene, ref.df$cellType_l2)

#complete module scoring
seu.obj <- AddModuleScore(seu.obj,
                          features = modulez,
                         name = "_score")

#correct the naming -- credit to: https://github.com/satijalab/seurat/issues/2559
names(seu.obj@meta.data)[grep("_score", names(seu.obj@meta.data))] <- names(modulez)

#plot the results -- uses a custom function, so you will need to source the customFeunctions.R file. Alt: can also be visulized with FeaturePlot() or DotPlot()
features <- names(modulez)
ecScores <- majorDot(seu.obj = seu.obj, groupBy = "clusterID_sub", scale = T,
                     features = features
                    ) + theme(axis.title = element_blank(),
                              #axis.ticks = element_blank(),
                              #legend.justification = "left",
                              #plot.margin = margin(7, 21, 7, 7, "pt")
                              legend.direction = "vertical",
                              legend.position = "right"
                             ) + guides(color = guide_colorbar(title = 'Scaled\nenrichment\nscore')) + guides(size = guide_legend(nrow = 3, byrow = F, title = 'Percent\nenriched'))

ggsave(paste("./output/", outName, "/", outName, "_dots_celltypes.png", sep = ""),width = 10,height=6)

6. CIBERSORTx

Under development

The reference dataset generated from this study provides the data required to make a comprehensive CIBERSORTx reference to be used for deconvolution of bulk RNA sequencing of peripheral blood. We do not feel comfortable providing a reference that can be directly used, as this study did not benchmark CIBERSORTx references.

We have plans to complete futher work up and release references at a later date. They will be linked here when available.

In the meantime, feel free to generate your own cibersort references using this dataset. Please cite this publication if using the dataset to generate a CIBERSORTx reference. Also, I would recommend completing validation studies to evaluate the performance of the reference before applying it to your system.

With the being said, here is the code I would use to prepare our dataset for CIBERSORTx reference generation:

##### create input ref for cibersort #####
#load in processed data from GEO -- see input dir for instructions (healthy only will do, but could used combined dataset if desired
#NOTE: it may work better if only using 1 dog to generate the reference, but the code here does not do that)
seu.obj.h <- readRDS(file = "./output/s3/final_dataSet_H.rds")

#randomly downsample the subset data to obtain equal number of cells in each celltype
set.seed(12)
seu.obj.h <- subset(x = seu.obj.h, downsample = min(table(seu.obj.h$celltype.l1))) #can pull any of the 3 celltype metadata slots
seu.obj.h <- NormalizeData(seu.obj.h)

#extract necessary data
rownameCode <- as.data.frame(seu.obj.h$celltype.l1)
cntMatrix <- seu.obj.h@assays$RNA@data
colnames(rownameCode)[1] <- "name"
colnames(cntMatrix) <- rownameCode$name[match(colnames(cntMatrix), rownames(rownameCode))]
cntMatrix <- as.data.frame(cntMatrix)
cntMatrix$gene <- rownames(cntMatrix)
cntMatrix <- cntMatrix %>% relocate(gene)

#save the matrix in the format required for cibersort -- follow instructions at https://cibersortx.stanford.edu/
write.table(cntMatrix, file="./output/ciberSort/cibersort_forRef_ds.txt", row.names = F, col.names = T, quote=FALSE, sep='\t')