Generalized PCA for non-normally distributed data. If you find this useful, please cite "Feature Selection and Dimension Reduction based on a Multinomial Model" (doi:10.1186/s13059-019-1861-6).
A Python implementation is also available.
The glmpca package is available from CRAN. To install the stable release (recommended):
install.packages("glmpca")
To install the development version:
remotes::install_github("willtownes/glmpca")
library(glmpca)
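#create a simple dataset with two clusters
#rows of mu are features (100), columns are observations (20)
mu<-matrix(exp(rnorm(100*20)),nrow=100)
#rescale the first 10 observations feature-wise so they form a separate cluster
mu[,1:10]<-mu[,1:10]*exp(rnorm(100))
clust<-rep(c("red","black"),each=10)
#draw Poisson counts with means given by mu
Y<-matrix(rpois(prod(dim(mu)),mu),nrow=nrow(mu))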
#visualize the latent structure
res<-glmpca(Y, 2)
factors<-res$factors
plot(factors[,1],factors[,2],col=clust,pch=19)
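The example above draws random data without a seed, so each run gives a slightly different plot. The following is a minimal sketch of a reproducible variant that also runs a few sanity checks on the fit. It assumes the fitted object exposes `loadings` and `dev` components alongside `factors`, as described in the package help. The seed value and the names `n_feat` and `n_obs` are illustrative only.

```r
# Sketch: reproducible simulation plus basic checks on the fitted object.
set.seed(202)  # arbitrary seed, used only to make the run repeatable

n_feat <- 100  # rows of Y (features)
n_obs  <- 20   # columns of Y (observations)

# baseline Poisson means, then a per-feature rescaling of the first 10 observations
mu <- matrix(exp(rnorm(n_feat * n_obs)), nrow = n_feat)
mu[, 1:10] <- mu[, 1:10] * exp(rnorm(n_feat))
clust <- rep(c("red", "black"), each = 10)
Y <- matrix(rpois(length(mu), mu), nrow = n_feat)

# fit only if glmpca is installed, so the script still runs without it
if (requireNamespace("glmpca", quietly = TRUE)) {
  res <- glmpca::glmpca(Y, 2)
  # one row of factors per column of Y, one row of loadings per row of Y
  stopifnot(nrow(res$factors) == ncol(Y), nrow(res$loadings) == nrow(Y))
  # deviance trace over iterations, useful for judging convergence
  plot(res$dev, type = "l", xlab = "iteration", ylab = "deviance")
}
```

If the dimension checks pass, the rows of `res$factors` line up with the entries of `clust`, which is what the scatter plot above relies on when coloring points.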
For more details see the vignettes. For compatibility with Bioconductor, see scry. For compatibility with Seurat objects, see Seurat-wrappers.
GLM-PCA has been around for a while, and we have not been able to dedicate as much time to its maintenance and ongoing improvement as we would like. Fortunately, there are numerous alternative implementations that improve on our basic idea. Many of them are likely to be faster and more memory-efficient than our version, and some have interesting additional capabilities such as uncertainty quantification. In reverse chronological order, here are some packages to check out.
- fastglmpca. Preprint: Weine, Carbonetto, & Stephens (2024).
- scGBM. Preprint: Nicol & Miller (2023).
- NewWave. Publication: Agostinis et al. (2022).
- LDVAE. Publication: Svensson et al. (2020).
Please use https://github.com/willtownes/glmpca/issues to submit issues, bug reports, and comments.