
Comparing metrics from different embeddings #384

Open
samfenske opened this issue Jun 9, 2023 · 0 comments

Comments

@samfenske

Hello,

I have a question about interpreting scib metrics computed on different embeddings. I have an integrated object in which I would expect the X_scVI embedding to have the best scib metrics and the normalized count matrix (1000 HVGs) to have inferior scores. I also computed scib scores for X_umap (derived from X_scVI) and X_pca (derived from the normalized count matrix).

I'm getting rather inconsistent results here, which makes me skeptical of how I am comparing scib scores within a given object. For reference, my goal is to use scib metrics to identify the number of highly variable genes that best removes batch effects while preserving biological signal. I'm summarizing the results as an average bio score and an average batch score. The bio score is the average of NMI, ARI, and ASW (by cell-type label), and the batch score is the average of ASW (by batch) and graph connectivity.
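The two summary scores described above can be sketched as plain unweighted means. This is a minimal sketch that assumes the five individual metric values have already been computed (for example with scib) and are on a common 0 to 1 scale; the function names are hypothetical.

```python
from statistics import mean


def bio_score(nmi: float, ari: float, asw_label: float) -> float:
    """Bio-conservation summary: unweighted mean of NMI, ARI and cell-type ASW."""
    return mean([nmi, ari, asw_label])


def batch_score(asw_batch: float, graph_conn: float) -> float:
    """Batch-correction summary: unweighted mean of batch ASW and graph connectivity."""
    return mean([asw_batch, graph_conn])
```

Because every metric gets equal weight, a single metric that behaves differently on a given representation (e.g. graph connectivity on a kNN graph built from the raw matrix) can move the summary noticeably.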

Somehow, the normalized count matrix (which I stored in adata.obsm['X_raw']) has a better batch score (0.86) than the scVI embedding (0.85). UMAP and PCA have batch scores of 0.79 and 0.82, respectively. For the bio score, UMAP is best (0.577), followed by scVI (0.566), PCA (0.55), and the normalized count matrix (0.546).
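One way to compare the four representations on a single axis is to combine the reported bio and batch scores. The sketch below assumes the 60% bio conservation / 40% batch correction weighting used for the overall score in the scib benchmark; that weighting is an assumption here, not part of the analysis described above. The numbers are the ones reported in this post.

```python
# Reported (bio, batch) summary scores for each representation.
reported = {
    "X_scVI": (0.566, 0.85),
    "X_raw": (0.546, 0.86),  # normalized count matrix, 1000 HVGs
    "X_umap": (0.577, 0.79),
    "X_pca": (0.55, 0.82),
}

# Assumption: scib-benchmark-style weighting of 60% bio and 40% batch.
BIO_WEIGHT, BATCH_WEIGHT = 0.6, 0.4


def overall(bio: float, batch: float) -> float:
    """Weighted mean of bio-conservation and batch-correction scores."""
    return BIO_WEIGHT * bio + BATCH_WEIGHT * batch


# Sort representation names by combined score, best first.
ranking = sorted(reported, key=lambda name: overall(*reported[name]), reverse=True)
for name in ranking:
    bio, batch = reported[name]
    print(f"{name:8s} bio={bio:.3f} batch={batch:.3f} overall={overall(bio, batch):.3f}")
```

Changing the weights changes the ranking, so whichever weighting is used should be fixed before comparing HVG sets.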

My question is whether these types of results are to be expected. I want to try a couple of HVG sets and compare the metrics after HVG selection to pick the right number, but based on these comparisons I'm not sure how to interpret the scores I'm getting. Perhaps my intuition that scVI should have the best scores and the normalized matrix the worst is off.

Thank you!
