Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial evaluation notebook to metacells module #868

Draft
wants to merge 16 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 20 additions & 5 deletions .github/workflows/run_metacells.yml
Original file line number Diff line number Diff line change
Expand Up @@ -41,22 +41,37 @@ jobs:
- name: Checkout repo
uses: actions/checkout@v4

# Downloading just one sample for initial testing
- name: Download test data
run: ./download-data.py --test-data --format AnnData --sample SCPCS000001
- name: Set up R
uses: r-lib/actions/setup-r@v2
with:
r-version: 4.4.0
use-public-rspm: true

- name: Set up pandoc
uses: r-lib/actions/setup-pandoc@v2

- name: Set up renv
uses: r-lib/actions/setup-renv@v2
with:
working-directory: ${{ env.MODULE_PATH }}

- name: Set up conda
uses: conda-incubator/setup-miniconda@v3
with:
miniforge-version: latest

- name: Install and activate locked conda environment
- name: Install locked conda environment
run: |
conda install conda-lock
conda-lock install --name test ${MODULE_PATH}/conda-lock.yml
conda-lock install --name openscpca-metacells ${MODULE_PATH}/conda-lock.yml

# Downloading just one sample for initial testing
- name: Download test data
run: ./download-data.py --test-data --format AnnData --sample SCPCS000001

- name: Run analysis module
run: |
cd ${MODULE_PATH}
conda activate openscpca-metacells
# run module script here
bash run-metacells.sh
4 changes: 4 additions & 0 deletions analyses/metacells/.Rprofile
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Don't activate renv in an OpenScPCA docker image
if(Sys.getenv('OPENSCPCA_DOCKER') != 'TRUE'){
source('renv/activate.R')
}
16 changes: 16 additions & 0 deletions analyses/metacells/metacells.Rproj
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX

AutoAppendNewline: Yes
StripTrailingWhitespace: Yes
124 changes: 124 additions & 0 deletions analyses/metacells/notebooks/testing-seacells.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
---
title: "Testing SEACells results"
output: html_notebook
params:
project_id: "SCPCP000001"
sample_id: "SCPCS000001"
library_id: "SCPCL000001"
---

## Introduction

After running SEACells on the data, we want to do some evaluation of how well it might have worked on various samples.
This notebook is to pilot what some of the more and less useful visualizations might be, as well as to test running in a mixed R/Python environment.
Will this result in the best of both worlds?

## Setup

```{r setup}
library(ggplot2)
library(reticulate)
# reticulate::use_condaenv("openscpca-metacells")
```

```{python imports}
import pathlib
import pickle

import matplotlib
import matplotlib.pyplot as plt
import scanpy as sc
import seaborn as sns
import SEACells
```

### Path definitions

```{r paths}
root_dir <- rprojroot::find_root(rprojroot::is_git_root)
data_dir <- file.path(root_dir, "data", "current")
base_dir <- file.path(root_dir, "analyses", "metacells")
results_dir <- file.path(base_dir, "results", params$project_id, params$sample_id)

# these files were created by the `scripts/run-seacells.py` script
sc_anndata_file <- file.path(results_dir, paste0(params$library_id, "_seacells.h5ad"))
sc_model_file <- file.path(results_dir, paste0(params$library_id, "_seacells_model.pkl"))
```

### Load SEACells results

```{python load_data}
# the updated AnnData file
sc_ad = sc.read(r.sc_anndata_file)
# the SEACells model object
with open(r.sc_model_file, "rb") as f:
sc_model = pickle.load(f)
```


## Visualizations

### Kernel matrix

The first visualization we will look at is the kernel matrix.
This should show a strong block-diagonal structure.

```{python}
_ = sns.clustermap(sc_model.kernel_matrix.toarray()[:500, :500])
plt.show()
```

### Model fitting

We can look at where SEACells started with the metacells as plotted on the UMAP grid, and how the model converged.

First, where it started:

```{python}
SEACells.plot.plot_initialization(sc_ad, sc_model)
```

Now, how the convergence went:

```{python}
sc_model.plot_convergence()
```

Finally, we can see where the metacells ended up on the UMAP grid:

```{python}
SEACells.plot.plot_2D(sc_ad, key="X_umap", colour_metacells=True)
```

We can also plot how many cells were assigned to each metacell:

```{python}
SEACells.plot.plot_SEACell_sizes(sc_ad, bins=10)
```

### Cell type purity

The first real measure of how well SEACells worked may be to look at the cell type purity of the metacells.
We will use the cell types as assigned by singleR to do this.

If SEACells is working well, we should expect lots of high values here, indicating that most of the cells that contribute to a metacell were from the same initial cell type.

```{python}
purity = SEACells.evaluate.compute_celltype_purity(
sc_ad, "singler_celltype_ontology"
)
```

```{r purity_plot}
ggplot(py$purity, aes(y = singler_celltype_ontology_purity)) +
geom_boxplot() +
labs(
title = "Celltype purity",
y = "Celltype purity (singler)"
)
```

That looks pretty good, but if we wanted to evaluate whether it was better than chance, we would need to do some randomization, assigning cells to metacells randomly and seeing whether this method is outperfroming that.
The main concern would be that if there are not many different cell types, we will expect to get very high purity scores.

But that evaluation will be in a later revision of this notebook.
2,072 changes: 2,072 additions & 0 deletions analyses/metacells/notebooks/testing-seacells.nb.html

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions analyses/metacells/renv/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
library/
local/
cellar/
lock/
python/
sandbox/
staging/
Loading
Loading