AlexsLemonade · jashapiro · Sep 26, 2024 · Sep 27, 2024 · Nov 7, 2024 · Nov 7, 2024
@@ -41,22 +41,37 @@ jobs:
       - name: Checkout repo
         uses: actions/checkout@v4
 
-      # Downloading just one sample for initial testing
-      - name: Download test data
-        run: ./download-data.py --test-data --format AnnData --sample SCPCS000001
+      - name: Set up R
+        uses: r-lib/actions/setup-r@v2
+        with:
+          r-version: 4.4.0
+          use-public-rspm: true
+
+      - name: Set up pandoc
+        uses: r-lib/actions/setup-pandoc@v2
+
+      - name: Set up renv
+        uses: r-lib/actions/setup-renv@v2
+        with:
+          working-directory: ${{ env.MODULE_PATH }}
 
       - name: Set up conda
         uses: conda-incubator/setup-miniconda@v3
         with:
           miniforge-version: latest
 
-      - name: Install and activate locked conda environment
+      - name: Install locked conda environment
         run: |
           conda install conda-lock
-          conda-lock install --name test ${MODULE_PATH}/conda-lock.yml
+          conda-lock install --name openscpca-metacells ${MODULE_PATH}/conda-lock.yml
+
+      # Downloading just one sample for initial testing
+      - name: Download test data
+        run: ./download-data.py --test-data --format AnnData --sample SCPCS000001
 
       - name: Run analysis module
         run: |
           cd ${MODULE_PATH}
+          conda activate openscpca-metacells
           # run module script here
           bash run-metacells.sh
@@ -0,0 +1,4 @@
+# Don't activate renv in an OpenScPCA docker image
+if(Sys.getenv('OPENSCPCA_DOCKER') != 'TRUE'){
+  source('renv/activate.R')
+}
@@ -0,0 +1,16 @@
+Version: 1.0
+
+RestoreWorkspace: Default
+SaveWorkspace: Default
+AlwaysSaveHistory: Default
+
+EnableCodeIndexing: Yes
+UseSpacesForTab: Yes
+NumSpacesForTab: 2
+Encoding: UTF-8
+
+RnwWeave: Sweave
+LaTeX: pdfLaTeX
+
+AutoAppendNewline: Yes
+StripTrailingWhitespace: Yes
@@ -0,0 +1,124 @@
+---
+title: "Testing SEACells results"
+output: html_notebook
+params:
+  project_id: "SCPCP000001"
+  sample_id: "SCPCS000001"
+  library_id: "SCPCL000001"
+---
+
+## Introduction
+
+After running SEACells on the data, we want to do some evaluation of how well it might have worked on various samples.
+This notebook is to pilot what some of the more and less useful visualizations might be, as well as to test running in a mixed R/Python environment.
+Will this result in the best of both worlds?
+
+## Setup
+
+```{r setup}
+library(ggplot2)
+library(reticulate)
+# reticulate::use_condaenv("openscpca-metacells")
+```
+
+```{python imports}
+import pathlib
+import pickle
+
+import matplotlib
+import matplotlib.pyplot as plt
+import scanpy as sc
+import seaborn as sns
+import SEACells
+```
+
+### Path definitions
+
+```{r paths}
+root_dir <- rprojroot::find_root(rprojroot::is_git_root)
+data_dir <- file.path(root_dir, "data", "current")
+base_dir <- file.path(root_dir, "analyses", "metacells")
+results_dir <- file.path(base_dir, "results", params$project_id, params$sample_id)
+
+# these files were created by the `scripts/run-seacells.py` script
+sc_anndata_file <- file.path(results_dir, paste0(params$library_id, "_seacells.h5ad"))
+sc_model_file <- file.path(results_dir, paste0(params$library_id, "_seacells_model.pkl"))
+```
+
+### Load SEACells results
+
+```{python load_data}
+# the updated AnnData file
+sc_ad = sc.read(r.sc_anndata_file)
+# the SEACells model object
+with open(r.sc_model_file, "rb") as f:
+  sc_model = pickle.load(f)
+```
+
+
+## Visualizations
+
+### Kernel matrix
+
+The first visualization we will look at is the kernel matrix.
+This should show a strong block-diagonal structure.
+
+```{python}
+_ = sns.clustermap(sc_model.kernel_matrix.toarray()[:500, :500])
+plt.show()
+```
+
+### Model fitting
+
+We can look at where SEACells started with the metacells as plotted on the UMAP grid, and how the model converged.
+
+First, where it started:
+
+```{python}
+SEACells.plot.plot_initialization(sc_ad, sc_model)
+```
+
+Now, how the convergence went:
+
+```{python}
+sc_model.plot_convergence()
+```
+
+Finally, we can see where the metacells ended up on the UMAP grid:
+
+```{python}
+SEACells.plot.plot_2D(sc_ad, key="X_umap", colour_metacells=True)
+```
+
+We can also plot how many cells were assigned to each metacell:
+
+```{python}
+SEACells.plot.plot_SEACell_sizes(sc_ad, bins=10)
+```
+
+### Cell type purity
+
+The first real measure of how well SEACells worked may be to look at the cell type purity of the metacells.
+We will use the cell types as assigned by singleR to do this.
+
+If SEACells is working well, we should expect lots of high values here, indicating that most of the cells that contribute to a metacell were from the same initial cell type.
+
+```{python}
+purity = SEACells.evaluate.compute_celltype_purity(
+    sc_ad, "singler_celltype_ontology"
+)
+```
+
+```{r purity_plot}
+ggplot(py$purity, aes(y = singler_celltype_ontology_purity)) +
+  geom_boxplot() +
+  labs(
+    title = "Celltype purity",
+    y = "Celltype purity (singler)"
+  )
+```
+
+That looks pretty good, but if we wanted to evaluate whether it was better than chance, we would need to do some randomization, assigning cells to metacells randomly and seeing whether this method is outperfroming that.
+The main concern would be that if there are not many different cell types, we will expect to get very high purity scores.
+
+But that evaluation will be in a later revision of this notebook.
@@ -0,0 +1,7 @@
+library/
+local/
+cellar/
+lock/
+python/
+sandbox/
+staging/