Pipeline for pixel clustering and cell clustering of multiplexed imaging data as described in Liu et al. Robust phenotyping of highly multiplexed tissue imaging data using pixel-level clustering.
For a maintained version of the codebase, please see https://github.com/angelolab/ark-analysis. The codebase in this repository reflects a static version of the pipeline as described in Liu et al., but is not actively maintained.
This repo contains a pipeline for pixel clustering and cell clustering of multiplexed imaging data. It assumes that you have already performed any necessary image processing on your data (such as denoising, background subtraction, or autofluorescence correction) and that the data is ready to be analyzed.
The first step in the Pixie pipeline is to run the pixel clustering notebook. The notebook walks you through the process of generating pixel clusters for your data, and lets you specify what markers to use for the clustering, train a model, use it to classify your entire dataset, and generate pixel cluster overlays. The notebook includes a GUI for manual cluster adjustment and annotation.
Related video: Workshop Talk - Session IV - Pixel Level Analysis
The second step in the Pixie pipeline is to run the cell clustering notebook. This notebook will use the pixel clusters generated in the first notebook to cluster the cells in your dataset. The notebook walks you through generating cell clusters for your data and generates cell cluster overlays. The notebook includes a GUI for manual cluster adjustment and annotation.
Related video: Workshop Talk - Session V - Cell-level Analysis - Part 2: Cell Clustering
Open a terminal and navigate to where you want the code stored. Clone the repo:
git clone https://github.com/angelolab/pixie.git
As this is an ongoing project, extra features may be added to the Pixie pipeline in the future. For an actively maintained version of the pipeline, please see https://github.com/angelolab/ark-analysis. The codebase in this pixie repo corresponds to v0.6.4 of ark.
For ease of use, we have created a Docker container to run this pipeline. Docker is a containerization platform that allows programs to be packaged into containers, which are standardized executable components that combine source code with the OS libraries and dependencies needed to run that code. We have created a setup video.
You'll need Docker Desktop to run the container:
- First, download and install Docker Desktop.
- Once it's successfully installed, make sure it is running by looking in the toolbar for the Docker whale icon.
Enter the following command into the terminal from the same directory in which you ran the above commands:
./start_docker.sh
If you are running it for the first time, or if our Docker image has been updated, it may take a while to build and set up.
This will generate a link to a Jupyter notebook. Copy the last URL (the one with 127.0.0.1:8888 at the beginning) into your web browser. Be sure to keep this terminal open. Do not exit the terminal or enter control-c until you are finished with the notebooks.
NOTE: If you already have a Jupyter session open when you run ./start_docker.sh, you will receive a couple of additional prompts. Copy the URL listed after the line "Enter this URL instead to access the notebooks:". You will need to authenticate. Find the last URL (the one with 127.0.0.1:8888 at the beginning), copy the token that appears there (it follows token= in the URL), paste it into the password prompt of the Jupyter notebook, and log in.
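If copying the token by hand is error-prone, the token= step in the note above can be sketched with Python's standard library. The URL below is a made-up placeholder, not real output from ./start_docker.sh, and extract_token is a hypothetical helper name rather than part of the pipeline.

```python
from urllib.parse import parse_qs, urlparse


def extract_token(jupyter_url):
    """Return the value of the `token` query parameter, or None if absent."""
    query = parse_qs(urlparse(jupyter_url).query)
    values = query.get("token")
    return values[0] if values else None


# Hypothetical URL for illustration; use the one printed in your terminal.
example_url = "http://127.0.0.1:8888/?token=abc123def456"
print(extract_token(example_url))
```

The returned string is what you would paste into the Jupyter password prompt.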
You can shut down the notebooks and close Docker by entering control-c in the terminal window.
Remember to duplicate and rename notebooks. If you do not change the names of the notebooks within the templates folder, they will be overwritten when you update the repo.
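Duplicating a notebook can be done from the Jupyter file browser, but a small script works too. The following is a minimal sketch: duplicate_notebook is a hypothetical helper, and the file names in the usage note are placeholders, since the actual notebook names depend on the contents of your templates folder.

```python
import shutil
from pathlib import Path


def duplicate_notebook(template_path, new_name):
    """Copy a template notebook to a new file name in the same folder.

    Refuses to overwrite an existing file so earlier work is not lost.
    """
    template_path = Path(template_path)
    target = template_path.with_name(new_name)
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(template_path, target)
    return target
```

For example, duplicate_notebook("templates/some_template.ipynb", "my_analysis.ipynb") would create templates/my_analysis.ipynb and leave the template untouched.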
If you are using Windows, please consult our Windows guide for additional instructions.
Although we recommend using Docker, you can also run this pipeline outside of Docker by installing ark with pip:
pip install ark-analysis==0.6.4
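After installing, you may want to confirm that the environment actually holds the pinned release. This sketch uses only the standard library's importlib.metadata; the helper names are hypothetical and not part of ark.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pinned(package="ark-analysis", pinned="0.6.4"):
    """True only if the package is installed at exactly the pinned version."""
    return installed_version(package) == pinned


print(installed_version("ark-analysis"))
```

If this prints None, the package is not installed in the interpreter you are running.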
If you would like to test out the pipeline, we have incorporated an example MIBI-TOF dataset within the notebooks. The dataset contains 11 FOVs with 22 channels (CD3, CD4, CD8, CD14, CD20, CD31, CD45, CD68, CD163, CK17, Collagen1, ECAD, Fibronectin, GLUT1, H3K9ac, H3K27me3, HLADR, IDO, Ki67, PD1, SMA, Vim), and intermediate data necessary for each notebook in the pipeline.
The dataset is split into several smaller components, and each Jupyter notebook uses a combination of those components. We store the dataset on Hugging Face and use its APIs to create these configurations. You can view the dataset's repository as well.
Image Data: This compartment stores the tiff files for each channel, for every FOV.
image_data/
├── fov0/
│ ├── CD3.tiff
│ ├── ...
│ └── Vim.tiff
├── fov1/
│ ├── CD3.tiff
│ ├── ...
│ └── Vim.tiff
├── .../
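To check that a local copy follows this layout, a short script can report which channel files are missing from each FOV folder. This is a sketch under the assumption that every FOV folder holds one .tiff per channel, named exactly as in the channel list above; missing_channels is a hypothetical helper.

```python
from pathlib import Path

# The 22 channels listed in the dataset description.
CHANNELS = [
    "CD3", "CD4", "CD8", "CD14", "CD20", "CD31", "CD45", "CD68",
    "CD163", "CK17", "Collagen1", "ECAD", "Fibronectin", "GLUT1",
    "H3K9ac", "H3K27me3", "HLADR", "IDO", "Ki67", "PD1", "SMA", "Vim",
]


def missing_channels(image_dir, channels=CHANNELS):
    """Map each FOV folder name to the list of channel tiffs it lacks."""
    report = {}
    fov_dirs = sorted(p for p in Path(image_dir).iterdir() if p.is_dir())
    for fov_dir in fov_dirs:
        present = {f.stem for f in fov_dir.glob("*.tiff")}
        missing = [c for c in channels if c not in present]
        if missing:
            report[fov_dir.name] = missing
    return report
```

An empty dictionary means every FOV folder contains all expected channels.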
Cell Table: This compartment stores example cell tables.
segmentation/cell_table/
├── cell_table_arcsinh_transformed.csv
├── cell_table_size_normalized.csv
└── cell_table_size_normalized_cell_labels.csv
Deepcell Output: This compartment stores example segmentation images after running segmentation using Deepcell.
segmentation/deepcell_output/
├── fov0_whole_cell.tiff
├── fov0_nuclear.tiff
├── ...
├── fov10_whole_cell.tiff
└── fov10_nuclear.tiff
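The naming pattern above (one whole_cell and one nuclear file per FOV) lends itself to a quick consistency check. The sketch below groups files by FOV using only the file names shown in the tree; both function names are hypothetical.

```python
import re
from pathlib import Path

# Matches names such as fov0_whole_cell.tiff or fov10_nuclear.tiff.
_PATTERN = re.compile(r"^(?P<fov>.+)_(?P<kind>whole_cell|nuclear)\.tiff$")


def group_segmentation_files(deepcell_dir):
    """Group segmentation files by FOV name, keyed by mask kind."""
    groups = {}
    for path in Path(deepcell_dir).iterdir():
        match = _PATTERN.match(path.name)
        if match:
            groups.setdefault(match["fov"], {})[match["kind"]] = path
    return groups


def incomplete_fovs(groups):
    """Return FOV names that lack either the whole_cell or nuclear mask."""
    expected = {"whole_cell", "nuclear"}
    return sorted(fov for fov, kinds in groups.items() if set(kinds) != expected)
```

Calling incomplete_fovs(group_segmentation_files("segmentation/deepcell_output")) returns an empty list when every FOV has both masks.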
Example Pixel Output: This compartment stores feather files, csvs, and pixel masks generated by pixel clustering.
segmentation/example_pixel_output_dir/
├── cell_clustering_params.json
├── channel_norm.feather
├── channel_norm_post_rowsum.feather
├── pixel_thresh.feather
├── pixel_channel_avg_meta_cluster.csv
├── pixel_channel_avg_som_cluster.csv
├── pixel_masks/
│ ├── fov0_pixel_mask.tiff
│ └── fov1_pixel_mask.tiff
├── pixel_mat_data/
│ ├── fov0.feather
│ ├── ...
│ └── fov10.feather
├── pixel_mat_subset/
│ ├── fov0.feather
│ ├── ...
│ └── fov10.feather
├── pixel_meta_cluster_mapping.csv
└── pixel_som_weights.feather
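In the tree above, pixel_mat_data and pixel_mat_subset each hold one feather file per FOV. Assuming the two folders are meant to cover the same FOVs (an assumption drawn from the listing, not a documented guarantee), the following sketch reports any mismatch by comparing file names only. The helper names are hypothetical.

```python
from pathlib import Path


def fov_names(folder):
    """Return the set of FOV names that have a .feather file in a folder."""
    return {p.stem for p in Path(folder).glob("*.feather")}


def compare_fov_sets(pixel_output_dir,
                     data_dir="pixel_mat_data",
                     subset_dir="pixel_mat_subset"):
    """List FOVs present in one folder but absent from the other."""
    root = Path(pixel_output_dir)
    data = fov_names(root / data_dir)
    subset = fov_names(root / subset_dir)
    return {
        "only_in_data": sorted(data - subset),
        "only_in_subset": sorted(subset - data),
    }
```

Two empty lists indicate that both folders cover the same FOVs.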
Example Cell Output: This compartment stores feather files, csvs, and cell masks generated by cell clustering.
segmentation/example_cell_output_dir/
├── cell_masks/
│ ├── fov0_cell_mask.tiff
│ └── fov1_cell_mask.tiff
├── cell_meta_cluster_channel_avg.csv
├── cell_meta_cluster_count_avg.csv
├── cell_meta_cluster_mapping.csv
├── cell_som_cluster_channel_avg.csv
├── cell_som_cluster_count_avg.csv
├── cell_som_weights.feather
├── cluster_counts.feather
├── cluster_counts_size_norm.feather
└── weighted_cell_channel.csv
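A simple existence check against the listing above can confirm that a download or a cell clustering run produced everything expected. This sketch takes the expected names directly from the tree; missing_outputs is a hypothetical helper and checks only that each entry exists, not its contents.

```python
from pathlib import Path

# Top-level entries of the example cell output directory, as listed above.
EXPECTED_CELL_OUTPUT = [
    "cell_masks",
    "cell_meta_cluster_channel_avg.csv",
    "cell_meta_cluster_count_avg.csv",
    "cell_meta_cluster_mapping.csv",
    "cell_som_cluster_channel_avg.csv",
    "cell_som_cluster_count_avg.csv",
    "cell_som_weights.feather",
    "cluster_counts.feather",
    "cluster_counts_size_norm.feather",
    "weighted_cell_channel.csv",
]


def missing_outputs(output_dir, expected=EXPECTED_CELL_OUTPUT):
    """Return the expected entries that do not exist under output_dir."""
    root = Path(output_dir)
    return [name for name in expected if not (root / name).exists()]
```

The same function can be reused for other output folders by passing a different expected list.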
If you have a general question or are having trouble with part of the repo, please see https://github.com/angelolab/ark-analysis. You can refer to our FAQ or head to the discussions tab to get help. If you've found a bug with the codebase, first make sure there's not already an open issue, and if not, you can then open an issue describing the bug.