Synmatch finds a direct matching of cells between different measurements by exploiting the neighborhood structure within each modality. It takes as input two matrices of single-cell profiles that measure different cellular properties, such as gene expression and chromatin accessibility, and outputs a matching of the cells across the two datasets.
The key idea behind Synmatch is that the same cell, when measured in two different modalities, is likely to have similar sets of neighboring cells in the two spaces. We use this intuition to formulate the matching problem as a supermodular optimization over the neighborhood structure of the two modalities, and we solve the problem using a fast greedy heuristic. Note that the two modalities need not share any features; Synmatch operates in an entirely unsupervised manner.
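To make the neighborhood intuition concrete, here is a minimal sketch that scores a candidate matching by how well it preserves k-nearest-neighbor sets across the two spaces. This is not Synmatch's actual objective or solver; the function names (`knn_sets`, `neighborhood_agreement`), the choice of Euclidean distance, and the value of `k` are all assumptions made for illustration, and the data is synthetic.

```python
import numpy as np

def knn_sets(X, k):
    """For each row (cell) of X, return the set of indices of its k nearest rows."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of cells.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a cell is not counted as its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]
    return [set(map(int, row)) for row in idx]

def neighborhood_agreement(X, Y, matching, k=5):
    """Mean fraction of each cell's neighbors in X that, after being mapped
    through `matching`, are also neighbors of its matched cell in Y.
    matching[i] is the row of Y matched to row i of X (illustrative score only)."""
    nx, ny = knn_sets(X, k), knn_sets(Y, k)
    scores = []
    for i, j in enumerate(matching):
        mapped = {int(matching[n]) for n in nx[i]}
        scores.append(len(mapped & ny[int(j)]) / k)
    return float(np.mean(scores))

# Synthetic example: the same 60 cells observed through two unrelated feature sets.
rng = np.random.default_rng(0)
n_cells = 60
latent = rng.normal(size=(n_cells, 3))
q1, _ = np.linalg.qr(rng.normal(size=(20, 3)))  # 20 features in modality 1
q2, _ = np.linalg.qr(rng.normal(size=(15, 3)))  # 15 features in modality 2
X = latent @ q1.T + 0.01 * rng.normal(size=(n_cells, 20))
Y = latent @ q2.T + 0.01 * rng.normal(size=(n_cells, 15))

true_matching = np.arange(n_cells)        # row i of X is row i of Y
shuffled = rng.permutation(n_cells)       # an arbitrary wrong matching

true_score = neighborhood_agreement(X, Y, true_matching)
shuffled_score = neighborhood_agreement(X, Y, shuffled)
print(true_score, shuffled_score)
```

In this toy setup the correct matching preserves far more neighbors than a shuffled one, even though the two matrices share no columns. A score of this kind only evaluates a given matching; searching for a good matching is the optimization problem described above.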
You can find more about the theory behind Synmatch by reading our paper. If you use Synmatch in an academic setting, please cite us.
Synmatch is implemented in Python and requires Docker as well as the common numpy, sklearn, and scipy packages. Synmatch relies on Coopraiz, an ultra-fast software package for submodular optimization developed by Jeff Bilmes at smr.ai, which is included as a Docker container. Therefore, please make sure Docker is installed and running before you run Synmatch.
Simply provide the two different cell measurements as .npy matrices where each row corresponds to a cell and each column to a feature:
%> python bin/synmatch.py data/example_RNA.npy data/example_ATAC.npy outputfile.txt
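If your data is not yet in .npy form, the following sketch shows one way to write two cells-by-features matrices with numpy. The file names, matrix sizes, random placeholder values, and float dtype are assumptions for illustration only; whether Synmatch expects a particular dtype or an equal number of cells in both files is not stated here, so check the example data shipped in `data/` before relying on this.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: rows are cells, columns are features.
# Replace these with your own measurements.
rna = rng.poisson(2.0, size=(100, 50)).astype(np.float64)    # e.g. gene expression
atac = rng.binomial(1, 0.1, size=(100, 80)).astype(np.float64)  # e.g. accessibility

# Hypothetical output location and file names.
out_dir = tempfile.mkdtemp()
rna_path = os.path.join(out_dir, "my_RNA.npy")
atac_path = os.path.join(out_dir, "my_ATAC.npy")

np.save(rna_path, rna)
np.save(atac_path, atac)

# Sanity check: reload and confirm the cells-by-features layout survived.
rna_loaded = np.load(rna_path)
atac_loaded = np.load(atac_path)
print(rna_loaded.shape, atac_loaded.shape)
```

The two saved paths can then be passed to `bin/synmatch.py` in place of the example files in the command above.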
If you have questions, feel free to email Borislav Hristov: borislav at uw.edu