Phosphosite motif explorer -- graph network abstraction through embeddings
Pomegranate interactively explores phosphosite motifs on protein structures through a web application.
Clustering of protein graphs is also possible through a command line interface.
Alternative databases are also supported.
Support for local PDB files is in progress.
Currently using Saccharomyces cerevisiae and Homo sapiens.
Other bulk-download options can be found here.
TODO: upload to PyPI
pip install graphein
pip install dash
# DSSP
conda install -c salilab dssp
# or
sudo apt-get install dssp
$ pomegranate load --help
Usage: pomegranate load [OPTIONS] PHOSPHOSITE STRUCTURES GRAPHS
Options:
-v, --verbose Show extensive program output.
-d, --debug Show extensive program output for debugging.
-q, --quiet Suppress program output.
-n, --dry-run, --dryrun Print out what the program would do without
loading the graphs.
-u, --unique Only construct graphs for unique motifs.
Duplicate entries (i.e. with different
kinases) are ignored.
--download / -S, --skip-download
Skip downloading protein structures into the
supplied directory. If the required
structure does not exist, graph construction
for that accession ID will be skipped.
-o, --graph-format, --output-format, --format [NetworkX|StellarGraph|nx|sg]
Save graphs as NetworkX or StellarGraph
instances with feature preprocessing.
[default: NetworkX]
-N, --num-psites INTEGER Only consider the first N motifs in a
dataset. Graph construction will continue
until N graphs are made, or the end of the
dataset is reached.
-r, --radius FLOAT The threshold radius of the motif [default:
10.0]
--rsa, --rsa-threshold FLOAT The RSA threshold of the motif [default:
0.0]
--node-features, --nf <node_features>
Which node features to include in the
constructed graphs. [default:
m1,m2,m3,m4,m5,m6,m7pdist,coords,bfac,rsa]
--edge-features, --ef TEXT
-c, --config TEXT Path to config.yml file used to specify how
to construct the graphs. [default:
config.yml]
--help Show this message and exit.
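
The --radius and --rsa-threshold options above suggest that a motif is the set of residues close to the phosphosite and sufficiently solvent-exposed. The following is a minimal sketch of that idea, not Pomegranate's actual implementation. The Residue record, the select_motif helper, the use of Euclidean distance between single residue coordinates, and the inclusive comparisons are all assumptions made for illustration.

```python
import math
from typing import List, NamedTuple, Tuple


class Residue(NamedTuple):
    """Hypothetical residue record; not part of Pomegranate's API."""
    name: str
    coords: Tuple[float, float, float]  # assumed: one 3D position per residue
    rsa: float                          # relative solvent accessibility


def select_motif(residues: List[Residue], site: Residue,
                 radius: float = 10.0, rsa_threshold: float = 0.0) -> List[Residue]:
    """Keep residues within `radius` of `site` with RSA >= `rsa_threshold`.

    The defaults mirror the CLI defaults shown above; whether the real tool
    treats the bounds as inclusive is an assumption.
    """
    return [
        r for r in residues
        if math.dist(r.coords, site.coords) <= radius and r.rsa >= rsa_threshold
    ]


# Made-up example data, for illustration only.
residues = [
    Residue("SER10", (0.0, 0.0, 0.0), 0.40),   # the phosphosite itself
    Residue("LYS11", (3.8, 0.0, 0.0), 0.55),   # close and exposed
    Residue("LEU42", (6.0, 6.0, 0.0), 0.02),   # close but buried
    Residue("ASP90", (20.0, 0.0, 0.0), 0.70),  # exposed but too far away
]
site = residues[0]

motif = select_motif(residues, site, radius=10.0, rsa_threshold=0.1)
print([r.name for r in motif])  # ['SER10', 'LYS11']
```

With the default threshold of 0.0 the buried residue LEU42 would also be kept, so raising the RSA threshold restricts the motif to surface residues.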
Usage: cluster_graphs.py [OPTIONS] GRAPHS SAVEPATH
Options:
--train-method [graph|node] Method to use for training the graph neural
network. [default: graph]
-e, --epochs INTEGER Number of epochs to train for. [default: 50]
-b, --batch-size INTEGER Batch size to be used by data generator.
[default: 16]
-N, --num-graphs INTEGER
-v, --verbose
--write-csv, --csv
--help Show this message and exit.
TODO: have multiple models to use for clustering with different training methods associated with each.
E.g.
- GCN trained on graph-graph distance;
- node classification layer (semi-supervised) etc.
Usage: visualise_embeddings.py [OPTIONS] EMBEDDINGS OUTDIR
Options:
-N, --num-plots INTEGER How many plots to output. [default: 1]
--dim-method, --method [UMAP|tSNE|PCA]
Method to use for dimensionality reduction.
[default: tSNE]
-l, --labels / --no-labels Show labels on plot.
-i, --interactive Open interactive plot viewer.
-v, --verbose
--help Show this message and exit.
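
Of the three --dim-method choices, PCA is the simplest to illustrate. The sketch below finds the first principal component of a small set of embedding vectors by power iteration on the covariance matrix, using only the standard library. It is an illustration of the technique, not the code in visualise_embeddings.py. The function name first_principal_component and the example data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def first_principal_component(
    rows: Sequence[Sequence[float]], iterations: int = 100
) -> Tuple[List[float], List[float]]:
    """Return (unit direction, per-row scores) for the first principal component."""
    n, d = len(rows), len(rows[0])
    # Centre each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: repeatedly apply the covariance matrix and renormalise.
    # Note: this fails if the start vector is orthogonal to the component.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance in the data
            break
        v = [x / norm for x in w]
    # Project the centred rows onto the component to get 1D coordinates.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores


# Made-up 2D "embeddings" lying exactly on the line y = 2x.
embeddings = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
direction, scores = first_principal_component(embeddings)
print(direction, scores)
```

For plotting, a second component would be needed as well. In practice a library implementation would normally be used rather than hand-written power iteration.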
Example clustering:
Hierarchical, k-means, etc.
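
As a rough illustration of the k-means option mentioned above, the sketch below clusters a handful of made-up 2D embedding points with a plain standard-library implementation of Lloyd's algorithm. The kmeans function and the sample points are hypothetical and are not taken from Pomegranate.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iterations: int = 20,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Cluster `points` into `k` groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = list(rng.sample(list(points), k))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Two well-separated groups of made-up embedding coordinates.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The cluster numbering depends on the random initialisation, so only the grouping of points is meaningful, not the label values themselves.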
From left to right and top to bottom: tab selection, protein search bars, adjacency matrix, radius and RSA sliders, asteroid plot, phosphorylation site selection, matrix order dropdown.
From left to right and top to bottom: tab selection, protein search bars, radius slider, matrix order dropdown, grayscale dropdown, phosphosite selection bar, adjacency matrices.