This repository contains the TensorFlow implementation of our paper "graph2vec: Learning distributed representations of graphs". The paper can be found at:
This code was developed in Python 2.7 and has been run and tested on Ubuntu 16.04. It uses the following Python packages:
- tensorflow (version == 1.4.0)
- networkx (version <= 2.0)
- scikit-learn (+scipy, +numpy)
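The version constraints above can be checked before running anything. The following is a minimal sketch, not part of the repository: the helper names (`parse_version`, `satisfies`, `check_environment`) are hypothetical, and it assumes each package exposes a `__version__` attribute.

```python
import re

# Constraints copied from the package list above; the checker is illustrative.
REQUIREMENTS = {
    "tensorflow": ("==", "1.4.0"),
    "networkx": ("<=", "2.0"),
}


def parse_version(text):
    """Turn '1.4.0' into (1, 4, 0); a suffix such as 'rc1' is ignored."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, op, required):
    """Return True if the installed version meets an '==' or '<=' constraint."""
    a, b = parse_version(installed), parse_version(required)
    # Pad with zeros so that '2.0' and '2.0.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    if op == "==":
        return a == b
    if op == "<=":
        return a <= b
    raise ValueError("unsupported operator: %s" % op)


def check_environment():
    """Try to import each package and report whether its version fits."""
    report = {}
    for name, (op, required) in REQUIREMENTS.items():
        try:
            module = __import__(name)
        except ImportError:
            report[name] = "not installed"
            continue
        installed = getattr(module, "__version__", "")
        if satisfies(installed, op, required):
            report[name] = "ok"
        else:
            report[name] = "mismatch (%s)" % installed
    return report
```

Calling `check_environment()` returns a dictionary with one status string per package.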
1. git clone the repository (command: git clone )
2. untar the data.tar.gz tarball
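If a shell `tar` command is not at hand, the tarball can also be unpacked from Python. This is a minimal sketch using only the standard library; the function name `extract_tarball` is hypothetical and the contents of the archive are not assumed.

```python
import os
import tarfile


def extract_tarball(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and return its top-level names."""
    with tarfile.open(archive_path, "r:gz") as archive:
        dest_root = os.path.realpath(dest_dir)
        members = archive.getmembers()
        # Refuse members that would be written outside dest_dir.
        for member in members:
            target = os.path.realpath(os.path.join(dest_dir, member.name))
            if target != dest_root and not target.startswith(dest_root + os.sep):
                raise ValueError("unsafe path in archive: %s" % member.name)
        archive.extractall(dest_dir)
        return sorted({m.name.split("/")[0] for m in members})
```

For example, `extract_tarball("data.tar.gz")` unpacks the archive into the current directory.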
The procedure for obtaining rooted graph vectors using graph2vec and performing graph classification is as follows:
1. move to the folder "src" (command: cd src). Also make sure that the datasets from the KDD 2015 paper (Deep Graph Kernels) are available in '../data/kdd_datasets/dir_graphs/'
2. run --corpus <dataset of graph files> --class_labels_file_name <file containing class labels of graphs to be used for graph classification> file to:
*Generate the Weisfeiler-Lehman kernel's rooted subgraphs from all the graphs
*Train the skipgram model to learn graph embeddings. The embeddings are written to the ../embeddings/ folder
*Perform graph classification using the graph embeddings generated in the above step
3. example:
*python --corpus ../data/kdd_datasets/mutag --class_labels_file_name ../data/kdd_datasets/mutag.Labels
*python --corpus ../data/kdd_datasets/proteins --class_labels_file_name ../data/kdd_datasets/proteins.Labels --batch_size 16 --embedding_size 128 --num_negsample 5
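The first step of the procedure, generating Weisfeiler-Lehman rooted subgraphs, can be illustrated with a small sketch. This is not the repository's code: the function names are hypothetical, the graph is a plain adjacency dictionary rather than a networkx object, and labels are kept as explicit strings instead of being compressed. It only shows the relabelling idea: at each iteration a node's label is combined with the sorted labels of its neighbours, and the labels from all iterations form the "document" describing the graph.

```python
def wl_rooted_subgraphs(adjacency, labels, height):
    """Return {node: [label at depth 0, depth 1, ..., depth height]}.

    adjacency: dict mapping each node to an iterable of its neighbours.
    labels:    dict mapping each node to its initial label (a string).
    """
    current = dict(labels)
    history = {node: [current[node]] for node in adjacency}
    for _ in range(height):
        updated = {}
        for node, neighbours in adjacency.items():
            # New label = own label plus the sorted multiset of the
            # neighbours' labels from the previous iteration.
            neighbour_labels = sorted(current[n] for n in neighbours)
            updated[node] = "%s(%s)" % (current[node], ",".join(neighbour_labels))
        current = updated
        for node in adjacency:
            history[node].append(current[node])
    return history


def graph_document(adjacency, labels, height):
    """Flatten all rooted-subgraph labels of one graph into a sorted list."""
    history = wl_rooted_subgraphs(adjacency, labels, height)
    return sorted(label for per_node in history.values() for label in per_node)


# A three-node path C - O - C, relabelled once (height 1).
path = {0: [1], 1: [0, 2], 2: [1]}
atoms = {0: "C", 1: "O", 2: "C"}
print(graph_document(path, atoms, 1))
```

In a pipeline like the one described above, each such list would play the role of a document whose "words" are rooted subgraphs, which is what the skipgram model is then trained on.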
optional arguments:
-h, --help show this help message and exit
-c CORPUS, --corpus CORPUS
Path to directory containing graph files to be used
for graph classification or clustering
--class_labels_file_name CLASS_LABELS_FILE_NAME
File name containing the name of the sample and the
class labels
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Path to directory for storing output embeddings
-b BATCH_SIZE, --batch_size BATCH_SIZE
Number of samples per training batch
-e EPOCHS, --epochs EPOCHS
Number of iterations over the whole dataset of graphs
--embedding_size EMBEDDING_SIZE
Intended graph embedding size to be learnt
-neg NUM_NEGSAMPLE, --num_negsample NUM_NEGSAMPLE
Number of negative samples to be used for training
-lr LEARNING_RATE, --learning_rate LEARNING_RATE
Learning rate to optimize the loss function
--wlk_h WLK_H Height of WL kernel (i.e., degree of rooted subgraph
features to be considered for representation learning)
-lf LABEL_FILED_NAME, --label_filed_name LABEL_FILED_NAME
Label field to be used for coloring nodes in graphs
using WL kernel
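For reference, an argument parser accepting the options listed above, together with the two long options used in the example commands, could look like the following. This is an illustrative reconstruction, not the repository's parser: defaults are deliberately omitted because they are not documented here, the numeric types are assumptions inferred from the descriptions, and `--class_labels_file_name` and `--embedding_size` are given in long form only.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Sketch of the graph2vec command-line options")
    parser.add_argument("-c", "--corpus",
                        help="Path to directory containing graph files")
    # Long form only: no short flag is assumed for this option.
    parser.add_argument("--class_labels_file_name",
                        help="File with sample names and class labels")
    parser.add_argument("-o", "--output_dir",
                        help="Path to directory for storing output embeddings")
    # The int/float types below are assumptions, not taken from the source.
    parser.add_argument("-b", "--batch_size", type=int,
                        help="Number of samples per training batch")
    parser.add_argument("-e", "--epochs", type=int,
                        help="Number of passes over the dataset of graphs")
    parser.add_argument("--embedding_size", type=int,
                        help="Intended graph embedding size")
    parser.add_argument("-neg", "--num_negsample", type=int,
                        help="Number of negative samples used for training")
    parser.add_argument("-lr", "--learning_rate", type=float,
                        help="Learning rate to optimize the loss function")
    parser.add_argument("--wlk_h", type=int,
                        help="Height of the WL kernel")
    parser.add_argument("-lf", "--label_filed_name",
                        help="Label field used for colouring nodes")
    return parser


# Parse the options of the second example command from an explicit list.
example = build_parser().parse_args([
    "--corpus", "../data/kdd_datasets/proteins",
    "--class_labels_file_name", "../data/kdd_datasets/proteins.Labels",
    "--batch_size", "16", "--embedding_size", "128", "--num_negsample", "5",
])
print(example.batch_size, example.embedding_size, example.num_negsample)
```

Options that are not passed come back as `None` in this sketch, since no defaults are set.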
In case of queries, please email: [email protected] OR [email protected]
Please consider citing the following paper when you use this code.
title={graph2vec: Learning distributed representations of graphs},
author={Narayanan, Annamalai and Chandramohan, Mahinthan and Venkatesan, Rajasekar and Chen, Lihui and Liu, Yang}