
# Dir-GNN: Graph Neural Networks for Directed Graphs


Dir-GNN is a machine learning model that enables learning on directed graphs. This repository contains the official implementation of the paper "Edge Directionality Improves Learning on Heterophilic Graphs", where we introduce Dir-GNN and show that leveraging edge directionality leads to improved learning on heterophilic graphs.

## Overview

Graph Neural Networks (GNNs) have become the de facto standard tool for modeling relational data. However, while many real-world graphs are directed, the majority of today's GNN models discard this information altogether by simply making the graph undirected. The reasons for this are historical: 1) many early variants of spectral GNNs explicitly required undirected graphs, and 2) the first benchmarks, which used homophilic graphs, did not find significant gains from using direction.

In our paper, we show that in heterophilic settings, treating the graph as directed increases the effective homophily of the graph, suggesting a potential gain from the correct use of directionality information. To this end, we introduce Directed Graph Neural Network (Dir-GNN), a novel general framework for deep learning on directed graphs. Dir-GNN can be used to extend any Message Passing Neural Network (MPNN) to account for edge directionality information by performing separate aggregations of the incoming and outgoing edges.
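As an illustration, here is a minimal, hypothetical sketch of this idea built on PyTorch Geometric's `GCNConv`. It is not the repository's `DirGCN` implementation (names like `DirConvSketch` are ours); it only shows the two separate directional aggregations blended by a coefficient `alpha`:

```python
import torch
from torch_geometric.nn import GCNConv


class DirConvSketch(torch.nn.Module):
    """Illustrative Dir-GNN-style layer: one aggregation over incoming
    edges and one over outgoing edges, blended by alpha. A simplified
    sketch, not the repository's DirGCN."""

    def __init__(self, in_dim: int, out_dim: int, alpha: float = 0.5):
        super().__init__()
        self.conv_in = GCNConv(in_dim, out_dim)   # aggregates from in-neighbours
        self.conv_out = GCNConv(in_dim, out_dim)  # aggregates from out-neighbours
        self.alpha = alpha

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # edge_index stores directed edges as (source, target) pairs;
        # flipping it swaps the roles of in- and out-neighbours.
        out_in = self.conv_in(x, edge_index)
        out_out = self.conv_out(x, edge_index.flip(0))
        return self.alpha * out_in + (1 - self.alpha) * out_out
```

In this sketch, `alpha=1` keeps only one directional aggregation and `alpha=0` only the other, while `alpha=0.5` weights them equally, mirroring the ablation commands further below.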

We prove that Dir-GNN matches the expressivity of the Directed Weisfeiler-Lehman test, exceeding that of conventional MPNNs. In extensive experiments, we validate that while our framework leaves performance unchanged on homophilic datasets, it leads to large gains over base models such as GCN, GAT and GraphSage on heterophilic benchmarks, outperforming much more complex methods and achieving new state-of-the-art results.

## Getting Started

To get up and running with the project, you first need to set up your environment and install the necessary dependencies. This guide walks you through the process step by step.

### Setting Up the Environment

The project is designed to run on Python 3.10. We recommend using Conda to set up the environment as follows:

```bash
conda create -n directed_gnn python=3.10
conda activate directed_gnn
```

### Installing Dependencies

Once the environment is activated, install the required packages:

```bash
conda install pytorch==2.0.1 pytorch-cuda=11.7 -c pytorch -c nvidia
conda install pyg pytorch-sparse -c pyg
pip install ogb==1.3.6
pip install pytorch_lightning==2.0.2
pip install gdown==4.7.1
```
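A quick way to sanity-check the installation (the expected version numbers reflect the pins above):

```python
import torch
import torch_geometric

print("PyTorch:", torch.__version__)                 # expected: 2.0.1
print("CUDA available:", torch.cuda.is_available())  # True on a GPU machine
print("PyG:", torch_geometric.__version__)
```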

Please ensure that the version of `pytorch-cuda` matches your CUDA version. If your system does not have a GPU, use the following command to install PyTorch instead:

```bash
conda install pytorch==2.0.1 -c pytorch
```

For M1/M2/M3 Mac users, `pyg` (PyTorch Geometric) needs to be installed from source. Detailed instructions for this process can be found in the PyTorch Geometric documentation.

## Code Structure

- `run.py`: Script used to run the models.
- `model.py`: Contains the definitions of the directed convolutions (DirGCN, DirSage and DirGAT) explored in the paper.
- `homophily.py`: Houses functions for computing the weighted node homophily and the weighted compatibility matrix, as defined in our paper (see the sketch after this list).
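For intuition, here is a minimal sketch of a directed node-homophily score. It is our simplified, unweighted illustration of the basic quantity, not the weighted implementation in `homophily.py`:

```python
import torch


def node_homophily(edge_index: torch.Tensor, y: torch.Tensor) -> float:
    """Fraction of each node's in-edges whose endpoints share a label,
    averaged over nodes with at least one in-edge. Illustrative only."""
    src, dst = edge_index
    same = (y[src] == y[dst]).float()
    num_nodes = y.size(0)
    # Per-node counts of label-agreeing in-edges and of all in-edges.
    agree = torch.zeros(num_nodes).scatter_add_(0, dst, same)
    deg = torch.zeros(num_nodes).scatter_add_(0, dst, torch.ones_like(same))
    mask = deg > 0
    return (agree[mask] / deg[mask]).mean().item()
```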

## Running Experiments

This section provides instructions for reproducing the experiments presented in the paper. Note that some results may not be reproduced exactly, since some of the operations used are intrinsically non-deterministic on the GPU, as explained in the PyTorch notes on reproducibility. However, you should obtain results very close to those in the paper.
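Run-to-run variance can be reduced (though not eliminated, since some CUDA scatter/gather kernels stay non-deterministic) by seeding all random number generators. The snippet below is a generic PyTorch recipe, not necessarily what `src/run.py` does:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Seed Python, NumPy and PyTorch RNGs. Some CUDA ops
    (e.g. scatter-based aggregations) remain non-deterministic."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```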

### Dir-GNN Experiments

To reproduce the best Dir-GNN results on heterophilic datasets (Table 3 in our paper), use the following command:

```bash
python -m src.run --dataset chameleon --use_best_hyperparams --num_runs 10
```

The `--dataset` parameter specifies the dataset to be used; replace `chameleon` with the name of the dataset you want to run on.

### Ablation Study on Using Directionality

Tables 2 and 4 contain the results of an ablation study in which we compare common undirected GNN models (GCN, GraphSage, GAT) with their Dir-GNN counterparts.

To reproduce the results for the undirected models, run the following command:

```bash
python -m src.run --dataset chameleon --conv_type gcn --num_runs 10 --patience 200 --normalize --undirected
```

Here, `--conv_type` specifies the convolution type and can be set to `gcn`, `sage` or `gat`.

To instead reproduce the Dir-GNN results for different values of $\alpha$, use the following commands:

```bash
python -m src.run --dataset chameleon --conv_type dir-gcn --num_runs 10 --patience 200 --normalize --alpha 1
python -m src.run --dataset chameleon --conv_type dir-gcn --num_runs 10 --patience 200 --normalize --alpha 0
python -m src.run --dataset chameleon --conv_type dir-gcn --num_runs 10 --patience 200 --normalize --alpha 0.5
```

In this case, `--conv_type` can be set to `dir-gcn`, `dir-sage` or `dir-gat`.

### Synthetic Experiments

You can run the synthetic experiment (Figure 2b) using the following command:

```bash
./run_synthetic_experiment sage
```

You can also replace `sage` with `gcn` or `gat` to specify the base model used in the synthetic experiment and obtain the results in Figure 8.

### Dataset Fix

For the Citeseer-Full and Cora-ML datasets, PyG loads them as undirected by default. To use these datasets in their directed form, a slight modification to your local PyG installation is required: comment out the line `edge_index = to_undirected(edge_index, num_nodes=x.size(0))` in the file located at:

```
/miniconda3/envs/your_env/lib/python3.10/site-packages/torch_geometric/io/npz.py
```
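After the edit, the relevant spot in `npz.py` should look roughly like this (our abbreviation of the surrounding code):

```python
# torch_geometric/io/npz.py (inside the .npz parsing routine)
# ... build x, y and edge_index from the npz arrays ...

# Commented out so that the graph keeps its edge directions:
# edge_index = to_undirected(edge_index, num_nodes=x.size(0))
```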

## Command Line Arguments

The following command line arguments can be used with the code:

### Dataset Arguments

| Argument | Type | Default Value | Description |
| --- | --- | --- | --- |
| `--dataset` | str | `"chameleon"` | Name of the dataset |
| `--dataset_directory` | str | `"dataset"` | Directory to save datasets |
| `--checkpoint_directory` | str | `"checkpoint"` | Directory to save checkpoints |

### Preprocessing Arguments

| Argument | Action | Description |
| --- | --- | --- |
| `--undirected` | `store_true` | Use the undirected version of the graph |
| `--self_loops` | `store_true` | Add self-loops to the graph |
| `--transpose` | `store_true` | Use the transpose of the graph |

### Model Arguments

| Argument | Type | Default Value | Description |
| --- | --- | --- | --- |
| `--model` | str | `"gnn"` | Model type |
| `--hidden_dim` | int | `64` | Hidden dimension of the model |
| `--num_layers` | int | `3` | Number of GNN layers |
| `--dropout` | float | `0.0` | Feature dropout |
| `--alpha` | float | `0.5` | Convex combination parameter for the two directional aggregations |
| `--learn_alpha` | flag | - | If set, learn `alpha` during training |
| `--conv_type` | str | `"dir-gcn"` | Dir-GNN convolution type |
| `--normalize` | flag | - | If set, normalize |
| `--jk` | str | `"max"` | Jumping-knowledge mode: `"max"`, `"cat"` or `None` |

### Training Arguments

| Argument | Type | Default Value | Description |
| --- | --- | --- | --- |
| `--lr` | float | `0.001` | Learning rate |
| `--weight_decay` | float | `0.0` | Weight decay |
| `--num_epochs` | int | `10000` | Max number of epochs |
| `--patience` | int | `10` | Patience for early stopping |
| `--num_runs` | int | `1` | Max number of runs |

### System Arguments

| Argument | Type | Default Value | Description |
| --- | --- | --- | --- |
| `--use_best_hyperparams` | flag | - | If specified, use the best hyperparameters |
| `--gpu_idx` | int | `0` | Index of the GPU to run the program on |
| `--num_workers` | int | `0` | Number of workers for the dataloader |
| `--log` | str | `"INFO"` | Log level: `"DEBUG"`, `"INFO"` or `"WARNING"` |
| `--profiler` | flag | - | If specified, enable the profiler |

## Citation

```bibtex
@misc{dirgnn_rossi_2023,
    title={Edge Directionality Improves Learning on Heterophilic Graphs},
    author={Emanuele Rossi and Bertrand Charpentier and Francesco Di Giovanni and Fabrizio Frasca and Stephan Günnemann and Michael Bronstein},
    publisher={arXiv},
    year={2023}
}
```

## Contact

If you have any questions, issues or feedback, feel free to reach out to Emanuele Rossi at [email protected] or Bertrand Charpentier at [email protected].