- GNNs are inherently difficult to train in parallel because of the edge dependencies.
- Alternatives are storing the whole graph in memory (very expensive) or using third-party storage (very slow)
- Instead: generate a K-hop neighborhood, an information-complete subgraph, for each node
- Merge values from in-edge neighbors and propagate them to out-edge neighbors via MapReduce
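  A minimal, single-machine sketch of one merge-and-propagate round in this MapReduce style; the graph layout, the mergeable "value" type, and the function names are illustrative assumptions, not AGL's actual API:

  ```python
  from collections import defaultdict

  def map_phase(edges, values):
      """Map: each directed edge (u, v) emits u's current value keyed by target v."""
      for u, v in edges:
          yield v, values[u]

  def reduce_phase(emitted):
      """Reduce: every node merges (here: sums) the values it received
      from its in-edge neighbors."""
      merged = defaultdict(float)
      for node, value in emitted:
          merged[node] += value
      return dict(merged)

  if __name__ == "__main__":
      edges = [("a", "b"), ("a", "c"), ("b", "c")]   # u -> v
      values = {"a": 1.0, "b": 2.0, "c": 3.0}
      print(reduce_phase(map_phase(edges, values)))  # {'b': 1.0, 'c': 3.0}
  ```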
- 2-layer GNN trained on 2B nodes and 100B edges in 14 hours.
- Heterogeneous graph at Ant Financial, processed via message passing (merging and propagation)
- Computing each node’s K-hop embedding is based on message passing: merging information from in-edge neighbors and propagating the merged information to neighbors along out-edges.
- Each K-hop neighborhood is an information-complete subgraph for its target node
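  A sketch of extracting one node's K-hop (in-edge) neighborhood as a standalone subgraph; a real system would do this for all nodes at once with K rounds of distributed message passing, and all names here are illustrative:

  ```python
  def k_hop_neighborhood(target, in_edges, k):
      """in_edges maps node -> list of in-edge neighbors (u such that u -> node)."""
      nodes = {target}
      frontier = {target}
      edges = []
      for _ in range(k):
          next_frontier = set()
          for v in frontier:
              for u in in_edges.get(v, []):
                  edges.append((u, v))
                  if u not in nodes:
                      nodes.add(u)
                      next_frontier.add(u)
          frontier = next_frontier
      return nodes, edges  # information-complete subgraph for `target`

  if __name__ == "__main__":
      in_edges = {"c": ["a", "b"], "b": ["a"]}
      print(k_hop_neighborhood("c", in_edges, k=2))
  ```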
- GraphFlat is an efficient, distributed generator that uses message passing to produce K-hop neighborhoods.
- These neighborhoods are flattened into protobuf strings.
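  The protobuf schema is not given in these notes, so this sketch stands in `json` for protobuf just to illustrate flattening a K-hop neighborhood into one self-contained record keyed by its target node; field names are assumptions:

  ```python
  import json

  def flatten_neighborhood(target, nodes, edges, features):
      """Serialize one K-hop neighborhood into a single string record."""
      record = {
          "target": target,
          "nodes": sorted(nodes),
          "edges": edges,                    # list of (src, dst) pairs
          "features": {n: features[n] for n in nodes},
      }
      return json.dumps(record)              # in AGL this would be a protobuf string

  if __name__ == "__main__":
      s = flatten_neighborhood("c", {"a", "b", "c"},
                               [("a", "c"), ("b", "c"), ("a", "b")],
                               {"a": [0.1], "b": [0.2], "c": [0.3]})
      print(s)
  ```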
- GraphTrainer leverages techniques such as pipelining, pruning, and edge partitioning to eliminate I/O overhead and optimize floating-point computation
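  A minimal sketch of the pipelining idea only: a background thread keeps reading and parsing flattened neighborhoods while the trainer consumes the previous batch, so I/O overlaps with computation. Pruning and edge partitioning are not shown, and all names are illustrative:

  ```python
  import queue
  import threading

  def prefetcher(read_batch, out_queue, num_batches):
      """Producer: read/parse batches ahead of the training loop."""
      for i in range(num_batches):
          out_queue.put(read_batch(i))
      out_queue.put(None)                    # end-of-stream marker

  def train(read_batch, train_step, num_batches, depth=4):
      batches = queue.Queue(maxsize=depth)   # bounded buffer between I/O and compute
      threading.Thread(target=prefetcher,
                       args=(read_batch, batches, num_batches),
                       daemon=True).start()
      while (batch := batches.get()) is not None:
          train_step(batch)

  if __name__ == "__main__":
      train(read_batch=lambda i: list(range(i, i + 4)),
            train_step=lambda b: print("step on", b),
            num_batches=3)
  ```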
- GraphInfer is a distributed inference module that splits a K-layer GNN model into K slices and applies message passing K times via MapReduce.
- GraphInfer maximally reuses each node’s embeddings, because every intermediate embedding at the k-th layer is propagated to the next round of message passing; this significantly speeds up inference
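  A sketch of slice-by-slice inference under these assumptions: the K-layer model is split into K per-layer functions, and after each layer every node's fresh embedding is propagated once to its out-edge neighbors (one MapReduce-style round per layer), so no intermediate embedding is recomputed. Names are illustrative, not AGL's API:

  ```python
  from collections import defaultdict

  def infer(layers, out_edges, features):
      """layers: list of K functions (self_emb, list_of_in_neighbor_embs) -> emb."""
      emb = dict(features)
      for layer in layers:                   # one round per layer slice
          inbox = defaultdict(list)
          for u, neighbors in out_edges.items():
              for v in neighbors:            # map: send u's embedding along u -> v
                  inbox[v].append(emb[u])
          emb = {n: layer(emb[n], inbox.get(n, [])) for n in emb}  # reduce + update
      return emb

  if __name__ == "__main__":
      mean_layer = lambda h, msgs: (h + sum(msgs)) / (1 + len(msgs))
      out_edges = {"a": ["b", "c"], "b": ["c"], "c": []}
      print(infer([mean_layer, mean_layer], out_edges, {"a": 1.0, "b": 2.0, "c": 3.0}))
  ```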
- We can divide an industrial-scale graph into a massive number of tiny K-hop neighborhoods w.r.t. their target nodes in advance, and load one or a batch of them, rather than the entire graph, into memory during the training phase
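  A sketch of that training-time loading pattern: stream flattened K-hop neighborhoods (one serialized record per target node) from disk and yield small batches, so memory holds only the current batch rather than the whole graph. The file format and field names are illustrative assumptions:

  ```python
  import json

  def neighborhood_batches(path, batch_size):
      """Yield lists of parsed K-hop neighborhood records."""
      batch = []
      with open(path) as f:
          for line in f:                     # one flattened neighborhood per line
              batch.append(json.loads(line))
              if len(batch) == batch_size:
                  yield batch
                  batch = []
      if batch:
          yield batch

  # Usage: each batch is a handful of self-contained subgraphs, so a training
  # step never needs edges outside the batch.
  # for batch in neighborhood_batches("neighborhoods.jsonl", batch_size=32):
  #     train_step(batch)
  ```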