- GNNs are inherently difficult to train in parallel because of the edge dependencies.
- Alternatives are storing the whole graph in memory (very expensive) or using third-party storage (very slow)
- Instead: generate a K-hop neighborhood, an information-complete subgraph, for each node
- Merge values from in-edge neighbors and propagate them to out-edge neighbors via MapReduce
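  A minimal, single-machine sketch of one merge-and-propagate round in this MapReduce style; the graph layout, the mergeable "value" type, and the function names are illustrative assumptions, not AGL's actual API:

  ```python
  from collections import defaultdict

  def map_phase(edges, values):
      """Map: each directed edge (u, v) emits u's current value keyed by target v."""
      for u, v in edges:
          yield v, values[u]

  def reduce_phase(emitted):
      """Reduce: every node merges (here: sums) the values it received
      from its in-edge neighbors."""
      merged = defaultdict(float)
      for node, value in emitted:
          merged[node] += value
      return dict(merged)

  if __name__ == "__main__":
      edges = [("a", "b"), ("a", "c"), ("b", "c")]   # u -> v
      values = {"a": 1.0, "b": 2.0, "c": 3.0}
      print(reduce_phase(map_phase(edges, values)))  # {'b': 1.0, 'c': 3.0}
  ```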
- 2-layer GNN trained on 2B nodes and 100B edges in 14 hours.
- Heterogeneous graph at Ant Financial, processed via message passing (merging and propagation)
- Computing each node’s K-hop embedding is based on message passing: merging information from in-edge neighbors and propagating the merged information to neighbors along out-edges.
- Each K-hop neighborhood is an information-complete subgraph for its target node
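  A sketch of extracting one node's K-hop (in-edge) neighborhood as a standalone subgraph; a real system would do this for all nodes at once with K rounds of distributed message passing, and all names here are illustrative:

  ```python
  def k_hop_neighborhood(target, in_edges, k):
      """in_edges maps node -> list of in-edge neighbors (u such that u -> node)."""
      nodes = {target}
      frontier = {target}
      edges = []
      for _ in range(k):
          next_frontier = set()
          for v in frontier:
              for u in in_edges.get(v, []):
                  edges.append((u, v))
                  if u not in nodes:
                      nodes.add(u)
                      next_frontier.add(u)
          frontier = next_frontier
      return nodes, edges  # information-complete subgraph for `target`

  if __name__ == "__main__":
      in_edges = {"c": ["a", "b"], "b": ["a"]}
      print(k_hop_neighborhood("c", in_edges, k=2))
  ```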
- GraphFlat is an efficient, distributed generator that uses message passing to produce K-hop neighborhoods.
- These neighborhoods are flattened into protobuf strings.
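  The protobuf schema is not given in these notes, so this sketch stands in `json` for protobuf just to illustrate flattening a K-hop neighborhood into one self-contained record keyed by its target node; field names are assumptions:

  ```python
  import json

  def flatten_neighborhood(target, nodes, edges, features):
      """Serialize one K-hop neighborhood into a single string record."""
      record = {
          "target": target,
          "nodes": sorted(nodes),
          "edges": edges,                    # list of (src, dst) pairs
          "features": {n: features[n] for n in nodes},
      }
      return json.dumps(record)              # in AGL this would be a protobuf string

  if __name__ == "__main__":
      s = flatten_neighborhood("c", {"a", "b", "c"},
                               [("a", "c"), ("b", "c"), ("a", "b")],
                               {"a": [0.1], "b": [0.2], "c": [0.3]})
      print(s)
  ```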
- GraphTrainer leverages techniques such as pipelining, pruning, and edge partitioning to eliminate I/O overhead and optimize floating-point computation
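  A minimal sketch of the pipelining idea only: a background thread keeps reading and parsing flattened neighborhoods while the trainer consumes the previous batch, so I/O overlaps with computation. Pruning and edge partitioning are not shown, and all names are illustrative:

  ```python
  import queue
  import threading

  def prefetcher(read_batch, out_queue, num_batches):
      """Producer: read/parse batches ahead of the training loop."""
      for i in range(num_batches):
          out_queue.put(read_batch(i))
      out_queue.put(None)                    # end-of-stream marker

  def train(read_batch, train_step, num_batches, depth=4):
      batches = queue.Queue(maxsize=depth)   # bounded buffer between I/O and compute
      threading.Thread(target=prefetcher,
                       args=(read_batch, batches, num_batches),
                       daemon=True).start()
      while (batch := batches.get()) is not None:
          train_step(batch)

  if __name__ == "__main__":
      train(read_batch=lambda i: list(range(i, i + 4)),
            train_step=lambda b: print("step on", b),
            num_batches=3)
  ```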
- GraphInfer is a distributed inference module that splits a K-layer GNN model into K slices and applies message passing K times via MapReduce.
- GraphInfer maximally reuses each node’s embeddings, because every intermediate embedding at the k-th layer is propagated to the next round of message passing; this significantly speeds up inference
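  A sketch of slice-by-slice inference under these assumptions: the K-layer model is split into K per-layer functions, and after each layer every node's fresh embedding is propagated once to its out-edge neighbors (one MapReduce-style round per layer), so no intermediate embedding is recomputed. Names are illustrative, not AGL's API:

  ```python
  from collections import defaultdict

  def infer(layers, out_edges, features):
      """layers: list of K functions (self_emb, list_of_in_neighbor_embs) -> emb."""
      emb = dict(features)
      for layer in layers:                   # one round per layer slice
          inbox = defaultdict(list)
          for u, neighbors in out_edges.items():
              for v in neighbors:            # map: send u's embedding along u -> v
                  inbox[v].append(emb[u])
          emb = {n: layer(emb[n], inbox.get(n, [])) for n in emb}  # reduce + update
      return emb

  if __name__ == "__main__":
      mean_layer = lambda h, msgs: (h + sum(msgs)) / (1 + len(msgs))
      out_edges = {"a": ["b", "c"], "b": ["c"], "c": []}
      print(infer([mean_layer, mean_layer], out_edges, {"a": 1.0, "b": 2.0, "c": 3.0}))
  ```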
- We can divide an industrial-scale graph into a massive number of tiny K-hop neighborhoods w.r.t. their target nodes in advance, and load one or a batch of them, rather than the entire graph, into memory during the training phase
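  A sketch of that training-time loading pattern: stream flattened K-hop neighborhoods (one serialized record per target node) from disk and yield small batches, so memory holds only the current batch rather than the whole graph. The file format and field names are illustrative assumptions:

  ```python
  import json

  def neighborhood_batches(path, batch_size):
      """Yield lists of parsed K-hop neighborhood records."""
      batch = []
      with open(path) as f:
          for line in f:                     # one flattened neighborhood per line
              batch.append(json.loads(line))
              if len(batch) == batch_size:
                  yield batch
                  batch = []
      if batch:
          yield batch

  # Usage: each batch is a handful of self-contained subgraphs, so a training
  # step never needs edges outside the batch.
  # for batch in neighborhood_batches("neighborhoods.jsonl", batch_size=32):
  #     train_step(batch)
  ```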