Change Log

1.2.1

Full Changelog

This release is a small bug fix release on top of 1.2.0.

Bug fixes and other changes

Update the URLs of some datasets (Cora, PubMedDiabetes, CiteSeer) for upstream changes #1738, #1759
Add two missing layers to the stellargraph.custom_keras_layers dictionary #1757
Experimental changes: rename RotHEScoring to RotHEScore #1756
DevOps:
- Automated testing on macOS #1752
- Automated testing against Neo4j 4.1, in addition to Neo4j 3.5 and 4.0 #1754
- Other CI: #1732, #1740, #1741, #1744, #1745, #1747, #1750, #1751, #1753

1.2.0

Full Changelog

Jump in to this release, with the new and improved demos and examples:

Comparison of link prediction with random walks based node embedding
Unsupervised training of a Cluster-GCN model with Deep Graph Infomax

Major features and improvements

Better Windows support: StellarGraph's existing ability to run on Windows has been improved, with all tests running on CI (#1696) and several small fixes (#1671, #1704, #1705).
Edge weights are supported in GraphSAGE (#1667) and Watch Your Step (#1604). This is in addition to the existing support for edge weights in GCN, GAT, APPNP, PPNP, RGCN, GCN graph classification, DeepGraphCNN and Node2Vec sampling.
Better and more demonstration notebooks and documentation to make the library more accessible to new and existing users:
- A demo notebook for a comparison of link prediction with random walks based node embedding, showing Node2Vec, Attri2Vec, GraphSAGE and GCN #1658
- The demo notebook for unsupervised training with Deep Graph Infomax has been expanded with more explanation and links #1257
- The documentation for models, generators and other elements now has many more links to other relevant items in a "See also" box, making it easier to fit pieces together (examples: GraphSAGE, GraphSAGENodeGenerator, BiasedRandomWalk) #1718
The Cluster-GCN training procedure supports unsupervised training via Deep Graph Infomax; this allows for scalable training of GCN, APPNP and GAT models, and includes connecting to Neo4j for large graphs demo (#1257)
KGTripleGenerator now supports the self-adversarial negative sampling training procedure for knowledge graph algorithms (from RotatE), via generator.flow(..., sample_strategy="self-adversarial") docs

Deprecations

The ClusterGCN model has been replaced with the GCN class. In the previous 1.1.0 release, GCN, APPNP and GAT were generalised to support the Cluster-GCN training procedure via ClusterNodeGenerator (which includes Neo4j support). The ClusterGCN model is now redundant and thus is deprecated: however, it still works without behaviour change.

Experimental features

Some new algorithms and features are still under active development, and are available as an experimental preview. However, they may not be easy to use: their documentation or testing may be incomplete, and they may change dramatically from release to release. The experimental status is noted in the documentation and at runtime via prominent warnings.

RotE, RotH: knowledge graph link prediction algorithms that combine TransE and RotatE in Euclidean or hyperbolic space, respectively #1539

Bug fixes and other changes

There are now tests for saving and loading a Keras Model constructed every model in StellarGraph #1676. This includes fixes for some models (#1677, #1682). Known issues: sparse models such as GCN and RGCN (see #1251 for more info and a work-around using tf-nightly), experimental GCN-LSTM (#1681).
Various documentation, demo and error message fixes and improvements: better internal linking #1404, automated spell checking #1583, #1663, #1665, #1684, improved rendering #1722 including a better sidebar #1512, #1729, #1730
DevOps changes:
- CI has been moved from Buildkite to GitHub Actions (tracking issue: #1687; pull requests: #1688, #1692, #1690, #1691, #1693, #1694, #1701, #1707, #1712, #1714, #1715, #1717, #1719)
- CI: #1655, #1672, #1673, #1679, #1710, #1721, #1724

1.1.0

Full Changelog

Jump in to this release, with the new and improved demos and examples:

Neo4j graph database support: Cluster-GCN, GraphSAGE, all demos
Semi-supervised node classification via GCN, Deep Graph Infomax and fine-tuning
Loading data into StellarGraph from NumPy
Link prediction with Metapath2Vec
Unsupervised graph classification/representation learning via distances
RGCN section of Node representation learning with Deep Graph Infomax
Node2Vec with StellarGraph components: representation learning, node classification
Expanded Attri2Vec explanation: representation learning, node classification, link prediction

Major features and improvements

Support for the Neo4j graph database has been significantly improved:
- There is now a Neo4jStellarGraph class that packages up a connection to a Neo4j instance, and allows it to be used for machine learning algorithms including the existing Neo4j and GraphSAGE functionality demo, #1595, #1598.
- The ClusterNodeGenerator class now supports Neo4jStellarGraph in addition to the in-memory StellarGraph class, allowing it to be used to train models like GCN and GAT with data stored entirely in Neo4j demo (#1561, #1594, #1613)
Better and more demonstration notebooks and documentation to make the library more accessible to new and existing users:
- There is now a glossary that explains some terms specific to graphs, machine learning and graph machine learning #1570
- A new demo notebook for semi-supervised node classification using Deep Graph Infomax and GCN #1587
- A new demo notebook for link prediction using the Metapath2Vec algorithm #1614
New algorithms:
- Unsupervised graph representation learning demo (#1626)
- Unsupervised RGCN with Deep Graph Infomax demo (#1258)
- Native Node2Vec using TensorFlow Keras, not the gensim library, demo of representation learning, demo of node classification (#536, #1566)
- The ClusterNodeGenerator class can be used to train GCN, GAT, APPNP and PPNP models in addition to the ClusterGCN model #1585
The StellarGraph class continues to get smaller, faster and more flexible:
- Node features can now be specified as NumPy arrays or the newly added thin IndexedArray wrapper, which does no copies and has minimal runtime overhead demo (#1535, #1556, #1599). They can also now be multidimensional for each node #1561.
- Edges can now have features, taken as any extra/unused columns in the input DataFrames demo #1574
- Adjacency lists used for random walks and GraphSAGE/HinSAGE are constructed with NumPy and stored as contiguous arrays instead of dictionaries, cutting the time and memory or construction by an order of magnitude #1296
- The peak memory usage of construction and adjacency list building is now monitored to ensure that there are not large spikes for large graphs, that exceed available memory #1546. This peak usage has thus been optimised: #1551,
- Other optimisations: the edge_arrays, neighbor_arrays, in_node_arrays and out_node_arrays methods have been added, reducing time and memory overhead by leaving data as its underlying NumPy array #1253; the node_type method now supports multiple nodes as input, making algorithms like HinSAGE and Metapath2Vec much faster #1452; the default edge weight of 1 no longer consumes significant memory #1610.
Overall performance and memory usage improvements since 1.0.0, in numbers:
- A reddit graph has 233 thousand nodes and 11.6 million edges:
  - construction without node features is now 2.3× faster, uses 31% less memory and has a memory peak 57% smaller.
  - construction with node features from NumPy arrays is 6.8× faster, uses 6.5% less memory overall and 85% less new memory (the majority of the memory is shared with the original NumPy arrays), and has a memory peak (above the raw data set) 70% smaller, compared to Pandas DataFrames in 1.0.0.
  - adjacency lists are 4.7-5.0× faster to construct, use 28% less memory and have a memory peak 60% smaller.
- Various random walkers are faster: BiasedRandomWalk is up to 30× faster with weights and 5× faster without weights on MovieLens and up to 100× faster on some synthetic datasets, UniformRandomMetapathWalk is up to 17× faster (on MovieLens), UniformRandomWalk is up to 1.4× (on MovieLens).
TensorFlow 2.2 and thus Python 3.8 are now supported #1278

Experimental features

Some new algorithms and features are still under active development, and are available as an experimental preview. However, they may not be easy to use: their documentation or testing may be incomplete, and they may change dramatically from release to release. The experimental status is noted in the documentation and at runtime via prominent warnings.

RotatE: a knowledge graph link prediction algorithm that uses complex rotations (|z| = 1) to encode relations #1522
GCN_LSTM (renamed from GraphConvolutionLSTM): time series prediction on spatio-temporal data. It is still experimental, but has been improved since last release:
- the SlidingFeaturesNodeGenerator class has been added to yield data appropriate for the model, straight from a StellarGraph instance containing time series data as node features #1564
- the hidden graph convolution layers can now have a custom output size #1555
- the model now supports multivariate input and output, including via the SlidingFeaturesNodeGenerator class (with multidimensional node features) #1580
- unit tests have been added #1560
Neo4j support: some classes have been renamed from Neo4J... (uppercase J) to Neo4j... (lowercase j).

Bug fixes and other changes

Edge weights are supported in methods using FullBatchNodeGenerator (GCN, GAT, APPNP, PPNP), RelationalFullBatchNodeGenerator (RGCN) and PaddedGraphGenerator (GCN graph classification, DeepGraphCNN), via the weighted=True parameter #1600
The StellarGraph class now supports conversion between node type and edge type names and equivalent ilocs #1366, which allows optimising some algorithms (#1367 optimises ranking with the DistMult algorithm from 42.6s to 20.7s on the FB15k dataset)
EdgeSplitter no longer prints progress updates #1619
The info method now merges edge types triples like A-[r]->B and B-[r]->A in undirected graphs #1650
There is now a notebook capturing time and memory resource usage on non-synthetic datasets, designed to help StellarGraph contributors understand and optimise the StellarGraph class #1547
Various documentation, demo and error message fixes and improvements: #1516 (thanks @thatlittleboy), #1519, #1520, #1537, #1541, #1542, #1577, #1605, #1606, #1608, #1624, #1628, #1632, #1634, #1636, #1643, #1645, #1649, #1652
DevOps changes:
- CI: #1518, tests are run regularly on a GPU #1249, #1647, #1653
- Other: #1558

1.0.0

Full Changelog

This 1.0 release of StellarGraph is the culmination of three years of active research and engineering to deliver an open-source, user-friendly library for machine learning (ML) on graphs and networks.

Jump in to this release, with the new demos and examples:

More helpful indexing and guidance for demos in our API documentation
Loading from Neo4j
More explanatory Node2Vec link prediction
Unsupervised GraphSAGE and HinSAGE via DeepGraphInfomax
Graph classification with GCNSupervisedGraphClassification and with DeepGraphCNN
Time series prediction using spatial information, using GraphConvolutionLSTM (experimental)

Major features and improvements

Better demonstration notebooks and documentation to make the library more accessible to new and existing users:
- Notebooks are now published in the API documentation, for better & faster rendering and more convenient access #1279 #1433 #1448
- The demos indices and READMEs now contain more guidance and explanation to make it easier to find a relevant example #1200
- Several demos have been added or rewritten: loading data from Neo4j #1184, link prediction using Node2Vec #1190, graph classification with GCN, graph classification with DGCNN
- Notebooks now detect if they're being used with an incorrect version of the StellarGraph library, eliminating confusion about version mismatches #1242
- Notebooks are easier to download, both individually via a button on each in the API documentation #1460 and in bulk #1377 #1459
- Notebooks have been re-arranged and renamed to be more consistent and easier to find #1471
New algorithms:
- GCNSupervisedGraphClassification: supervised graph classification model based on Graph Convolutional layers (GCN) #929, demo.
- DeepGraphCNN (DGCNN): supervised graph classification using a stack of graph convolutional layers followed by SortPooling, and standard convolutional and pooling (such as Conv1D and MaxPool1D) #1212 #1265, demo
- SortPooling layer: the node pooling layer introduced in Zhang et al #1210
DeepGraphInfomax can be used to train almost any model in an unsupervised way, via the corrupt_index_groups parameter to CorruptedGenerator #1243, demo. Additionally, many algorithms provide defaults and so can be used with DeepGraphInfomax without specifying this parameter:
- any model using FullBatchNodeGenerator, including models supported in StellarGraph 0.11: GCN, GAT, PPNP and APPNP
- GraphSAGE #1162
- HinSAGE for heterogeneous graphs with node features #1254
UnsupervisedSampler supports a walker parameter to use other random walking algorithms such as BiasedRandomWalk, in addition to the default UniformRandomWalk. #1187
The StellarGraph class is now smaller, faster and easier to construct and use:
- The StellarGraph(..., edge_type_column=...) parameter can be used to construct a heterogeneous graph from a single flat DataFrame, containing a column of the edge types #1284. This avoids the need to build separate DataFrames for each type, and is significantly faster when there are many types. Using edge_type_column gives a 2.6× speedup for loading the stellargraph.datasets.FB15k dataset (with almost 600 thousand edges across 1345 types).
- StellarGraph's internal cache of node adjacencies is now computed lazily #1291 and takes into account whether the graph is directed or not #1463, and they now use the smallest integer type they can #1289
- StellarGraph's internal list of source and target nodes are now stored using integer "ilocs" #1267, reducing memory use and making some functionality significantly faster #1444 #1446)
- Functions like graph.node_features() no longer needs node_type specified if graph has only one node type (this includes classes like HinSAGENodeGenerator, which no longer needs head_node_type if there is only one node type) #1375
Overall performance and memory usage improvements since 0.11, in numbers:
- The FB15k graph has 15 thousand nodes and 483 thousand edges: it is now 7× faster and 4× smaller to construct (without adjacency lists). It is still about 2× smaller when directed or undirected adjacency lists are computed.
- Directed adjacency matrix construction is up to 2× faster
- Various samplers and random walkers are faster: HinSAGENodeGenerator is 3× faster (on MovieLens), Attri2VecNodeGenerator is 4× faster (on CiteSeer), weighted BiasedRandomWalk is up to 3× faster, UniformRandomMetapathWalk is up to 7× faster

Breaking changes

The stellargraph/stellargraph docker image wasn't being published in an optimal way, so we have stopped updating it for now #1455
Edge weights are now validated to be numeric when creating a StellarGraph. Previously edge weights could be any type, but all algorithms that use them would fail with non-numeric types. #1191
Full batch layers no longer support an "output indices" tensor to filter the output rows to a selected set of nodes #1204 (this does not affect models like GCN, only the layers within them: APPNPPropagationLayer, ClusterGraphConvolution, GraphConvolution, GraphAttention, GraphAttentionSparse, PPNPPropagationLayer, RelationalGraphConvolution). Migration: post-process the output using tf.gather manually or the new sg.layer.misc.GatherIndices layer.
GraphConvolution has been generalised to work with batch size > 1, subsuming the functionality of the now-deprecated ClusterGraphConvolution (and GraphClassificationConvolution) #1205. Migration: replace stellargraph.layer.ClusterGraphConvolution with stellargraph.layer.GraphConvolution.
BiasedRandomWalk now takes multi-edges into consideration instead of collapsing them when traversing the graph. It previously required all multi-edges had to same weight and only counted one of them when considering where to walk, but now a multi-edge is equivalent to having an edge whose weight is the sum of the weights of all edges in the multi-edge #1444

Experimental features

Some new algorithms and features are still under active development, and are available as an experimental preview. However, they may not be easy to use: their documentation or testing may be incomplete, and they may change dramatically from release to release. The experimental status is noted in the documentation and at runtime via prominent warnings.

GraphConvolutionLSTM: time series prediction on spatio-temporal data, combining GCN with a LSTM model to augment the conventional time-series model with information from nearby data points #1085, demo

Bug fixes and other changes

Random walk classes like UniformRandomWalk and BiasedRandomWalk can have their hyperparameters set on construction, in addition to in each call to run #1179
Node feature sampling was made ~4× faster by ensuring a better data layout, this makes some configurations of GraphSAGE (and HinSAGE) noticeably faster #1225
The PROTEINS dataset has been added to stellargraph.datasets, for graph classification #1282
The BlogCatalog3 dataset can now be successfully downloaded again #1283
Knowledge graph model evaluation via rank_edges_against_all_nodes now defaults to the random strategy for breaking ties, and supports top (previous default) and bottom as alternatives #1223
Creating a RelationalFullBatchNodeGenerator is now significantly faster and requires much less memory (18× speedup and 560× smaller for the stellargraph.datasets.AIFB dataset) #1274
Creating a FullBatchNodeGenerator or FullBatchLinkGenerator is now significantly faster and requires much less memory (3× speedup and 480× smaller for the stellargraph.datasets.PubMedDiabetes dataset) #1277
StellarGraph.info now shows a summary of the edge weights for each edge type #1240
The plot_history function accepts a return_figure parameter to return the matplotlib.figure.Figure value, for further manipulation #1309 (Thanks @LarsNeR)
Tests now pass against the TensorFlow 2.2.0 release candidates, in preparation for the full 2.2.0 release #1175
Some functions no longer fail for some particular cases of empty graphs: StellarGraph.to_adjacency_matrix #1378, StellarGraph.from_networkx #1401
CorruptedGenerator on a FullBatchNodeGenerator can be used to train DeepGraphInfomax on a subset of the nodes in a graph, instead of all of them #1415
The stellargraph.custom_keras_layers dictionary for use when loading a Keras model now includes all of StellarGraph's layers #1280
PaddedGraphGenerator.flow now also accepts a list of StellarGraph objects as input #1458
Supervised Graph Classification demo now prints progress update messages during training #1485
Explicit contributors file has been removed to avoid inconsistent acknowledgement #1484. Please refer to the GitHub display for contributors instead.
Various documentation, demo and error message fixes and improvements: #1141, #1219, #1246, #1260, #1266, #1361, #1362, #1385, #1386, #1363, #1376, #1405 (thanks @thatlittleboy), #1408, #1393, #1403, #1401, #1397, #1396, #1391, #1394, #1434 (thanks @thatlittleboy), #1442, #1438 (thanks @thatlittleboy), #1413, #1450, #1440, #1453, #1447, #1467, #1465 (thanks @thatlittlboy), #1470, #1475, #1480, #1468, #1472, #1474
DevOps changes:
- CI: #1161, #1189, #1230, #1122, #1421
- Other: #1197, #1322, #1407

1.0.0rc1

Full Changelog

This is the first release candidate for StellarGraph 1.0. The 1.0 release will be the culmination of 2 years of activate development, and this release candidate is the first milestone for that release.

Jump in to this release, with the new demos and examples:

More helpful indexing and guidance in demo READMEs
Loading from Neo4j
More explanatory Node2Vec link prediction
Unsupervised GraphSAGE and HinSAGE via DeepGraphInfomax
Graph classification with GCNSupervisedGraphClassification
Time series prediction using spatial information, using GraphConvolutionLSTM (experimental)

Major features and improvements

Better demonstration notebooks and documentation to make the library more accessible to new and existing users:
- The demos READMEs now contain more guidance and explanation to make it easier to find a relevant example #1200
- A demo for loading data from Neo4j has been added #1184
- The demo for link prediction using Node2Vec has been rewritten to be clearer #1190
- Notebooks are now included in the API documentation, for more convenient access #1279
- Notebooks now detect if they're being used with an incorrect version of the StellarGraph library, elimanting confusion about version mismatches #1242
New algorithms:
- GCNSupervisedGraphClassification: supervised graph classification model based on Graph Convolutional layers (GCN) #929, demo.
DeepGraphInfomax can be used to train almost any model in an unsupervised way, via the corrupt_index_groups parameter to CorruptedGenerator #1243, demo. Additionally, many algorithms provide defaults and so can be used with DeepGraphInfomax without specifying this parameter:
- any model using FullBatchNodeGenerator, including models supported in StellarGraph 0.11: GCN, GAT, PPNP and APPNP
- GraphSAGE #1162
- HinSAGE for heterogeneous graphs with node features #1254
UnsupervisedSampler supports a walker parameter to use other random walking algorithms such as BiasedRandomWalk, in addition to the default UniformRandomWalk. #1187
The StellarGraph class is now smaller, faster and easier to construct:
- The StellarGraph(..., edge_type_column=...) parameter can be used to construct a heterogeneous graph from a single flat DataFrame, containing a column of the edge types #1284. This avoids the need to build separate DataFrames for each type, and is significantly faster when there are many types. Using edge_type_column gives a 2.6× speedup for loading the stellargraph.datasets.FB15k dataset (with almost 600 thousand edges across 1345 types).
- StellarGraph's internal cache of node adjacencies now uses the smallest integer type it can #1289. This reduces memory use by 31% on the FB15k dataset, and 36% on a reddit dataset (with 11.6 million edges).

Breaking changes

Edge weights are now validated to be numeric when creating a StellarGraph, previously edge weights could be any type, but all algorithms that use them would fail. #1191
Full batch layers no longer support an "output indices" tensor to filter the output rows to a selected set of nodes #1204 (this does not affect models like GCN, only the layers within them: APPNPPropagationLayer, ClusterGraphConvolution, GraphConvolution, GraphAttention, GraphAttentionSparse, PPNPPropagationLayer, RelationalGraphConvolution). Migration: post-process the output using tf.gather manually or the new sg.layer.misc.GatherIndices layer.
GraphConvolution has been generalised to work with batch size > 1, subsuming the functionality of the now-deprecated ClusterGraphConvolution (and GraphClassificationConvolution) #1205. Migration: replace stellargraph.layer.ClusterGraphConvolution with stellargraph.layer.GraphConvolution.

Experimental features

Some new algorithms and features are still under active development, and are available as an experimental preview. However, they may not be easy to use: their documentation or testing may be incomplete, and they may change dramatically from release to release. The experimental status is noted in the documentation and at runtime via prominent warnings.

SortPooling layer: the node pooling layer introduced in Zhang et al #1210
DeepGraphConvolutionalNeuralNetwork (DGCNN): supervised graph classification using a stack of graph convolutional layers followed by SortPooling, and standard convolutional and pooling (such as Conv1D and MaxPool1D) #1212 #1265
GraphConvolutionLSTM: time series prediction on spatio-temporal data, combining GCN with a LSTM model to augment the conventional time-series model with information from nearby data points #1085, demo

Bug fixes and other changes

Random walk classes like UniformRandomWalk and BiasedRandomWalk can have their hyperparameters set on construction, in addition to in each call to run #1179
Node feature sampling was made ~4× faster by ensuring a better data layout, this makes some configurations of GraphSAGE (and HinSAGE) noticeably faster #1225
The PROTEINS dataset has been added to stellargraph.datasets, for graph classification #1282
The BlogCatalog3 dataset can now be successfully downloaded again #1283
Knowledge graph model evaluation via rank_edges_against_all_nodes now defaults to the random strategy for breaking ties, and supports top (previous default) and bottom as alternatives #1223
Creating a RelationalFullBatchNodeGenerator is now significantly faster and requires much less memory (18× speedup and 560× smaller for the stellargraph.datasets.AIFB dataset) #1274
StellarGraph.info now shows a summary of the edge weights for each edge type #1240
Various documentation, demo and error message fixes and improvements: #1141, #1219, #1246, #1260, #1266
DevOps changes:
- CI: #1161, #1189, #1230, #1122
- Other: #1197

0.11.1

Full Changelog

This bugfix release contains the same code as 0.11.0, and just fixes the metadata in the Anaconda package so that it can be installed successfully.

Bug fixes and other changes

The Conda package for StellarGraph has been updated to require TensorFlow 2.1, as TensorFlow 2.0 is no longer supported. As a result, StellarGraph will currently install via Conda on Linux and Windows - Mac support is waiting on the TensorFlow 2.1 osx-64 release to Conda. #1165

0.11.0

Full Changelog

Major features and improvements

The onboarding/getting-started process has been optimised and improved:
- The README has been rewritten to highlight our numerous demos, and how to get help #1081
- Example Jupyter notebooks can now be run directly in Google Colab and Binder, providing an easy way to get started with StellarGraph - simply click the and badges within each notebook. #1119.
- The new demos/basics directory contains two notebooks demonstrating how to construct a StellarGraph object from Pandas, and from NetworkX #1074
- The GCN node classification demo now has more explanation, to serve as an introduction to graph machine learning using StellarGraph #1125
New algorithms:
- Watch Your Step: computes node embeddings by simulating the effect of random walks, rather than doing them. #750.
- Deep Graph Infomax: performs unsupervised node representation learning #978.
- Temporal Random Walks (Continuous-Time Dynamic Network Embeddings): random walks that respect the time that each edge occurred (stored as edge weights) #1120.
- ComplEx: computes multiplicative complex-number embeddings for entities and relationships (edge types) in knowledge graphs, which can be used for link prediction. #901 #1080
- DistMult: computes multiplicative real-number embeddings for entities and relationships (edge types) in knowledge graphs, which can be used for link prediction. #755 #865 #1136

Breaking changes

StellarGraph now requires TensorFlow 2.1 or greater, TensorFlow 2.0 is no longer supported #1008
The legacy constructor using NetworkX graphs has been deprecated #1027. Migration: replace StellarGraph(some_networkx_graph) with StellarGraph.from_networkx(some_networkx_graph), and similarly for StellarDiGraph.
The build method on model classes (such as GCN) has been renamed to in_out_tensors #1140. Migration: replace model.build() with model.in_out_tensors().
The node_model and link_model methods on model classes has been replaced by in_out_tensors #1140 (see that PR for the exact list of types). Migration: replace model.node_model() with model.in_out_tensors() or model.in_out_tensors(multiplicity=1), and model.node_model() with model.in_out_tensors() or model.in_out_tensors(multiplicity=2).
Re-exports of calibration and ensembling functionality from the top-level of the stellargraph module were deprecated, in favour of importing from the stellargraph.calibration or stellargraph.ensemble submodules directly #1107. Migration: replace uses of stellargraph.Ensemble with stellargraph.ensemble.Ensemble, and similarly for the other names (see #1107 for all replacements).
StellarGraph.to_networkx parameters now use attr to refer to NetworkX attributes, not name or label #973. Migration: for any named parameters in graph.to_networkx(...), change node_type_name=... to node_type_attr=... and similarly edge_type_name to edge_type_attr, edge_weight_label to edge_weight_attr, feature_name to feature_attr.
StellarGraph.nodes_of_type is deprecated in favour of the nodes method #1111. Migration: replace some_graph.nodes_of_type(some_type) with some_graph.nodes(node_type=some_type).
StellarGraph.info parameters show_attributes and sample were deprecated #1110
Some more layers and models had many parameters move from **kwargs to real arguments: Attri2Vec (#1128), ClusterGCN (#1129), GraphAttention & GAT (#1130), GraphSAGE & its aggregators (#1142), HinSAGE & its aggregators (#1143), RelationalGraphConvolution & RGCN (#1148). Invalid (e.g. incorrectly spelled) arguments would have been ignored previously, but now may fail with a TypeError; to fix, remove or correct the arguments.
The method="chebyshev" option to FullBatchNodeGenerator, FullBatchLinkGenerator and GCN_Aadj_feats_op has been removed for now, because it needed significant revision to be correctly implemented #1028
The fit_generator, evaluate_generator and predict_generator methods on Ensemble and BaggingEnsemble have been renamed to fit, evaluate and predict, to match the deprecation in TensorFlow 2.1 of the tensorflow.keras.Model methods of the same name #1065. Migration: remove the _generator suffix on these methods.
The default_model method on Attri2Vec, GraphSAGE and HinSAGE has been deprecated, in favour of in_out_tensors #1145. Migration: replace model.default_model() with model.in_out_tensors().

Experimental features

Some new algorithms and features are still under active development, and are available as an experimental preview. However, they may not be easy to use: their documentation or testing may be incomplete, and they may change dramatically from release to release. The experimental status is noted in the documentation and at runtime via prominent warnings.

GCNSupervisedGraphClassification: supervised graph classification model based on Graph Convolutional layers (GCN) #929.

Bug fixes and other changes

StellarGraph.to_adjacency_matrix is at least 15× faster on undirected graphs #932
ClusterNodeGenerator is now noticeably faster, which makes training and predicting with a ClusterGCN model faster #1095. On a random graph with 1000 nodes and 5000 edges and 10 clusters, iterating over an epoch with q=1 (each clusters individually) is 2× faster, and is even faster for larger q. The model in the Cluster-GCN demo notebook using Cora trains 2× faster overall.
The node_features=... parameter to StellarGraph.from_networkx now only needs to mention the node types that have features, when passing a dictionary of Pandas DataFrames. Node types that aren't mentioned will automatically have no features (zero-length feature vectors). #1082
A subgraph method was added to StellarGraph for computing a node-induced subgraph #958
A connected_components method was added to StellarGraph for computing the nodes involved in each connected component in a StellarGraph #958
The info method on StellarGraph now shows only 20 node and edge types by default to be more useful for graphs with many types #993. This behaviour can be customized with the truncate=... parameter.
The info method on StellarGraph now shows information about the size and type of each node type's feature vectors #979
The EdgeSplitter class supports StellarGraph input (and will output StellarGraphs in this case), in addition to NetworkX graphs #1032
The Attri2Vec model class stores its weights statefully, so they are shared between all tensors computed by build #1101
The GCN model defaults for some parameters now match the GraphConvolution layer's defaults: specifically kernel_initializer (glorot_uniform) and bias_initializer (zeros) #1147
The datasets submodule is now accessible as stellargraph.datasets, after just import stellargraph #1113
All datasets in stellargraph.datasets now support a load method to create a StellarGraph object (and other information): AIFB (#982), CiteSeer (#989), Cora (#913), MovieLens (#947), PubMedDiabetes (#986). The demo notebooks using these datasets are now cleaner.
Some new datasets were added to stellargraph.datasets:
- MUTAG: a collection of graphs representing chemical compounds #960
- WN18, WN18RR: knowledge graphs based on the WordNet linguistics data #977
- FB15k, FB15k_237: knowledge graphs based on the FreeBase knowledge base #977
- IAEnronEmployees: a small set of employees of Enron, and the many emails between them #1058
Warnings now point to the call site of the function causing the warning, not the warnings.warn call inside StellarGraph; this means DeprecationWarnings will be visible in Jupyter notebooks and scripts run with Python 3.7 #1144
Some code that triggered warnings from other libraries was fixed or removed #995 #1008, #1051, #1064, #1066
Some demo notebooks have been updated or fixed: demos/use-cases/hateful-twitters.ipynb (#1019), rgcn-aifb-node-classification-example.ipynb (#983)
The documentation "quick start" guide duplicated a lot of the information in the README, and so has been replaced with the latter #1096
API documentation now lists items under their recommended import path, not their definition. For instance, stellargraph.StellarGraph instead of stellargraph.core.StellarGraph (#1127), stellargraph.layer.GCN instead of stellargraph.layer.gcn.GCN (#1150) and stellargraph.datasets.Cora instead of stellargraph.datasets.datasets.Cora (#1157)
Some API documentation is now formatted better #1061, #1068, #1070, #1071
DevOps changes:
- Neo4j functionality is now tested on CI, and so will continue working #1046 #1050
- CI: #967, #968, #1036, #1067, #1097
- Other: #956, #962, #974

0.10.0

Full Changelog

Major features and improvements

The StellarGraph and StellarDiGraph classes are now backed by NumPy and Pandas #752. The StellarGraph(...) and StellarDiGraph(...) constructors now consume Pandas DataFrames representing node features and the edge list. This significantly reduces the memory use and construction time for these StellarGraph objects.

The following table shows some measurements of the memory use of g = StellarGraph(...), and the time required for that constructor call, for several real-world datasets of different sizes, for both the old form backed by NetworkX code and the new form backed by NumPy and Pandas (both old and new store node features similarly, using 2D NumPy arrays, so the measurements in this table include only graph structure: the edges and nodes themselves):

dataset	nodes	edges	size old (MiB)	size new (MiB)	size change	time old (s)	time new (s)	time change
Cora	2708	5429	4.1	1.3	-69%	0.069	0.034	-50%
FB15k	14951	592213	148	28	-81%	5.5	1.2	-77%
Reddit	231443	11606919	6611	493	-93%	154	33	-82%

The old backend has been removed, and conversion from a NetworkX graph should be performed via the StellarGraph.from_networkx function (the existing form StellarGraph(networkx_graph) is supported in this release but is deprecated, and may be removed in a future release).

More detailed information about Heterogeneous GraphSAGE (HinSAGE) has been added to StellarGraph's readthedocs documentation #839.
New algorithms:
- Link prediction with directed GraphSAGE, via DirectedGraphSAGELinkGenerator #871
- GraphWave: computes structural node embeddings by using wavelet transforms on the graph Laplacian #822

Breaking changes

Some layers and models had many parameters move from **kwargs to real arguments: GraphConvolution, GCN. #801 Invalid (e.g. incorrectly spelled) arguments would have been ignored previously, but now may fail with a TypeError; to fix, remove or correct the arguments.
The stellargraph.data.load_dataset_BlogCatalog3 function has been replaced by the load method on stellargraph.datasets.BlogCatalog3 #888. Migration: replace load_dataset_BlogCatalog3(location) with BlogCatalog3().load(); code required to find the location or download the dataset can be removed, as load now does this automatically.
stellargraph.data.train_test_val_split and stellargraph.data.NodeSplitter have been removed. #887 Migration: this functionality should be replaced with pandas and sklearn (for instance, sklearn.model_selection.train_test_split).
Most of the submodules in stellargraph.utils have been moved to top-level modules: stellargraph.calibration, stellargraph.ensemble, stellargraph.losses and stellargraph.interpretability #938. Imports from the old location are now deprecated, and may stop working in future releases. See the linked issue for the full list of changes.

Experimental features

Some new algorithms and features are still under active development, and are available as an experimental preview. However, they may not be easy to use: their documentation or testing may be incomplete, and they may change dramatically from release to release. The experimental status is noted in the documentation and at runtime via prominent warnings.

Temporal Random Walks: random walks that respect the time that each edge occurred (stored as edge weights) #787. The implementation does not have an example or thorough testing and documentation.
Watch Your Step: computes node embeddings by simulating the effect of random walks, rather than doing them. #750. The implementation is not fully tested.
ComplEx: computes embeddings for nodes and edge types in knowledge graphs, and use these to perform link prediction #756. The implementation hasn't been validated to match the paper.
Neo4j connector: the GraphSAGE algorithm can execute doing neighbourhood sampling in a Neo4j database, so that the edges of a graph do not have to fit entirely into memory #799. The implementation is not automatically tested, and doesn't support functionality like loading node feature vectors from Neo4j.

Bug fixes and other changes

StellarGraph now supports TensorFlow 2.1, which includes GPU support by default: #875
Demos now focus on Jupyter notebooks, and demo scripts that duplicate notebooks have been removed: #889
The following algorithms are now reproducible:
- Supervised GraphSAGE Node Attribute Inference #844
- GraphSAGE Link Prediction #925
Randomness can be more easily controlled using stellargraph.random.set_seed #806
StellarGraph.edges() can return edge weights as a separate NumPy array with include_edge_weights=True #754
StellarGraph.to_networkx supports ignoring node features (and thus being a little more efficient) with feature_name=None #841
StellarGraph.to_adjacency_matrix now ignores edge weights (that is, defaults every weight to 1) by default, unless weighted=True is specified #857
stellargraph.utils.plot_history visualises the model training history as a plot for each metric (such as loss) #902
the saliency maps/interpretability code has been refactored to have more sharing as well as to make it cleaner and easier to extend #855
DevOps changes:
- Most demo notebooks are now tested on CI using Papermill, and so won't become out of date #575
- CI: #698, #760, #788, #817, #860, #874, #877, #878, #906, #908, #915, #916, #918
- Other: #708, #746, #791

0.9.0

Major features and improvements

StellarGraph is now available as a conda package on Anaconda Cloud #516
New algorithms:
- Cluster-GCN: an extension of GCN that can be trained using SGD, with demo #487
- Relational-GCN (RGCN): a generalisation of GCN to relational/multi edge type graphs, with demo #490
- Link prediction for full-batch models: FullBatchLinkGenerator allows doing link prediction with algorithms like GCN, GAT, APPNP and PPNP #543
Unsupervised GraphSAGE has now been updated and tested for reproducibility. Ensuring all seeds are set, running the same pipeline should give reproducible embeddings. #620
A datasets subpackage provides easier access to sample datasets with inbuilt downloading. #690

Breaking changes

The stellargraph library now only supports tensorflow version 2.0 #518, #732. Backward compatibility with earlier versions of tensorflow is not guaranteed.
The stellargraph library now only supports Python versions 3.6 and above #641. Backward compatibility with earlier versions of Python is not guaranteed.
The StellarGraph class no longer exposes NetworkX internals, only required functionality. In particular, calls like list(G) will no longer return a list of nodes; use G.nodes() instead. #297 If NetworkX functionality is required, use the new .to_networkx() method to convert to a normal networkx.MultiGraph or networkx.MultiDiGraph.
Passing a NodeSequence or LinkSequence object to GraphSAGE and HinSAGE classes is now deprecated and no longer supported #498. Users might need to update their calls of GraphSAGE and HinSAGE classes by passing generator objects instead of generator.flow() objects.
Various methods on StellarGraph have been renamed to be more succinct and uniform:
- get_feature_for_nodes is now node_features
- type_for_node is now node_type
Neighbourhood methods in StellarGraph class (neighbors, in_nodes, out_nodes) now return a list of neighbours instead of a set. This addresses #653. This means multi-edges are no longer collapsed into one in the return value. There will be an implicit change in behaviour for explorer classes used for algorithms like GraphSAGE, Node2Vec, since a neighbour connected via multiple edges will now be more likely to be sampled. If this doesn't sound like the desired behaviour, consider pruning the graph of multi-edges before running the algorithm.
GraphSchema has been simplified to remove type look-ups for individual nodes and edges #702 #703. Migration: for nodes, use StellarGraph.node_type; for edges, use the triple argument to the edges method, or filter when doing neighbour queries using the edge_types argument.
NodeAttributeSpecification and the supporting Converter classes have been removed #707. Migration: use the more powerful and flexible preprocessing tools from pandas and sklearn (see the linked PR for specifics)

Experimental features

Some new algorithms and features are still under active development, and are available as an experimental preview. However, they may not be easy to use: their documentation or testing may be incomplete, and they may change dramatically from release to release. The experimental status is noted in the documentation and at runtime via prominent warnings.

The StellarGraph and StellarDiGraph classes supports using a backend based on NumPy and Pandas that uses dramatically less memory for large graphs than the existing NetworkX-based backend #668. The new backend can be enabled by constructing with StellarGraph(nodes=..., edges=...) using Pandas DataFrames, instead of a NetworkX graph.

Bug fixes and other changes

Documentation for every relased version is published under a permanent URL, in addition to the stable alias for the latest release, e.g. https://stellargraph.readthedocs.io/en/v0.8.4/ for v0.8.4 #612
Neighbourhood methods in StellarGraph class (neighbors, in_nodes, out_nodes) now support additional parameters to include edge weights in the results or filter by a set of edge types. #646
Changed GraphSAGE and HinSAGE class API to accept generator objects the same as GCN/GAT models. Passing a NodeSequence or LinkSequence object is now deprecated. #498
SampledBreadthFirstWalk, SampledHeterogeneousBreadthFirstWalk and DirectedBreadthFirstNeighbours have been made 1.2-1.5× faster #628
UniformRandomWalk has been made 2× faster #625
FullBatchNodeGenerator.flow has been reduced from O(n^2) quadratic complexity to O(n), where n is the number of nodes in the graph, making it orders of magnitude faster for large graphs #513
The dependencies required for demos and testing have been included as "extras" in the main package: demos and igraph for demos, and test for testing. For example, pip install stellargraph[demos,igraph] will install the dependencies required to run every demo. #661
The StellarGraph and StellarDiGraph constructors now list their arguments explicitly for clearer documentation (rather than using *arg and **kwargs splats) #659
sys.exit(0) is no longer called on failure in load_dataset_BlogCatalog3 #648
Warnings are printed using the Python warnings module #583
Numerous DevOps changes:
- CI results are now publicly viewable: https://buildkite.com/stellar/stellargraph-public
- CI: #524, #534, #544, #550, #551, #557, #562, #574 #578, #579, #587, #592, #595, #596, #602, #609, #613, #615, #631, #637, #639, #640, #652, #656, #663, #675
- Git and GitHub configuration: #516, #588, #624, #672, #682, #683,
- Other: #523, #582, #590, #654

0.8.4

Fixed bugs:

Fix DirectedGraphSAGENodeGenerator always hitting TypeError exception. #695

0.8.3

Fixed bugs:

Fixed the issue in the APPNP class that causes appnp to propagate excessive dropout layers. #525
Added a fix into the PPNP node classification demo so that the softmax layer is no longer propagated. #525

0.8.2

Fixed bugs:

Updated requirements to TensorFlow>=1.14, as tensorflow with lower versions causes errors with sparse full batch node methods: GCN, APPNP, and GAT. #519

0.8.1

Fixed bugs:

Reverted erroneous demo notebooks.

0.8.0

Full Changelog

New algorithms:

Directed GraphSAGE algorithm (a generalisation of GraphSAGE to directed graphs) + demo #479
Attri2vec algorithm + demo #470 #455
PPNP and APPNP algorithms + demos #485
GAT saliency maps for interpreting node classification with Graph Attention Networks + demo #435

Implemented enhancements:

New demo of node classification on Twitter hateful users \430
New demo of graph saliency on Twitter hateful users #448
Added Directed SampledBFS walks on directed graphs #464
Unified API of GCN, GAT, GraphSAGE, and HinSAGE classses by adding build() method to GCN and GAT classes #439
Added activations argument to GraphSAGE and HinSAGE classes #381
Unified activations for GraphSAGE, HinSAGE, GCN and GAT #493 #381
Added optional regularisation on the weights for GCN, GraphSage, and HinSage #172 #469
Unified regularisation of GraphSAGE, HinSAGE, GCN and GAT #494 (geoffj-d61)
Unsupervised GraphSage speed up via multithreading #474 #477
Support of sparse generators in the GCN saliency map implementation. #432

Refactoring:

Refactored Ensemble class into Ensemble and BaggingEnsemble. The former implements naive ensembles and the latter bagging ensembles. #459
Changed from using keras to use tensorflow.keras #471
Removed flatten_output arguments for all models #447

Fixed bugs:

Updated Yelp example to support new dataset version #442
Fixed bug where some nodes and edges did not get a default type #451
Inconsistency in Ensemble.fit_generator() argument #461
Fixed source--target node designations for code using Cora dataset #444
IndexError: index 1 is out of bounds for axis 1 with size 1 in: demos/node-classification/hinsage #434
GraphSAGE and GAT/GCN predictions have different shapes #425

Removed igraph and mplleaflet from demos requirements in setup.py. Python-igraph doesn't install on many systems and is only required for the clustering notebook. See the README.md in that directory for requirements and installation directions.
Updated GCN interpretability notebook to work with new FullBatchGenerator API #429

0.7.0

Full Changelog

Implemented enhancements:

SGC Implementation #361 (PantelisElinas)
Updated to support Python 3.7 #348
FullBatchNodeGenerator now supports a simpler interface to apply different adjacency matrix preprocessing options #405
Full-batch models (GCN, GAT, and SGC) now return predictions for only those nodes provided to the generator in the same order #417
GAT now supports using a sparse adjacency matrix making execution faster #420
Added interpretability of GCN models and a demo of finding important edges for a node prediction #383
Added a demo showing inductive classification with the PubMed dataset #372

Refactoring:

Added build() method for GraphSAGE and HinSAGE model classes #385 This replaces the node_model() and link_model() methods, which will be deprecated in future versions (deprecation warnings added).
Changed the FullBatchNodeGenerator to accept simpler method and transform arguments #405

Fixed bugs:

Removed label from features for pubmed dataset. #362
Python igraph requirement fixed #392
Simplified random walks to not require passing a graph #408

0.6.1 (1 Apr 2019)

Fixed bugs:

a bug in passing graph adjacency matrix to the optional func_opt function in FullBatchNodeGenerator class
a bug in demos/node-classification/gcn/gcn-cora-example.py:144: incorrect argument was used to pass the optional function to the generator for GCN

Enhancements:

separate treatment of gcn and gat models in demos/ensembles/ensemble-node-classification-example.ipynb

0.6.0 (14 Mar 2019)

Implemented new features and enhancements:

Graph Attention (GAT) layer and model (stack of GAT layers), with demos #216, #315
Unsupervised GraphSAGE #331 with a demo #335
Model Ensembles #343
Community detection based on unsupervised graph representation learning #354
Saliency maps and integrated gradients for model interpretability #345
Shuffling of head nodes/edges in node and link generators at each epoch #298

Fixed bugs:

a bug where seed was not passed to sampler in GraphSAGELinkGenerator constructor #337
UniformRandomMetaPathWalk doesn't update the current node neighbors #340
seed value for link mapper #336

0.5.0 (11 Feb 2019)

Implemented new features and enhancements:

Added model calibration #326
Added GraphConvolution layer, GCN class for a stack of GraphConvolution layers, and FullBatchNodeGenerator class for feeding data into GCN models #318
Added GraphSAGE attention aggregator #317
Added GraphSAGE MaxPoolAggregator and MeanPoolAggregator #278
Added shuffle option to all flow methods for GraphSAGE and HinSAGE generators #328
GraphSAGE and HinSAGE: ensure that a MLP can be created by using zero samples #301
Handle isolated nodes in GraphSAGE #294
Ensure isolated nodes are handled correctly by GraphSAGENodeMapper and GraphSAGELinkMapper #182
EdgeSplitter: introduce a switch for keeping the reduced graph connected #285
Node2vec for weighted graphs #241
Fix edge types in demos #237
Add docstrings to StellarGraphBase class #175
Make L2-normalisation of the final embeddings in GraphSAGE and HinSAGE optional #115
Check/change the GraphSAGE mapper's behaviour for isolated nodes #100
Added GraphSAGE node embedding extraction and visualisation #290

Fixed bugs:

Fixed the bug in running demos when no options given #271
Fixed the bug in LinkSequence that threw an error when no link targets were given #273

Refactoring:

Refactored link inference classes to use edge_embedding_method instead of edge_feature_method #327

Files

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

Change Log

Bug fixes and other changes

Major features and improvements

Deprecations

Experimental features

Bug fixes and other changes

Major features and improvements

Experimental features

Bug fixes and other changes

Major features and improvements

Breaking changes

Experimental features

Bug fixes and other changes

Major features and improvements

Breaking changes

Experimental features

Bug fixes and other changes

Bug fixes and other changes

Major features and improvements

Breaking changes

Experimental features

Bug fixes and other changes

Major features and improvements

Breaking changes

Experimental features

Bug fixes and other changes

Major features and improvements

Breaking changes

Experimental features

Bug fixes and other changes

0.6.1 (1 Apr 2019)

0.6.0 (14 Mar 2019)

0.5.0 (11 Feb 2019)