detailed documentation examples
maniospas committed Jun 6, 2024
1 parent dc10db2 commit 1cb8fa7
Showing 14 changed files with 393 additions and 219 deletions.
97 changes: 72 additions & 25 deletions docs/advanced/autotuning.md
@@ -8,47 +8,94 @@
…through a `pygrank.Tuner` base class, which wraps
any kind of node ranking algorithm. Ideally, this would wrap end-product
algorithms.

!!! info
    Tuners differ from benchmarks in that they select node ranking algorithms
    on-the-fly based on the graph signal input.

## Getting started

An exhaustive list of ready-to-use tuners can be found [here](../generated/tuners.md).
After initialization, these can run with the same pattern as other node ranking algorithms.
Tuner instances with default arguments use common base settings.
For example, the following code separates training and evaluation
data for a provided personalization signal and then uses a tuner that
by default creates a `GenericGraphFilter` instance with ten parameters.

```python
import pygrank as pg

# load a graph and a node group to serve as the personalization signal
_, graph, group = pg.load_one("eucore")
signal = pg.to_signal(graph, group)

train, test = pg.split(signal, training_samples=0.5)

scores_pagerank = pg.PageRank(max_iters=1000)(train)
scores_tuned = pg.ParameterTuner()(train)

measure = pg.AUC(test, exclude=train)
pg.benchmark_print_line("Pagerank", measure(scores_pagerank))
pg.benchmark_print_line("Tuned", measure(scores_tuned))
# Pagerank .83
# Tuned .91
```

Instead of repeating the whole optimization
process each time a tuner runs, you may
want to tune once and use the created node ranking
algorithm later. This can be achieved with the following pattern:

```python
# continuing the previous example, where `train` was defined
algorithm_tuned = pg.ParameterTuner().tune(train)
scores_tuned = algorithm_tuned(train)
```
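
The returned algorithm has fixed parameters, so it can later run on
graph signals without repeating the optimization process.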

## Customization

Tune your own algorithms by passing to `ParameterTuner`
a method (or lambda expression) that constructs them
from a list of parameters, alongside corresponding
upper and lower bounds for those parameters.
An example follows:

```python
def custom_algorithm(params):
    assert len(params) == 1
    return pg.PageRank(alpha=params[0])

algorithm = pg.ParameterTuner(custom_algorithm,
                              max_vals=[0.99],
                              min_vals=[0.5],
                              measure=pg.NDCG)
```
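
The tuned instance can then run like any other node ranking algorithm,
for example as `scores = algorithm(personalization)`.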


In the above snippet, we used NDCG as the measure of choice for tuning.
If no measure is provided, AUC is the default. If you want to tie the
measure to a specific graph signal with `as_supervised_method`, as below,
also set *fraction_of_training=1* for the tuner. This forces the tuner
to use the whole personalization to produce node ranks internally,
since the validation split is performed a priori.

```python
import pygrank as pg

_, graph, group = pg.load_one("eucore")
signal = pg.to_signal(graph, group)

train, test = pg.split(signal, training_samples=0.5)
train, valid = pg.split(train, training_samples=0.5)

tuner = pg.ParameterTuner(lambda params: pg.PageRank(alpha=params[0]),
                          max_vals=[0.99],
                          min_vals=[0.5],
                          fraction_of_training=1,
                          measure=pg.NDCG(valid, exclude=train+test).as_supervised_method())

scores_pagerank = pg.PageRank(max_iters=1000)(train)
scores_tuned = tuner(train)
```
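
Note that the validation measure excludes both training and test nodes, so
the test set never influences tuning.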

## Optimizations

Graph convolutions are the most computationally intensive operations
node ranking algorithms employ, as their running time scales linearly with the…
75 changes: 45 additions & 30 deletions docs/advanced/graph_preprocessing.md
@@ -5,50 +5,63 @@
…that performs symmetric (i.e. Laplacian-like) normalization
for undirected graphs and column-wise normalization that
follows a true probabilistic formulation of transition probabilities
for directed graphs, such as `DiGraph` instances. The type of
normalization can be specified by passing a *normalization*
argument to constructors of ranking algorithms. This parameter
can have the following values:

| Normalization | Description |
|---------------|-------------|
| `"auto"`      | The above-described default behavior. |
| `"col"`       | Column-wise normalization. |
| `"symmetric"` | Symmetric normalization. |
| `"none"`      | (The string "none", not Python's `None`.) Avoids any normalization, for example because edge weights already hold the normalization. |
| callable      | A callable applied to a `scipy` sparse adjacency matrix of the `"numpy"` backend (irrespective of the actually active backend). When applied, it ignores the preprocessor's *reduction* argument. |
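
For instance, the following minimal sketch passes a callable as the
*normalization* argument; the row-wise normalization it implements is
illustrative rather than a built-in option:

```python
import numpy as np
import scipy.sparse
import pygrank as pg

def row_normalize(adjacency):
    # adjacency is a scipy sparse matrix in the "numpy" backend
    degrees = np.asarray(adjacency.sum(axis=1)).ravel()
    degrees[degrees == 0] = 1  # keep isolated nodes from dividing by zero
    return scipy.sparse.diags(1.0 / degrees) @ adjacency

algorithm = pg.PageRank(normalization=row_normalize)
```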

Additionally, a *renormalization* argument may be provided
to add a multiple of the unit matrix to the adjacency matrix,
a concept called the renormalization trick.
This is 0 by default, but can help shrink the spectrum.
Furthermore, a *transform_adjacency* method can be provided
to modify the final adjacency matrix. For example,
you can combine these arguments to make an algorithm class
use the Laplacian matrix instead of the adjacency matrix:

```python
alg = Algorithm(transform_adjacency=lambda x: -x, renormalization=-1)
```
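In this example, *renormalization=-1* subtracts the unit matrix from the
adjacency matrix, and the lambda then negates the result, yielding a
Laplacian-like matrix (up to the selected normalization).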


In all cases, adjacency matrix normalization involves the
computationally intensive operation of converting the graph
into a scipy sparse matrix each time node
ranking algorithms are called. `pygrank`
provides a way to avoid recomputing the normalization
during large-scale experiments by the same algorithm for
the same graphs by passing an argument `assume_immutability=True`
to the algorithm's constructor, which indicates that
the graph does not change between runs of the algorithm
and hence computes the normalization only once for each given
graph, a process known as hashing.
Hashing only uses the Python object's hash method,
so a different instance of the same graph will recompute the
normalization if it points at a different memory location.

!!! warning
    Do not alter graph objects after passing them to
    `rank(...)` methods of algorithms with
    `assume_immutability=True` for the first time. If altering the
    graph is necessary midway through your code, create a copy
    instance with one of *networkx*'s in-built methods and
    edit that one.
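
A minimal sketch of this copy-before-editing pattern (the toy graph, its
edges, and the personalization are hypothetical):

```python
import networkx as nx
import pygrank as pg

graph = nx.Graph([("A", "B"), ("B", "C")])  # hypothetical toy graph
algorithm = pg.PageRank(assume_immutability=True)
ranks = algorithm(graph, {"A": 1})  # the graph's normalization is now hashed

editable = graph.copy()  # networkx's in-built copy
editable.add_edge("C", "D")  # safe: the original hashed graph is unchanged
ranks2 = algorithm(editable, {"A": 1})  # new object, so normalization is recomputed
```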

For example, hashing the outcome of graph normalization to
speed up multiple calls to the same graph can be achieved
as per the following code:

```python
import pygrank as pg

graph, personalization1, personalization2 = ...
algorithm = pg.PageRank(alpha=0.85, normalization="col", assume_immutability=True)
ranks1 = algorithm(graph, personalization1)
ranks2 = algorithm(graph, personalization2)  # does not re-compute the normalization
```

@@ -82,6 +95,7 @@
…to speed up multiple rank calls to the same graph by
different ranking algorithms can be done as:
```python
import pygrank as pg

graph, personalization1, personalization2 = ...
pre = pg.preprocessor(normalization="col", assume_immutability=True)
algorithm1 = pg.PageRank(alpha=0.85, preprocessor=pre)
algorithm2 = pg.HeatKernel(preprocessor=pre)  # hypothetical stand-in for lines elided by the diff view
ranks1 = algorithm1(graph, personalization1)
ranks2 = algorithm2(graph, personalization2) # does not re-compute the normalization
```

!!! info
    When benchmarking the above code, you can call `pre(graph)`
    before the first `rank(...)` call to make sure that this first call
    does not also perform the initial normalization, whose outcome will
    be hashed and immediately retrieved by subsequent calls.
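
A minimal sketch of this warm-up, reusing the placeholder variables of the
snippets above:

```python
import pygrank as pg

graph, personalization1, personalization2 = ...
pre = pg.preprocessor(normalization="col", assume_immutability=True)
pre(graph)  # performs and hashes the normalization ahead of time
algorithm = pg.PageRank(alpha=0.85, preprocessor=pre)
ranks1 = algorithm(graph, personalization1)  # retrieves the hashed normalization
ranks2 = algorithm(graph, personalization2)  # same
```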