Commit

updated docs

maniospas committed Jun 6, 2024
1 parent 9e283aa commit dc10db2
Showing 14 changed files with 424 additions and 344 deletions.
79 changes: 2 additions & 77 deletions README.md
@@ -18,80 +18,8 @@ Fast node ranking algorithms on large graphs.
# :hammer_and_wrench: Installation
`pygrank` works with Python 3.9 or later. The latest version can be installed with pip:

```
pip install --upgrade pygrank
```

To run the library on backpropagation-capable backends,
either change the automatically created
configuration file (follow the instructions in the stderr console)
or run parts of your code within a
[context manager](https://book.pythontips.com/en/latest/context_managers.html)
that overrides the configuration, like this:

```python
import pygrank as pg
with pg.Backend("tensorflow"):
... # run your pygrank code here
```

Otherwise, everything runs on top of `numpy`, which
is faster for forward passes. Node ranking algorithms
can be defined outside such contexts and still run inside them.
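
For instance, here is a minimal sketch of that pattern; the data placeholders are illustrative, and the tensorflow backend is assumed to be installed:

```python
import pygrank as pg

graph, seeds = ...  # data placeholders

algorithm = pg.PageRank(alpha=0.85)  # defined under the default numpy backend
with pg.Backend("tensorflow"):
    ranks = algorithm(graph, seeds)  # executes with backpropagation-capable operations
```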

# :zap: Quickstart
Before looking at details, here is a fully functional
pipeline that scores the importance of a node in relation to
a list of "seed" nodes within a graph's structure:

```python
import pygrank as pg
graph, seeds, node = ...

pre = pg.preprocessor(assume_immutability=True, normalization="symmetric")
algorithm = pg.PageRank(alpha=0.85)+pre >> pg.Sweep() >> pg.Ordinals()
ranks = algorithm(graph, seeds)
print(ranks[node])
print(algorithm.cite())
```

The graph can be created with `networkx` or, for faster computations,
with the `pygrank.fastgraph` module. Nodes can hold any
kind of object or data type (you don't need to convert them to integers).
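
As a sketch of both options (the node labels here are illustrative):

```python
import networkx as nx
from pygrank import Graph  # pygrank's own lightweight graph class

nx_graph = nx.Graph()
nx_graph.add_edge("Alice", "Bob")  # nodes may be strings or any other objects

fast_graph = Graph()  # same usage pattern, tailored to graph signal processing
fast_graph.add_edge("Alice", "Bob")
```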

The above snippet starts by defining a `preprocessor`,
which controls how graph adjacency matrices are normalized.
In this case, a symmetric normalization
is applied (which is ideal for undirected graphs) and we also
assume graph immutability, i.e., that it will not change in the future.
When this assumption is declared, the preprocessor hashes the graph to cache
costly computations, which considerably speeds up experiments and autotuning.
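
For example, here is a sketch of sharing one preprocessor between two filters so that normalization is computed once per graph; `pg.HeatKernel` and its `t` parameter are assumptions for illustration:

```python
import pygrank as pg

pre = pg.preprocessor(assume_immutability=True, normalization="symmetric")
ppr = pg.PageRank(alpha=0.85) + pre
heat = pg.HeatKernel(t=5) + pre  # reuses the cached normalization of the same graph
```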

The snippet uses the [chain operator](docs/basics/functional.md)
to wrap node ranking algorithms with various kinds of postprocessors.
You can also put algorithms into each other's constructors
if you are not a fan of functional programming, as shown below.
The chain starts from a pagerank graph filter with diffusion parameter
0.85. Other filters can be declared, including automatically tuned ones.
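
For reference, a sketch of the same pipeline written with nested constructors instead of the chain operator:

```python
import pygrank as pg

pre = pg.preprocessor(assume_immutability=True, normalization="symmetric")
# postprocessors accept the wrapped algorithm as their first constructor argument
algorithm = pg.Ordinals(pg.Sweep(pg.PageRank(alpha=0.85) + pre))
```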

The produced algorithm is run as a callable,
yielding a map between nodes and values
(in graph signal processing, such maps are called graph signals)
and the value of a node is printed. Graph signals can
also be created and directly parsed by algorithms, for example as:
```python
signal = pg.to_signal(graph, {v: 1. for v in seeds})
ranks = algorithm(signal)
```

Finally, the snippet prints a recommended citation for the algorithm.

### More examples

[Showcase](docs/advanced/quickstart.md) <br>
[Big data FAQ](docs/tips/big.md) <br>
[Downstream tasks](https://github.com/maniospas/pygrank-downstream) <br>

# :link: Documentation
**https://pygrank.readthedocs.io**

# :brain: Overview
Analyzing graph edges (links) between graph nodes can help
@@ -123,9 +51,6 @@ Some of the library's advantages are:
5. **Modular** components to be combined and a functional chain interface for complex combinations.
6. **Fast** running time with highly optimized operations.

# :link: Material
[Tutorials & Documentation](documentation/documentation.md) <br>
[Functional Interface](docs/basics/functional.md)

# :fire: Features
* Graph filters
50 changes: 22 additions & 28 deletions docs/advanced/autotuning.md
@@ -8,9 +8,13 @@ through a `pygrank.Tuner` base class, which wraps
any kind of node ranking algorithm. Ideally, this would wrap end-product
algorithms.

!!! warning
    Tuners differ from benchmarks in that they select node ranking algorithms
    on-the-fly based on input data. They may overfit even with train-validation-test splits.

An exhaustive list of ready-to-use tuners can be found [here](../generated/tuners.md).
After initialization with the appropriate
parameters, these can run with the same pattern as other node ranking algorithms.
Tuner instances with default arguments use commonly seen base settings.
For example, the following code separates training and evaluation
data of a provided personalization signal and then uses a tuner that
Expand Down Expand Up @@ -45,11 +49,6 @@ scores_tuned = pg.ParameterTuner(algorithm_from_params,
measure=pg.NDCG).tune(personalization)
```
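
For a fuller picture, here is a self-contained sketch of the same pattern; the body of the `algorithm_from_params` helper and the data placeholders are illustrative assumptions:

```python
import pygrank as pg

personalization = ...  # a provided personalization graph signal

def algorithm_from_params(params):
    # hypothetical helper: maps a list of tunable values to a node ranking algorithm
    return pg.PageRank(alpha=params[0])

scores_tuned = pg.ParameterTuner(algorithm_from_params,
                                 measure=pg.NDCG).tune(personalization)
```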


## Tuning speedup

Graph convolutions are the most computationally-intensive operations
node ranking algorithms employ, as their running time scales linearly with the
@@ -58,17 +57,14 @@ aim to optimize algorithms involving graph filters extending the
`ClosedFormGraphFilter` class, graph filtering is decomposed into
weighted sums of naturally occurring
Krylov space base elements {*M<sup>n</sup>p*, *n=0,1,...*}.

To speed up computation time (by many times in some settings) `pygrank`
provides the ability to save the generation of this Krylov space base
so that future runs do *not* recompute it, effectively removing the need
to perform graph convolutions all but once for each personalization.

!!! info
    This speedup can be applied outside of tuners too;
    explicitly pass a graph signal object to node ranking algorithms.

To enable this behavior, a dictionary needs to be passed to closed form
graph filter constructors through an `optimization_dict` argument.
@@ -85,18 +81,16 @@ tuner = pg.ParameterTuner(error_type="iters",
scores = tuner(graph, personalization)
```
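
Putting the pieces together, here is a sketch of manual use outside tuners; `pg.HeatKernel` and its `t` parameter are assumptions, and any filter extending `ClosedFormGraphFilter` should follow the same pattern:

```python
import pygrank as pg

graph, personalization = ...  # data placeholders
optimization_dict = dict()
algorithm = pg.HeatKernel(t=5, optimization_dict=optimization_dict)

signal = pg.to_signal(graph, personalization)  # explicit signal, so it can be hashed
scores = algorithm(signal)
scores = algorithm(signal)  # second run reuses the stored Krylov space base
optimization_dict.clear()  # free the cached memory when done
```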

!!! warning
    Similarly to the `assume_immutability=True` option
    for preprocessors, the optimization dictionary requires that graph signals are not altered in
    the interim, although it is possible to clear signal values.
    Furthermore, using optimization dictionaries multiplies (e.g., at least doubles)
    the amount of used memory, which the system may run out of for large graphs.
    To remove allocated memory, keep a reference to the dictionary and clear
    it afterwards with `optimization_dict.clear()`.

!!! info
    The default algorithms constructed by tuners (if none are provided) use
    *pygrank.SelfClearDict* instead of a normal dictionary. This clears other entries when
    a new personalization is inserted, therefore avoiding memory bloat.
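
For completeness, a sketch of opting into the same self-clearing behavior manually; the filter choice and its `t` parameter are again assumptions:

```python
import pygrank as pg

algorithm = pg.HeatKernel(t=5, optimization_dict=pg.SelfClearDict())
```
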
86 changes: 78 additions & 8 deletions docs/advanced/convergence.md
@@ -6,9 +6,10 @@ error and tolerance for numerical convergence. If no such argument is passed
to the constructor, a `pygrank.ConvergenceManager` object
is automatically instantiated by borrowing whichever extra arguments it can
from those passed to algorithm constructors. These arguments can be:

- *tol:* Indicates the numerical tolerance level required for convergence (default is 1.E-6).
- *error_type:* Indicates how differences between two graph signals are computed. The default value is `pygrank.Mabs` but any other supervised [measure](../basics/evaluation.md) that computes the differences between consecutive iterations can be used. The string "iters" can also be used to make the algorithm stop only when max_iters are reached (see below).
- *max_iters:* Indicates the maximum number of iterations the algorithm can run for (default is 100). This quantity works as a safety net to guarantee algorithm termination.
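
For instance, a sketch of setting these directly through a filter's constructor:

```python
import pygrank as pg

# tighter tolerance than the default, with a larger safety net of iterations
precise_ranker = pg.PageRank(alpha=0.85, tol=1.E-9, max_iters=1000)

# run for exactly max_iters iterations instead of tracking numerical tolerance
fixed_ranker = pg.PageRank(alpha=0.85, error_type="iters", max_iters=50)
```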

Sometimes, it suffices to reach a robust node rank order instead of precise
values. To cover such cases we have implemented a different convergence criterion
@@ -22,11 +23,80 @@ import pygrank as pg

G, personalization = ...
alpha = 0.85
ranker = pg.PageRank(alpha=alpha, convergence=pg.RankOrderConvergenceManager(alpha))
ordered_ranker = ranker >> pg.Ordinals()
ordered_ranks = ordered_ranker(G, personalization)
```

!!! info
    Since the node order was deemed more important than the specific rank values,
    a postprocessing step was added.



# Demo

As a quick start, let us construct a graph
and a set of nodes. The graph's class can be
imported either from the `networkx` library or from
`pygrank` itself. The two are in large part interoperable
and both can be parsed by our algorithms.
However, our implementation is tailored to graph signal
processing needs, and thus tends to be faster and to consume
only a fraction of the memory.

```python
from pygrank import Graph

graph = Graph()
graph.add_edge("A", "B")
graph.add_edge("B", "C")
graph.add_edge("C", "D")
graph.add_edge("D", "E")
graph.add_edge("A", "C")
graph.add_edge("C", "E")
graph.add_edge("B", "E")
seeds = {"A", "B"}
```

We now run a personalized PageRank
to score the structural relatedness of graph nodes to those in the seed set.
First, let us import the library:

```python
import pygrank as pg
```

For instructional purposes,
we experiment with (personalized) *PageRank*
and make it output the node order of ranks.

```python
ranker = pg.PageRank(alpha=0.85, tol=1.E-6, normalization="auto") >> pg.Ordinals()
ranks = ranker(graph, {v: 1 for v in seeds})
```

How much time did it take for the base ranker to converge?
(Depends on backend and device characteristics.)

```python
print(ranker.convergence)
# 19 iterations (0.0021852000063518062 sec)
```

Since for this example only the node order is important,
we can use a different way to specify convergence:

```python
convergence = pg.RankOrderConvergenceManager(pagerank_alpha=0.85, confidence=0.98)
early_stop_ranker = pg.PageRank(alpha=0.85, convergence=convergence) >> pg.Ordinals()
ordinals = early_stop_ranker(graph, {v: 1 for v in seeds})
print(early_stop_ranker.convergence)
# 2 iterations (0.0005241000035312027 sec)
print(ordinals["B"], ordinals["D"], ordinals["E"])
# 3.0 5.0 4.0
```

Close to the previous results at a fraction of the time! For large graphs,
most ordinals would be near the ideal ones. Note that convergence time
does not take into account the time needed to preprocess graphs.
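
That preprocessing cost can be amortized across runs with the caching preprocessor described in the quickstart; here is a sketch that combines it with the rank order convergence manager from above:

```python
import pygrank as pg

pre = pg.preprocessor(assume_immutability=True)
convergence = pg.RankOrderConvergenceManager(pagerank_alpha=0.85, confidence=0.98)
ranker = pg.PageRank(alpha=0.85, convergence=convergence) + pre
# the first run pays the normalization cost; later runs on the same graph reuse it
```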