diff --git a/README.md b/README.md
index 8680825..da782db 100644
--- a/README.md
+++ b/README.md
@@ -18,80 +18,8 @@ Fast node ranking algorithms on large graphs.
# :hammer_and_wrench: Installation
`pygrank` works with Python 3.9 or later. The latest version can be installed with pip per:
```
pip install --upgrade pygrank
```
-
-To run the library on backpropagateable backends,
-either change the automatically created
-configuration file (follow the instructions in the stderr console)
-or run parts of your code within a
-[context manager](https://book.pythontips.com/en/latest/context_managers.html)
-to override other configurations like this:
-
-```python
-import pygrank as pg
-with pg.Backend("tensorflow"):
- ... # run your pygrank code here
-```
-
-Otherwise, everything runs on top of `numpy`, which
-is faster for forward passes. Node ranking algorithms
-can be defined outside contexts and run inside.
-
-# :zap: Quickstart
-Before looking at details, here is fully functional
-pipeline that scores the importance of a node in relation to
-a list of "seed" nodes within a graph's structure:
-
-```python
-import pygrank as pg
-graph, seeds, node = ...
-
-pre = pg.preprocessor(assume_immutability=True, normalization="symmetric")
-algorithm = pg.PageRank(alpha=0.85)+pre >> pg.Sweep() >> pg.Ordinals()
-ranks = algorithm(graph, seeds)
-print(ranks[node])
-print(algorithm.cite())
-```
-
-The graph can be created with `networkx` or, for faster computations,
-with the `pygrank.fastgraph` module. Nodes can hold any
-kind of object or data type (you don't need to convert them to integers).
-
-The above snippet starts by defining a `preprocessor`,
-which controls how graph adjacency matrices are normalized.
-In this case, a symmetric normalization
-is applied (which is ideal for undirected graphs) and we also
-assume graph immutability, i.e., that it will not change in the future.
-When this assumption is declared, the preprocessor hashes a lot of
-computations to considerably speed up experiments or autotuning.
-
-The snippet uses the [chain operator](docs/basics/functional.md)
-to wrap node ranking algorithms by various kinds of postprocessors.
-You can also put algorithms into each other's constructors
-if you are not a fan of functional programming.
-The chain starts from a pagerank graph filter with diffusion parameter
-0.85. Other filters can be declared, including automatically tuned ones.
-
-The produced algorithm is run as a callable,
-yielding a map between nodes and values
-(in graph signal processing, such maps are called graph signals)
-and the value of a node is printed. Graph signals can
-also be created and directly parsed by algorithms, for example as:
-```
-signal = pg.to_signal(graph, {v: 1. for v in seeds})
-ranks = algorithm(signal)
-```
-
-Finally, the snippet prints a recommended citation for the algorithm.
-
-### More examples
-
-[Showcase](docs/advanced/quickstart.md)
-[Big data FAQ](docs/tips/big.md)
-[Downstream tasks](https://github.com/maniospas/pygrank-downstream)
-
+# :link: Documentation
+**https://pygrank.readthedocs.io**
# :brain: Overview
Analyzing graph edges (links) between graph nodes can help
@@ -123,9 +51,6 @@ Some of the library's advantages are:
5. **Modular** components to be combined and a functional chain interface for complex combinations.
6. **Fast** running time with highly optimized operations
-# :link: Material
-[Tutorials & Documentation](documentation/documentation.md)
-[Functional Interface](docs/basics/functional.md)
# :fire: Features
* Graph filters
diff --git a/docs/advanced/autotuning.md b/docs/advanced/autotuning.md
index e202b32..4109113 100644
--- a/docs/advanced/autotuning.md
+++ b/docs/advanced/autotuning.md
@@ -8,9 +8,13 @@ through a `pygrank.Tuner` base class, which wraps
any kind of node ranking algorithm. Ideally, this would wrap end-product
algorithms.
-:bulb: Tuners differ from benchmarks in that best node ranking algorithms
-can be selected on-the-fly.
+!!! warning
+ Tuners differ from benchmarks in that they select node ranking algorithms
+ on-the-fly based on input data. They may overfit even with train-validation-test splits.
+An exhaustive list of ready-to-use tuners can be found [here](../generated/tuners.md).
+After initialization with the appropriate
+parameters, they can be run with the same pattern as other node ranking algorithms.
Tuner instances with default arguments use commonly seen base settings.
For example, the following code separates training and evaluation
data of a provided personalization signal and then uses a tuner that
@@ -45,11 +49,6 @@ scores_tuned = pg.ParameterTuner(algorithm_from_params,
measure=pg.NDCG).tune(personalization)
```
-An exhaustive list of ready-to-use tuners can be found [here](../generated/tuners.md).
-After initialization with the appropriate
-parameters, these can be used interchangeably in the above example.
-
-## Tuning speedup
Graph convolutions are the most computationally-intensive operations
node ranking algorithms employ, as their running time scales linearly with the
@@ -58,17 +57,14 @@ aim to optimize algorithms involving graph filters extending the
`ClosedFormGraphFilter` class, graph filtering is decomposed into
weighted sums of naturally occurring
Krylov space base elements {*Mnp*, *n=0,1,...*}.
-
To speed up computation time (by many times in some settings) `pygrank`
provides the ability to save the generation of this Krylov space base
so that future runs do *not* recompute it, effectively removing the need
to perform graph convolutions all but once for each personalization.
-:warning: When applying this speedup outside of tuners, it requires
-explicitly passing a graph signal object to graph filters
-(e.g. it does not work with dictionary inputs) since this is the only
-way to hash both the personalization and the graph
-on one persistent object.
+!!! info
+ This speedup can also be applied outside of tuners;
+ explicitly pass a graph signal object (rather than a plain dictionary) to node ranking algorithms,
+ since signals are the only persistent objects that can hash both the personalization and the graph.
To enable this behavior, a dictionary needs to be passed to closed form
graph filter constructors through an `optimization_dict` argument.
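+
+For filters used directly (outside tuners), a minimal sketch could look as follows; it assumes that
+`pg.HeatKernel` is a closed form filter accepting the `optimization_dict` argument and that a graph
+and a personalization are already available:
+
+```python
+import pygrank as pg
+
+graph, personalization = ...
+optimization_dict = dict()
+alg = pg.HeatKernel(t=5, optimization_dict=optimization_dict)
+
+signal = pg.to_signal(graph, personalization)  # a signal object lets both the graph and personalization be hashed
+scores_first = alg(signal)   # computes and stores the Krylov space base
+scores_again = alg(signal)   # reuses the stored base, skipping repeated graph convolutions
+
+optimization_dict.clear()    # optionally free the cached memory afterwards
+```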
@@ -85,18 +81,16 @@ tuner = pg.ParameterTuner(error_type="iters",
scores = tuner(graph, personalization)
```
-:warning: Similarly to the `assume_immutability=True` option
-for preprocessors, this requires that graphs signals are not altered in
-the interim, although it is possible to clear signal values.
-In particular, to remove
-allocated memory, you can keep a reference to the dictionary and clear
-it afterwards with `optimization_dict.clear()`.
-
-:warning: Using optimization dictionaries multiplies (e.g. at least doubles)
-the amount of used memory, which the system may run out of for large graphs.
-
-:bulb: The default algorithms provided by tuners make use of the class
-*pygrank.SelfClearDict* instead of a normal dictionary. This keeps track only
-of the last personalization and only optimizes runs for the last personalization.
-This way optimization becomes fast while allocating the minimum memory required
-for tuning.
+!!! warning
+ Similarly to the `assume_immutability=True` option
+ for preprocessors, the optimization dictionary requires that graph signals are not altered in
+ the interim, although it is possible to clear signal values.
+ Furthermore, using optimization dictionaries multiplies (e.g. at least doubles)
+ the amount of used memory, which the system may run out of for large graphs.
+ To remove allocated memory, keep a reference to the dictionary and clear
+ it afterwards with `optimization_dict.clear()`.
+
+!!! info
+ The default algorithms constructed by tuners (if none are provided) use
+ *pygrank.SelfClearDict* instead of a normal dictionary. This clears other entries when
+ a new personalization is inserted, therefore avoiding memory bloat.
diff --git a/docs/advanced/convergence.md b/docs/advanced/convergence.md
index 16b04f8..b8886a8 100644
--- a/docs/advanced/convergence.md
+++ b/docs/advanced/convergence.md
@@ -6,9 +6,10 @@ error and tolerance for numerical convergence. If no such argument is passed
to the constructor, a `pygrank.ConvergenceManager` object
is automatically instantiated by borrowing whichever extra arguments it can
from those passed to algorithm constructors. These arguments can be:
-- `tol` to indicate the numerical tolerance level required for convergence (default is 1.E-6).
-- `error_type` to indicate how differences between two graph signals are computed. The default value is `pygrank.Mabs` but any other supervised [measure](#evaluation) that computes the differences between consecutive iterations can be used. The string "iters" can also be used to make the algorithm stop only when max_iters are reached (see below).
-- `max_iters` to indicate the maximum number of iterations the algorithm can run for (default is 100). This quantity works as a safety net to guarantee algorithm termination.
+
+- *tol:* Indicates the numerical tolerance level required for convergence (default is 1.E-6).
+- *error_type:* Indicates how differences between two graph signals are computed. The default value is `pygrank.Mabs` but any other supervised [measure](../basics/evaluation.md) that computes the differences between consecutive iterations can be used. The string "iters" can also be used to make the algorithm stop only when max_iters are reached (see below).
+- *max_iters:* Indicates the maximum number of iterations the algorithm can run for (default is 100). This quantity works as a safety net to guarantee algorithm termination.
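+
+For instance, a sketch of passing such arguments directly to a filter constructor
+(the specific values below are illustrative):
+
+```python
+import pygrank as pg
+
+# stop once consecutive iterations differ by less than 1.E-9, but never exceed 1000 iterations
+alg = pg.PageRank(alpha=0.9, tol=1.E-9, max_iters=1000)
+
+# run for exactly 10 iterations by ignoring the numerical tolerance
+fixed = pg.PageRank(alpha=0.9, error_type="iters", max_iters=10)
+```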
Sometimes, it suffices to reach a robust node rank order instead of precise
values. To cover such cases we have implemented a different convergence criterion
@@ -22,11 +23,80 @@ import pygrank as pg
G, personalization = ...
alpha = 0.85
-ordered_ranker = pg.PageRank(alpha=alpha, convergence=pg.RankOrderConvergenceManager(alpha))
-ordered_ranker = pg.Ordinals(ordered_ranker)
+ranker = pg.PageRank(alpha=alpha, convergence=pg.RankOrderConvergenceManager(alpha))
+ordered_ranker = ranker >> pg.Ordinals()
ordered_ranks = ordered_ranker(G, personalization)
```
-:bulb: Since the node order is more important than the specific rank values,
-a post-processing step has been added throught the wrapping expression
-``ordered_ranker = pg.Ordinals(ordered_ranker)`` to output rank order.
+!!! info
+ Since the node order was deemed more important than the specific rank values,
+ a postprocessing step was added to convert scores to ordinal ranks.
+
+
+
+# Demo
+
+As a quick start, let us construct a graph
+and a set of nodes. The graph's class can be
+imported either from the `networkx` library or from
+`pygrank` itself. The two are in large part interoperable
+and both can be parsed by our algorithms.
+But our implementation is tailored to graph signal
+processing needs and thus tends to be faster and consume
+only a fraction of the memory.
+
+```python
+from pygrank import Graph
+
+graph = Graph()
+graph.add_edge("A", "B")
+graph.add_edge("B", "C")
+graph.add_edge("C", "D")
+graph.add_edge("D", "E")
+graph.add_edge("A", "C")
+graph.add_edge("C", "E")
+graph.add_edge("B", "E")
+seeds = {"A", "B"}
+```
+
+We now run a personalized PageRank
+to score the structural relatedness of graph nodes to those in the seed set.
+First, let us import the library:
+
+```python
+import pygrank as pg
+```
+
+For instructional purposes,
+we experiment with (personalized) *PageRank*
+and make it output the node order of ranks.
+
+```python
+ranker = pg.PageRank(alpha=0.85, tol=1.E-6, normalization="auto") >> pg.Ordinals()
+ranks = ranker(graph, {v: 1 for v in seeds})
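+# for reference, print a few ordinal positions (values reproduced from the library's previous quickstart)
+print(ranks["B"], ranks["D"], ranks["E"])
+# 1.0 5.0 4.0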
+```
+
+How much time did it take for the base ranker to converge?
+(Depends on backend and device characteristics.)
+
+```python
+print(ranker.convergence)
+# 19 iterations (0.0021852000063518062 sec)
+```
+
+Since for this example only the node order is important,
+we can use a different way to specify convergence:
+
+```python
+convergence = pg.RankOrderConvergenceManager(pagerank_alpha=0.85, confidence=0.98)
+early_stop_ranker = pg.PageRank(alpha=0.85, convergence=convergence) >> pg.Ordinals()
+ordinals = early_stop_ranker(graph, {v: 1 for v in seeds})
+print(early_stop_ranker.convergence)
+# 2 iterations (0.0005241000035312027 sec)
+print(ordinals["B"], ordinals["D"], ordinals["E"])
+# 3.0 5.0 4.0
+```
+
+Close to the previous results at a fraction of the time! For large graphs,
+most ordinals would be near the ideal ones. Note that convergence time
+does not take into account the time needed to preprocess graphs.
diff --git a/docs/advanced/quickstart.md b/docs/advanced/quickstart.md
deleted file mode 100644
index 4d440c7..0000000
--- a/docs/advanced/quickstart.md
+++ /dev/null
@@ -1,126 +0,0 @@
-# Demo
-
-As a quick start, let us construct a graph
-and a set of nodes. The graph's class can be
-imported either from the `networkx` library or from
-`pygrank` itself. The two are in large part interoperable
-and both can be parsed by our algorithms.
-But our implementation is tailored to graph signal
-processing needs and thus tends to be faster and consume
-only a fraction of the memory.
-
-```python
-from pygrank import Graph
-
-graph = Graph()
-graph.add_edge("A", "B")
-graph.add_edge("B", "C")
-graph.add_edge("C", "D")
-graph.add_edge("D", "E")
-graph.add_edge("A", "C")
-graph.add_edge("C", "E")
-graph.add_edge("B", "E")
-seeds = {"A", "B"}
-```
-
-We now run a personalized PageRank [graph filter](documentation/documentation.md#graph-filters)
-to score the structural relatedness of graph nodes to the ones of the given set.
-First, let us import the library:
-
-```python
-import pygrank as pg
-```
-
-For instructional purposes,
-we experiment with (personalized) *PageRank*.
-Instantiation of this and more filters is described [here](../generated/graph_filters.md),
-and can be accessed from the top-level import.
-We also set the default values of some parameters: the graph diffusion
-rate *alpha* required by this particular filter, a numerical tolerance *tol* at the
-convergence point and a graph preprocessing strategy *"auto"* that normalizes
-the graph adjacency matrix in either a column-based or symmetric
-way, depending on whether the graph is undirected (as in this example)
-or not respectively.
-
-```python
-ranker = pg.PageRank(alpha=0.85, tol=1.E-6, normalization="auto")
-ranks = ranker(graph, {v: 1 for v in seeds})
-```
-
-Node ranking outputs are always organized into
-[graph signals](documentation/documentation.md#graph-signals).
-These can be used like dictionaries for easy access.
-For example, printing the scores of some nodes can be done per:
-
-```python
-print(ranks["B"], ranks["D"], ranks["E"])
-# 0.5173091321819129 0.24969444089457765 0.3415804634807899
-```
-
-We alter this outcome so that it outputs node order,
-where higher node scores are assigned lower order,
-by wrapping a postprocessor around the base algorithm.
-You can find more postprocessors [here](../generated/postprocessors.md),
-including ones to make scores fairness-aware.
-
-```python
-ordinals = pg.Ordinals(ranker).rank(graph, {v: 1 for v in seeds})
-print(ordinals["B"], ordinals["D"], ordinals["E"])
-# 1.0 5.0 4.0
-```
-
-How much time did it take for the base ranker to converge?
-(Depends on backend and device characteristics.)
-
-```python
-print(ranker.convergence)
-# 19 iterations (0.0021852000063518062 sec)
-```
-
-Since for this example only the node order is important,
-we can use a different way to specify convergence:
-
-```python
-convergence = pg.RankOrderConvergenceManager(pagerank_alpha=0.85, confidence=0.98)
-early_stop_ranker = pg.PageRank(alpha=0.85, convergence=convergence)
-ordinals = pg.Ordinals(early_stop_ranker).rank(graph, {v: 1 for v in seeds})
-print(early_stop_ranker.convergence)
-# 2 iterations (0.0005241000035312027 sec)
-print(ordinals["B"], ordinals["D"], ordinals["E"])
-# 3.0 5.0 4.0
-```
-
-Close to the previous results at a fraction of the time!! For large graphs,
-most ordinals would be near the ideal ones. Note that convergence time
-does not take into account the time needed to preprocess graphs.
-
-Till now, we used `PageRank`, but what would happen if we do not know which base
-algorithm to use? In these cases `pygrank` provides online tuning of generalized
-graph signal processing filters on the personalization. The ranker
-in the ranking algorithm construction code can be replaced with an automatically tuned
-equivalent per:
-
-```python
-tuned_ranker = pg.ParameterTuner()
-ordinals = pg.Ordinals(tuned_ranker).rank(graph, {v: 1 for v in seeds})
-print(ordinals["B"], ordinals["D"], ordinals["E"])
-# 2.0 5.0 4.0
-```
-
-This yields similar node ordinals, which means that tuning constructed
-a graph filter similar to `PageRank`.
-Tuning may be worse than highly specialized algorithms in some settings,
-but usually finds near-best base algorithms.
-
-To obtain a recommendation about how to cite complex
-algorithms, an automated description can be extracted
-by the source code per the following
-command:
-
-```python
-print(tuned_ranker.cite())
-# graph filter \cite{ortega2018graph} with dictionary-based hashing \cite{krasanakis2022pygrank}, max normalization and parameters tuned \cite{krasanakis2022autogf} to optimize AUC while withholding 0.100 of nodes for validation
-```
-
-Bibtex entries corresponding to the citations can be found
-[here](../tips/citations.md).
\ No newline at end of file
diff --git a/docs/basics/about.md b/docs/basics/about.md
index 2730490..4172dc3 100644
--- a/docs/basics/about.md
+++ b/docs/basics/about.md
@@ -2,7 +2,7 @@
-At the core of `pygrank` lies the concept of *graph signals*, which map graph nodes to scores.
+At the core of `pygrank` lies the concept of *graph signals*, which map graph nodes to numerical scores.
Supervised and unsupervised measures evaluate the predictive/ranking quality
of graph signals.
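+
+As a small sketch of the concept (assuming a `networkx` or `pygrank` graph and a set of seed nodes
+are already available), a graph signal can be created from a dictionary of known scores and then
+read back like a dictionary:
+
+```python
+import pygrank as pg
+
+graph, seeds = ...
+signal = pg.to_signal(graph, {v: 1. for v in seeds})  # nodes missing from the dictionary default to zero
+print({v: signal[v] for v in seeds})  # graph signals can be read like dictionaries
+```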
@@ -23,3 +23,4 @@ Here is a glossary of common concepts sorted alphabetically:
-| Personalization | The graph signal inputted in grap filters. This is also known as graph signal priors or the personalization vector. |
+| Personalization | The graph signal inputted into graph filters. This is also known as graph signal priors or the personalization vector. |
| Seeds | Example nodes that are known to belong to a community. |
| Tuning | A process of determining algorithm hyper-parameters that optimize some evaluation measure. |
+
diff --git a/docs/basics/filters.md b/docs/basics/filters.md
index 5fcd351..dc5e28e 100644
--- a/docs/basics/filters.md
+++ b/docs/basics/filters.md
@@ -12,25 +12,20 @@ applies a graph filter, potentially postprocesses its outcome, and eventually ar
-
-## Calling filters
-
Filters are created based on a constructor that takes as input several keyword
arguments affecting how they work. An exhaustive list of ready-to-use graph filters
and their constructors
-found [here](../generated/filters.md). After its initialization, a filter `alg` can run
+can be found [here](../generated/filters.md).
+More complicated node ranking algorithms can be obtained by applying postprocessors on
+filters. This is covered in the [next section](postprocessors.md).
+After its initialization, a filter `alg` can be run
with one of the following two patterns (these are interchangeable):
* `ranks = alg(graph, personalization)`
* `alg(pg.to_signal(graph, personalization))`
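+
+For example, a minimal sketch with a heat kernel filter (any filter from the list linked above can be
+substituted):
+
+```python
+import pygrank as pg
+
+graph, personalization = ...
+alg = pg.HeatKernel(t=3)
+ranks = alg(graph, personalization)                 # first pattern
+ranks = alg(pg.to_signal(graph, personalization))   # second, equivalent pattern
+```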
-More complicated node ranking algorithms can be obtained by applying postprocessors on
-filters. This is covered in the [next section](postprocessors.md).
-
-
-## Example
-Let us define an personalized PageRank filter. If the personalization is
+As an example, let us define a personalized PageRank filter. If the personalization is
binary (i.e. all nodes have initial scores either 0 or 1) this algorithm
is equivalent to a stochastic Markov process where it starts from the nodes
with initial scores 1, iteratively jumps to neighbors randomly, and has
diff --git a/docs/basics/postprocessors.md b/docs/basics/postprocessors.md
index 2baab02..d364759 100644
--- a/docs/basics/postprocessors.md
+++ b/docs/basics/postprocessors.md
@@ -2,22 +2,15 @@
-Filter outcomes of graph often require additional processing steps, for example to perform
+The outcomes of graph filters often require additional processing steps, for example to perform
normalization, improve their quality, or apply fairness constraints.
-We refer to the improvement of graph filter outcomes as postprocessing,
-and let node ranking algorithms perform any number of postprocessing steps
-on top of base graph filters.
-
-
-## Wrapping filters
-
-Postprocessors wrap base graph filters to affect their outcome. The resulting
+Postprocessors wrap base filters to improve their outcomes, and the resulting
node ranking algorithms are called as if they were still filters. The filters
-can either be supplied to construcors, or the postprocessors may be initialized
+can either be supplied to constructors, or the postprocessors may be initialized
from the rest of their arguments and applied onto filters afterwards with the
-pattern `algorithm = filter >> postprocessor`.
+functional chain pattern `algorithm = filter >> postprocessor`.
An list of ready-to-use postprocessors can be
-found [here](../generated/postprocessors.md). Simplar ones perform
+found [here](../generated/postprocessors.md). Simpler ones perform
normalization, for example to enforce the maximal or the sum
of node scores to be 1. There also exist thresholding schemes, which can be used
for binary community detection, as well as methods to make node
@@ -30,14 +23,11 @@ by providing more example nodes, and for fairness-aware posteriors,
which aim to make node scores adhere to some fairness constraint,
such as disparate impact.
-
-## Example
-
-Let us consider a simple scenario where we want the graph signal outputted
+Let us consider a simple toy scenario where we want the graph signal outputted
by a filter to always be normalized so that its largest node score is one. For
-this, we consider a graph `G`, signal `signal` and filter `alg`,
-and will use the postprocessor `Normalize`. For convenience, some postprocessors supply a
-`transform` method that can be applied on graph signals like so:
+a graph `G`, a signal `signal`, and a filter `alg`,
+we can use the postprocessor `Normalize("max")`. For convenience, simpler postprocessors
+like this one supply a `transform` method that can be applied to graph signals like so:
```python
scores = alg(graph, signal)
@@ -46,8 +36,8 @@ print(list(normalized_scores.items()))
# [('A', 1.0), ('B', 0.4950000024069947), ('C', 0.9828783455187619), ('D', 0.9540636897749238), ('E', 0.472261528845582)]
```
-However, the pattern that works for **all** postprocessors
-is to wrap base algorithms, like in the following example:
+The pattern that works for **all** postprocessors
+is to wrap base algorithms, like in the following equivalent example:
```python
@@ -57,13 +47,12 @@ print(nscores)
# [('A', 1.0), ('B', 0.4950000024069947), ('C', 0.9828783455187619), ('D', 0.9540636897749238), ('E', 0.472261528845582)]
```
-We now apply more steps to the algorithm by performing
-an element-wise exponential transformation of node scores
-with the postprocessor `Transformer` *before* normalization
-can be achieved as:
+We can add more steps, such as
+an element-wise exponential transformation of scores
+before normalization:
```python
-nealg = alg >> pg.Transformer(pp.exp) >> pg.Normalize("max")
+nealg = alg >> pg.Transformer(pf.exp) >> pg.Normalize("max")
nescores = nealg(graph, signal)
print(nescores)
# [('A', 1.0), ('B', 0.8786683440755908), ('C', 0.9956241609824301), ('D', 0.9883030876536782), ('E', 0.8735657648099558)]
diff --git a/docs/basics/quickstart.md b/docs/basics/quickstart.md
new file mode 100644
index 0000000..b18f4f9
--- /dev/null
+++ b/docs/basics/quickstart.md
@@ -0,0 +1,36 @@
+# Quickstart
+
+1. Install the library with `pip install pygrank`, import it,
+and construct a node ranking algorithm
+(incrementally apply postprocessors with `>>`).
+There are many components and parameters to choose from when looking for
+good configurations; [autotuning](../advanced/autotuning.md) may be helpful.
+
+```python
+import pygrank as pg
+
+hk5 = pg.HeatKernel(t=5, normalization="symmetric", renormalize=True) # a graph filter
+hk5_advanced = hk5 >> pg.SeedOversampling() >> pg.Sweep() >> pg.Normalize("max")
+```
+
+2. Automatically load a graph and a community of nodes with some shared attribute.
+You can also use a `networkx` graph.
+Then run the algorithm to get a graph signal that maps nodes to scores, where scores indicate
+structural proximity to community members.
+
+```python
+_, graph, community = next(pg.load_datasets_one_community(["EUCore"]))
+personalization = {node: 1.0 for node in community} # binary or stochastic membership, missing scores are zero
+
+scores = hk5_advanced(graph, personalization) # returns a dict-like pg.GraphSignal
+print(scores) # {'0': 0.3154503251398683, '1': 0.26661671252340463, '2': 0.03700150026429704, ... }
+```
+
+3. Evaluate the scores; here we use a stochastic generalization of the unsupervised Conductance measure
+(one that can parse continuous scores).
+
+```python
+measure = pg.Conductance() # an evaluation measure
+pg.benchmark_print_line("My conductance", measure(scores)) # pretty
+print("Cite this algorithm as:", hk5_advanced.cite())
+```
diff --git a/docs/index.md b/docs/index.md
index d5b4f1c..bfb02c6 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,58 +1,46 @@
-# pygrank
+# pygrank
+**Fast node ranking algorithms on large graphs.**
-Fast node ranking algorithms on large graphs.
+
+- Graph signals work as both dictionaries and arrays. They are easy to create and run on efficient backends.
+- Fast processing of big graphs: sparse data structures, extensive caching, and scalable algorithms.
+- Combine graph filters with multiple postprocessor components through a seamless pipeline.
+- Run benchmarks, or autotune algorithms as they run, based on supervised or unsupervised measures.
+