diff --git a/README.md b/README.md index 8680825..da782db 100644 --- a/README.md +++ b/README.md @@ -18,80 +18,8 @@ Fast node ranking algorithms on large graphs. # :hammer_and_wrench: Installation `pygrank` works with Python 3.9 or later. The latest version can be installed with pip per: -``` -pip install --upgrade pygrank -``` - -To run the library on backpropagateable backends, -either change the automatically created -configuration file (follow the instructions in the stderr console) -or run parts of your code within a -[context manager](https://book.pythontips.com/en/latest/context_managers.html) -to override other configurations like this: - -```python -import pygrank as pg -with pg.Backend("tensorflow"): - ... # run your pygrank code here -``` - -Otherwise, everything runs on top of `numpy`, which -is faster for forward passes. Node ranking algorithms -can be defined outside contexts and run inside. - -# :zap: Quickstart -Before looking at details, here is fully functional -pipeline that scores the importance of a node in relation to -a list of "seed" nodes within a graph's structure: - -```python -import pygrank as pg -graph, seeds, node = ... - -pre = pg.preprocessor(assume_immutability=True, normalization="symmetric") -algorithm = pg.PageRank(alpha=0.85)+pre >> pg.Sweep() >> pg.Ordinals() -ranks = algorithm(graph, seeds) -print(ranks[node]) -print(algorithm.cite()) -``` - -The graph can be created with `networkx` or, for faster computations, -with the `pygrank.fastgraph` module. Nodes can hold any -kind of object or data type (you don't need to convert them to integers). - -The above snippet starts by defining a `preprocessor`, -which controls how graph adjacency matrices are normalized. -In this case, a symmetric normalization -is applied (which is ideal for undirected graphs) and we also -assume graph immutability, i.e., that it will not change in the future. 
-When this assumption is declared, the preprocessor hashes a lot of -computations to considerably speed up experiments or autotuning. - -The snippet uses the [chain operator](docs/basics/functional.md) -to wrap node ranking algorithms by various kinds of postprocessors. -You can also put algorithms into each other's constructors -if you are not a fan of functional programming. -The chain starts from a pagerank graph filter with diffusion parameter -0.85. Other filters can be declared, including automatically tuned ones. - -The produced algorithm is run as a callable, -yielding a map between nodes and values -(in graph signal processing, such maps are called graph signals) -and the value of a node is printed. Graph signals can -also be created and directly parsed by algorithms, for example as: -``` -signal = pg.to_signal(graph, {v: 1. for v in seeds}) -ranks = algorithm(signal) -``` - -Finally, the snippet prints a recommended citation for the algorithm. - -### More examples - -[Showcase](docs/advanced/quickstart.md)
-[Big data FAQ](docs/tips/big.md)
-[Downstream tasks](https://github.com/maniospas/pygrank-downstream)
- +# :link: Documentation +**https://pygrank.readthedocs.io** # :brain: Overview Analyzing graph edges (links) between graph nodes can help @@ -123,9 +51,6 @@ Some of the library's advantages are: 5. **Modular** components to be combined and a functional chain interface for complex combinations. 6. **Fast** running time with highly optimized operations -# :link: Material -[Tutorials & Documentation](documentation/documentation.md)
-[Functional Interface](docs/basics/functional.md) # :fire: Features * Graph filters diff --git a/docs/advanced/autotuning.md b/docs/advanced/autotuning.md index e202b32..4109113 100644 --- a/docs/advanced/autotuning.md +++ b/docs/advanced/autotuning.md @@ -8,9 +8,13 @@ through a `pygrank.Tuner` base class, which wraps any kind of node ranking algorithm. Ideally, this would wrap end-product algorithms. -:bulb: Tuners differ from benchmarks in that best node ranking algorithms -can be selected on-the-fly. +!!! warning + Tuners differ from benchmarks in that they select node ranking algorithms + on-the-fly based on input data. They may overfit even with train-validation-test splits. +An exhaustive list of ready-to-use tuners can be found [here](../generated/tuners.md). +After initialization with the appropriate +parameters, these can run with the same pattern as other node ranking algorithms. Tuner instances with default arguments use commonly seen base settings. For example, the following code separates training and evaluation data of a provided personalization signal and then uses a tuner that @@ -45,11 +49,6 @@ scores_tuned = pg.ParameterTuner(algorithm_from_params, measure=pg.NDCG).tune(personalization) ``` -An exhaustive list of ready-to-use tuners can be found [here](../generated/tuners.md). -After initialization with the appropriate -parameters, these can be used interchangeably in the above example. - -## Tuning speedup Graph convolutions are the most computationally-intensive operations node ranking algorithms employ, as their running time scales linearly with the @@ -58,17 +57,14 @@ aim to optimize algorithms involving graph filters extending the `ClosedFormGraphFilter` class, graph filtering is decomposed into weighted sums of naturally occurring Krylov space base elements {*Mnp*, *n=0,1,...*}. 
- To speed up computation time (by many times in some settings) `pygrank` provides the ability to save the generation of this Krylov space base so that future runs do *not* recompute it, effectively removing the need to perform graph convolutions all but once for each personalization. -:warning: When applying this speedup outside of tuners, it requires -explicitly passing a graph signal object to graph filters -(e.g. it does not work with dictionary inputs) since this is the only -way to hash both the personalization and the graph -on one persistent object. +!!! info + This speedup can be applied outside of tuners too; + explicitly pass a graph signal object to node ranking algorithms. To enable this behavior, a dictionary needs to be passed to closed form graph filter constructors through an `optimization_dict` argument. @@ -85,18 +81,16 @@ tuner = pg.ParameterTuner(error_type="iters", scores = tuner(graph, personalization) ``` -:warning: Similarly to the `assume_immutability=True` option -for preprocessors, this requires that graphs signals are not altered in -the interim, although it is possible to clear signal values. -In particular, to remove -allocated memory, you can keep a reference to the dictionary and clear -it afterwards with `optimization_dict.clear()`. - -:warning: Using optimization dictionaries multiplies (e.g. at least doubles) -the amount of used memory, which the system may run out of for large graphs. - -:bulb: The default algorithms provided by tuners make use of the class -*pygrank.SelfClearDict* instead of a normal dictionary. This keeps track only -of the last personalization and only optimizes runs for the last personalization. -This way optimization becomes fast while allocating the minimum memory required -for tuning. +!!! 
warning + Similarly to the `assume_immutability=True` option + for preprocessors, the optimization dictionary requires that graph signals are not altered in + the interim, although it is possible to clear signal values. + Furthermore, using optimization dictionaries multiplies (e.g. at least doubles) + the amount of used memory, which the system may run out of for large graphs. + To remove allocated memory, keep a reference to the dictionary and clear + it afterwards with `optimization_dict.clear()`. + +!!! info + The default algorithms constructed by tuners (if none are provided) use + *pygrank.SelfClearDict* instead of a normal dictionary. This clears other entries when + a new personalization is inserted, therefore avoiding memory bloat. diff --git a/docs/advanced/convergence.md b/docs/advanced/convergence.md index 16b04f8..b8886a8 100644 --- a/docs/advanced/convergence.md +++ b/docs/advanced/convergence.md @@ -6,9 +6,10 @@ error and tolerance for numerical convergence. If no such argument is passed to the constructor, a `pygrank.ConvergenceManager` object is automatically instantiated by borrowing whichever extra arguments it can from those passed to algorithm constructors. These arguments can be: -- `tol` to indicate the numerical tolerance level required for convergence (default is 1.E-6). -- `error_type` to indicate how differences between two graph signals are computed. The default value is `pygrank.Mabs` but any other supervised [measure](#evaluation) that computes the differences between consecutive iterations can be used. The string "iters" can also be used to make the algorithm stop only when max_iters are reached (see below). -- `max_iters` to indicate the maximum number of iterations the algorithm can run for (default is 100). This quantity works as a safety net to guarantee algorithm termination. + +- *tol:* Indicates the numerical tolerance level required for convergence (default is 1.E-6). 
+- *error_type:* Indicates how differences between two graph signals are computed. The default value is `pygrank.Mabs` but any other supervised [measure](../basics/evaluation.md) that computes the differences between consecutive iterations can be used. The string "iters" can also be used to make the algorithm stop only when max_iters are reached (see below). +- *max_iters:* Indicates the maximum number of iterations the algorithm can run for (default is 100). This quantity works as a safety net to guarantee algorithm termination. Sometimes, it suffices to reach a robust node rank order instead of precise values. To cover such cases we have implemented a different convergence criterion @@ -22,11 +23,80 @@ import pygrank as pg G, personalization = ... alpha = 0.85 -ordered_ranker = pg.PageRank(alpha=alpha, convergence=pg.RankOrderConvergenceManager(alpha)) -ordered_ranker = pg.Ordinals(ordered_ranker) +ranker = pg.PageRank(alpha=alpha, convergence=pg.RankOrderConvergenceManager(alpha)) +ordered_ranker = ranker >> pg.Ordinals() ordered_ranks = ordered_ranker(G, personalization) ``` -:bulb: Since the node order is more important than the specific rank values, -a post-processing step has been added throught the wrapping expression -``ordered_ranker = pg.Ordinals(ordered_ranker)`` to output rank order. +!!! info + Since the node order was deemed more important than the specific rank values, + a postprocessing step was added. + + + +# Demo + +As a quick start, let us construct a graph +and a set of nodes. The graph's class can be +imported either from the `networkx` library or from +`pygrank` itself. The two are in large part interoperable +and both can be parsed by our algorithms. +But our implementation is tailored to graph signal +processing needs and thus tends to be faster and consume +only a fraction of the memory. 
+ +```python +from pygrank import Graph + +graph = Graph() +graph.add_edge("A", "B") +graph.add_edge("B", "C") +graph.add_edge("C", "D") +graph.add_edge("D", "E") +graph.add_edge("A", "C") +graph.add_edge("C", "E") +graph.add_edge("B", "E") +seeds = {"A", "B"} +``` + +We now run a personalized PageRank +to score the structural relatedness of graph nodes to the ones of the given set. +First, let us import the library: + +```python +import pygrank as pg +``` + +For instructional purposes, +we experiment with (personalized) *PageRank* +and make it output the node order of ranks. + +```python +ranker = pg.PageRank(alpha=0.85, tol=1.E-6, normalization="auto") >> pg.Ordinals() +ranks = ranker(graph, {v: 1 for v in seeds}) +``` + +How much time did it take for the base ranker to converge? +(Depends on backend and device characteristics.) + +```python +print(ranker.convergence) +# 19 iterations (0.0021852000063518062 sec) +``` + +Since for this example only the node order is important, +we can use a different way to specify convergence: + +```python +convergence = pg.RankOrderConvergenceManager(pagerank_alpha=0.85, confidence=0.98) +early_stop_ranker = pg.PageRank(alpha=0.85, convergence=convergence) >> pg.Ordinals() +ordinals = early_stop_ranker(graph, {v: 1 for v in seeds}) +print(early_stop_ranker.convergence) +# 2 iterations (0.0005241000035312027 sec) +print(ordinals["B"], ordinals["D"], ordinals["E"]) +# 3.0 5.0 4.0 +``` + +Close to the previous results at a fraction of the time! For large graphs, +most ordinals would be near the ideal ones. Note that convergence time +does not take into account the time needed to preprocess graphs. diff --git a/docs/advanced/quickstart.md b/docs/advanced/quickstart.md deleted file mode 100644 index 4d440c7..0000000 --- a/docs/advanced/quickstart.md +++ /dev/null @@ -1,126 +0,0 @@ -# Demo - -As a quick start, let us construct a graph -and a set of nodes. 
The graph's class can be -imported either from the `networkx` library or from -`pygrank` itself. The two are in large part interoperable -and both can be parsed by our algorithms. -But our implementation is tailored to graph signal -processing needs and thus tends to be faster and consume -only a fraction of the memory. - -```python -from pygrank import Graph - -graph = Graph() -graph.add_edge("A", "B") -graph.add_edge("B", "C") -graph.add_edge("C", "D") -graph.add_edge("D", "E") -graph.add_edge("A", "C") -graph.add_edge("C", "E") -graph.add_edge("B", "E") -seeds = {"A", "B"} -``` - -We now run a personalized PageRank [graph filter](documentation/documentation.md#graph-filters) -to score the structural relatedness of graph nodes to the ones of the given set. -First, let us import the library: - -```python -import pygrank as pg -``` - -For instructional purposes, -we experiment with (personalized) *PageRank*. -Instantiation of this and more filters is described [here](../generated/graph_filters.md), -and can be accessed from the top-level import. -We also set the default values of some parameters: the graph diffusion -rate *alpha* required by this particular filter, a numerical tolerance *tol* at the -convergence point and a graph preprocessing strategy *"auto"* that normalizes -the graph adjacency matrix in either a column-based or symmetric -way, depending on whether the graph is undirected (as in this example) -or not respectively. - -```python -ranker = pg.PageRank(alpha=0.85, tol=1.E-6, normalization="auto") -ranks = ranker(graph, {v: 1 for v in seeds}) -``` - -Node ranking outputs are always organized into -[graph signals](documentation/documentation.md#graph-signals). -These can be used like dictionaries for easy access. 
-For example, printing the scores of some nodes can be done per: - -```python -print(ranks["B"], ranks["D"], ranks["E"]) -# 0.5173091321819129 0.24969444089457765 0.3415804634807899 -``` - -We alter this outcome so that it outputs node order, -where higher node scores are assigned lower order, -by wrapping a postprocessor around the base algorithm. -You can find more postprocessors [here](../generated/postprocessors.md), -including ones to make scores fairness-aware. - -```python -ordinals = pg.Ordinals(ranker).rank(graph, {v: 1 for v in seeds}) -print(ordinals["B"], ordinals["D"], ordinals["E"]) -# 1.0 5.0 4.0 -``` - -How much time did it take for the base ranker to converge? -(Depends on backend and device characteristics.) - -```python -print(ranker.convergence) -# 19 iterations (0.0021852000063518062 sec) -``` - -Since for this example only the node order is important, -we can use a different way to specify convergence: - -```python -convergence = pg.RankOrderConvergenceManager(pagerank_alpha=0.85, confidence=0.98) -early_stop_ranker = pg.PageRank(alpha=0.85, convergence=convergence) -ordinals = pg.Ordinals(early_stop_ranker).rank(graph, {v: 1 for v in seeds}) -print(early_stop_ranker.convergence) -# 2 iterations (0.0005241000035312027 sec) -print(ordinals["B"], ordinals["D"], ordinals["E"]) -# 3.0 5.0 4.0 -``` - -Close to the previous results at a fraction of the time!! For large graphs, -most ordinals would be near the ideal ones. Note that convergence time -does not take into account the time needed to preprocess graphs. - -Till now, we used `PageRank`, but what would happen if we do not know which base -algorithm to use? In these cases `pygrank` provides online tuning of generalized -graph signal processing filters on the personalization. 
The ranker -in the ranking algorithm construction code can be replaced with an automatically tuned -equivalent per: - -```python -tuned_ranker = pg.ParameterTuner() -ordinals = pg.Ordinals(tuned_ranker).rank(graph, {v: 1 for v in seeds}) -print(ordinals["B"], ordinals["D"], ordinals["E"]) -# 2.0 5.0 4.0 -``` - -This yields similar node ordinals, which means that tuning constructed -a graph filter similar to `PageRank`. -Tuning may be worse than highly specialized algorithms in some settings, -but usually finds near-best base algorithms. - -To obtain a recommendation about how to cite complex -algorithms, an automated description can be extracted -by the source code per the following -command: - -```python -print(tuned_ranker.cite()) -# graph filter \cite{ortega2018graph} with dictionary-based hashing \cite{krasanakis2022pygrank}, max normalization and parameters tuned \cite{krasanakis2022autogf} to optimize AUC while withholding 0.100 of nodes for validation -``` - -Bibtex entries corresponding to the citations can be found -[here](../tips/citations.md). \ No newline at end of file diff --git a/docs/basics/about.md b/docs/basics/about.md index 2730490..4172dc3 100644 --- a/docs/basics/about.md +++ b/docs/basics/about.md @@ -2,7 +2,7 @@ architecture -At the core of `pygrank` lies the concept of *graph signals*, which map graph nodes to scores. +At the core of `pygrank` lies the concept of *graph signals*, which map graph nodes to numerical scores. Supervised and unsupervised measures evaluate the predictive/ranking quality of graph signals. @@ -23,3 +23,4 @@ Here is a glossary of common concepts sorted alphabetically: | Personalization | The graph signal inputted in grap filters. This is also known as graph signal priors or the personalization vector. | | Seeds | Example nodes that are known to belong to a community. | | Tuning | A process of determining algorithm hyper-parameters that optimize some evaluation measure. 
| + diff --git a/docs/basics/filters.md b/docs/basics/filters.md index 5fcd351..dc5e28e 100644 --- a/docs/basics/filters.md +++ b/docs/basics/filters.md @@ -12,25 +12,20 @@ applies a graph filter, potentially postprocesses its outcome, and eventually ar pipeline - -## Calling filters - Filters are created based on a constructor that takes as input several keyword arguments affecting how they work. An exhaustive list of ready-to-use graph filters and their constructors -found [here](../generated/filters.md). After its initialization, a filter `alg` can run +can be found [here](../generated/filters.md). +More complicated node ranking algorithms can be obtained by applying postprocessors on +filters. This is covered in the [next section](postprocessors.md). +After its initialization, a filter `alg` can run with one of the following two patterns (these are interchangeable): * `ranks = alg(graph, personalization)` * `alg(pg.to_signal(graph, personalization))` -More complicated node ranking algorithms can be obtained by applying postprocessors on -filters. This is covered in the [next section](postprocessors.md). - - -## Example -Let us define an personalized PageRank filter. If the personalization is +As an example, let us define a personalized PageRank filter. If the personalization is binary (i.e. all nodes have initial scores either 0 or 1) this algorithm is equivalent to a stochastic Markov process where it starts from the nodes with initial scores 1, iteratively jumps to neighbors randomly, and has diff --git a/docs/basics/postprocessors.md b/docs/basics/postprocessors.md index 2baab02..d364759 100644 --- a/docs/basics/postprocessors.md +++ b/docs/basics/postprocessors.md @@ -2,22 +2,15 @@ Filter outcomes of graph often require additional processing steps, for example to perform normalization, improve their quality, or apply fairness constraints. 
-We refer to the improvement of graph filter outcomes as postprocessing, -and let node ranking algorithms perform any number of postprocessing steps -on top of base graph filters. - - -## Wrapping filters - -Postprocessors wrap base graph filters to affect their outcome. The resulting +Postprocessors wrap base filters to improve their outcomes, and the resulting node ranking algorithms are called as if they were still filters. The filters can either be supplied to construcors, or the postprocessors may be initialized from the rest of their arguments and applied onto filters afterwards with the -pattern `algorithm = filter >> postprocessor`. +functional chain pattern `algorithm = filter >> postprocessor`. An list of ready-to-use postprocessors can be -found [here](../generated/postprocessors.md). Simplar ones perform +found [here](../generated/postprocessors.md). Simpler ones perform normalization, for example to enforce the maximal or the sum of node scores to be 1. There also exist thresholding schemes, which can be used for binary community detection, as well as methods to make node @@ -30,14 +23,11 @@ by providing more example nodes, and for fairness-aware posteriors, which aim to make node scores adhere to some fairness constraint, such as disparate impact. - -## Example - -Let us consider a simple scenario where we want the graph signal outputted +Let us consider a simple toy scenario where we want the graph signal outputted by a filter to always be normalized so that its largest node score is one. For -this, we consider a graph `G`, signal `signal` and filter `alg`, -and will use the postprocessor `Normalize`. For convenience, some postprocessors supply a -`transform` method that can be applied on graph signals like so: +graph `G`, signal `signal` and filter `alg`, +we can use the postprocessor `Normalize("max")`. 
For convenience, simpler postprocessors +like this one supply a method to transform graph signals like so: ```python scores = alg(graph, signal) @@ -46,8 +36,8 @@ print(list(normalized_scores.items())) # [('A', 1.0), ('B', 0.4950000024069947), ('C', 0.9828783455187619), ('D', 0.9540636897749238), ('E', 0.472261528845582)] ``` -However, the pattern that works for **all** postprocessors -is to wrap base algorithms, like in the following example: +The pattern that works for **all** postprocessors +is to wrap base algorithms, like in the following equivalent example: ```python @@ -57,13 +47,12 @@ print(nscores) # [('A', 1.0), ('B', 0.4950000024069947), ('C', 0.9828783455187619), ('D', 0.9540636897749238), ('E', 0.472261528845582)] ``` -We now apply more steps to the algorithm by performing -an element-wise exponential transformation of node scores -with the postprocessor `Transformer` *before* normalization -can be achieved as: +We can add more steps, such as +an element-wise exponential transformation of scores +before normalization: ```python -nealg = alg >> pg.Transformer(pp.exp) >> pg.Normalize("max") +nealg = alg >> pg.Transformer(pf.exp) >> pg.Normalize("max") nescores = nealg(graph, signal) print(nescores) # [('A', 1.0), ('B', 0.8786683440755908), ('C', 0.9956241609824301), ('D', 0.9883030876536782), ('E', 0.8735657648099558)] diff --git a/docs/basics/quickstart.md b/docs/basics/quickstart.md new file mode 100644 index 0000000..b18f4f9 --- /dev/null +++ b/docs/basics/quickstart.md @@ -0,0 +1,36 @@ +# Quickstart + +1. Install the library with `pip install pygrank`, import it, +and construct a node ranking algorithm +(incrementally apply postprocessors with `>>`). +There are many components and parameters to find +good configurations; [autotuning](../advanced/autotuning.md) may be helpful. 
+ +```python +import pygrank as pg + +hk5 = pg.HeatKernel(t=5, normalization="symmetric", renormalize=True) # a graph filter +hk5_advanced = hk5 >> pg.SeedOversampling() >> pg.Sweep() >> pg.Normalize("max") +``` + +2. Automatically load a graph and a community of nodes with some shared attribute. +You can also use a `networkx` graph. +Then run the algorithm to get a graph signal that maps nodes to scores, where scores indicate +structural proximity to community members. + +```python +_, graph, community = next(pg.load_datasets_one_community(["EUCore"])) +personalization = {node: 1.0 for node in community} # binary or stochastic membership, missing scores are zero + +scores = hk5_advanced(graph, personalization) # returns a dict-like pg.GraphSignal +print(scores) # {'0': 0.3154503251398683, '1': 0.26661671252340463, '2': 0.03700150026429704, ... } +``` + +3. Evaluate scores; here we use a stochastic generalization of the unsupervised Conductance measure (that +can parse scores). + +```python +measure = pg.Conductance() # an evaluation measure +pg.benchmark_print_line("My conductance", measure(scores)) # pretty +print("Cite this algorithm as:", hk5_advanced.cite()) +``` diff --git a/docs/index.md b/docs/index.md index d5b4f1c..bfb02c6 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,58 +1,46 @@ -# pygrank +# pygrank +**Fast node ranking algorithms on large graphs.** -Fast node ranking algorithms on large graphs. +
-## Quickstart +
+
+
Datacentric
+

Graph signals work as both dictionaries and arrays. + They are easy to create and run on efficient backends.

+
+
-1. Install the library with `pip install pygrank`, import it, -and construct a node ranking algorithm -(incrementally apply postprocessors with `>>`). -There are many components and parameters to find -good configurations; [autotuning](advanced/autotuning.md) may be helpful. -```python -import pygrank as pg +
+
+
Big & Fast
+

Fast processing of big graphs; sparse data structures, + extensive caching, and scalable algorithms.

+
+
-hk5 = pg.HeatKernel(t=5, normalization="symmetric", renormalize=True) # a graph filter -hk5_advanced = hk5 >> pg.SeedOversampling() >> pg.Sweep() >> pg.Normalize("max") -``` -2. Automatically load a graph and a community of nodes with some shared attribute. -You can also use a `networkx` graph. -Then run the algorithm to get a graph signal that maps nodes to scores, where scores indicate -structural proximity to community members. -```python -_, graph, community = next(pg.load_datasets_one_community(["EUCore"])) -personalization = {node: 1.0 for node in community} # binary or stochastic membership, missing scores are zero +
+
+
Modular
+

Combine graph filters with multiple postprocessor components + through a seamless pipeline.

+
+
-scores = hk5_advanced(graph, personalization) # returns a dict-like pg.GraphSignal -print(scores) # {'0': 0.3154503251398683, '1': 0.26661671252340463, '2': 0.03700150026429704, ... } -``` -3. Evaluate scores; here we use a stochastic generalization of the unsupervised Conductance measure (that -can parse scores). +
+
+
Evaluation
+

Run benchmarks, or autotune algorithms + as they run based on supervised or unsupervised measures.

+
+
-```python -measure = pg.Conductance() # an evaluation measure -pg.benchmark_print_line("My conductance", measure(scores)) # pretty -print("Cite this algorithm as:", hk5_advanced.cite()) -``` -## Citation +
-Don't forget to cite all algorithms you are using. Find implemented papers [here](tips/citations.md). -This is the citation for `pygrank` only: -``` -@article{krasanakis2022pygrank, - author = {Emmanouil Krasanakis, Symeon Papadopoulos, Ioannis Kompatsiaris, Andreas Symeonidis}, - title = {pygrank: A Python Package for Graph Node Ranking}, - journal = {SoftwareX}, - year = 2022, - month = oct, - doi = {10.1016/j.softx.2022.101227}, - url = {https://doi.org/10.1016/j.softx.2022.101227} -} -``` \ No newline at end of file diff --git a/docs/theme_extend.css b/docs/theme_extend.css index 8582206..bb73b6e 100644 --- a/docs/theme_extend.css +++ b/docs/theme_extend.css @@ -96,4 +96,39 @@ .code-block { margin-top: -10px; +} + +.card { + width: 18rem; + border: 1px solid #ddd; + border-radius: 10px; + box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1); +} + +.card-body { + padding: 20px; +} + +.card-title { + font-weight: bold; + color: #333; +} + +.card-text { + color: #666; + font-size: 14px; +} + +.card-link { + display: inline-block; + margin-top: 10px; + padding: 10px 15px; + background-color: #007bff; + color: #fff; + text-decoration: none; + border-radius: 5px; +} + +.card-link:hover { + background-color: #0056b3; } \ No newline at end of file diff --git a/docs/tips/big.md b/docs/tips/big.md index 7f8693f..aad04d9 100644 --- a/docs/tips/big.md +++ b/docs/tips/big.md @@ -1,14 +1,13 @@ # Big Graphs -This documents provides a FAQ on how to handle big graphs, with millions -or more of edges. +This document provides a FAQ on how to handle big graphs with millions of edges. -### Which backend to use? +### Which backend? Use *numpy* (the default backend). It's the most well-tested and memory efficient. -### Which algorithm to use? +### Which algorithm? `pygrank` algorithms are split into graph filters and postprocessors to augment their outcome. Here we touch on graph filters. 
@@ -23,29 +22,27 @@ algorithm = pg.PageRank(alpha=0.9, tol=1.E-9, max_iters=1000) For compatibility with `networkx` and other historical practices, these are not the default parameters values. -However, they tend to work well in big graphs. A little explanation on -the choices: -- Personalized PageRank is equivalent to stochastic random +However, they tend to work well in big graphs. As a little explanation on +the choices, personalized PageRank is equivalent to stochastic random walks with average length *1/(1-alpha)* hops away from -seed nodes. -- At the same time, you need a small enough +seed nodes. At the same time, you need a small enough numerical tolerance to make sure that your numer of seeds divided by your number of nodes is not immediately -smaller than that. -- Higher diffusion parameters *alpha* and +smaller than that. Finally, higher diffusion parameters *alpha* and lower numerical tolerances increase the number of iterations it takes for recursive graph filters to converge. Thus, a higher cap to the computational limit should be placed to make sure that this is not exceeded before convergence. -:warning: As a last failsafe against unforeseen convergence properties, -make sure that you run algorithms -with computational limits within allocated budget. +!!! info + As a last failsafe against unforeseen convergence properties, + make sure that you run algorithms + with computational limits within allocated budget. -### My communities do not comprise enough members. +### Few data. -Try to increase the receptive field of node ranking algorithms, +If your communities have too few members, try to increase the receptive field of node ranking algorithms, for example by increasing *alpha* in pagerank. If you have increased the receptive field but require more expansions try applying the `SeedOversampling()` and, if you are fine with its computational @@ -53,7 +50,7 @@ demands, `BoostedSeedOversampling()` postprocessors on your algorithms. 
-### My graph is already a scipy sparse matrix. +### Graph is already a scipy sparse matrix. Note that node ranking algorithms and graph signals typically require graphs. However, sometimes diff --git a/docs/tips/citations.md b/docs/tips/citations.md index 5a11a3b..d2a6a1a 100644 --- a/docs/tips/citations.md +++ b/docs/tips/citations.md @@ -1,18 +1,15 @@ # Citations Several research outcomes have been implemented and integrated in `pygrank`. -In addition to this package, please cite related publications, for example -by modifying automatically generated descriptions (see below). -
-
-Do not forget to also cite dataset sources! Related instructions +In addition to the package itself [krasanakis2022pygrank], please cite related publications, for example +by modifying automatically generated descriptions (see below). Do not forget to also cite dataset sources! Related instructions can be found [here](../generated/datasets.md). -## Automated Algorithm Citations +## Autocite The `NodeRanking.cite()` method can be used to -automatically generate descriptions of algorithms, including -publication citations. Reference names correspond to the list of +automatically generate descriptions of algorithms that include +references. Reference names correspond to the list of publication bibtex entries presented in the rest of the document. For example, the following snippet defines a node ranking algorithm diff --git a/docs/tips/license.md b/docs/tips/license.md new file mode 100644 index 0000000..8b20a7e --- /dev/null +++ b/docs/tips/license.md @@ -0,0 +1,178 @@ +# Apache License + + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. 
+ + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. 
If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. 
You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. 
Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 7868d97..b11e8c6 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -5,6 +5,7 @@ copyright: Copyright © 2024 Emmanouil Krasanakis nav: - Home: 'index.md' + - Quickstart: 'basics/quickstart.md' - Basics: - 'basics/about.md' - 'basics/signals.md' @@ -17,12 +18,12 @@ nav: - 'advanced/convergence.md' - 'advanced/graph_preprocessing.md' - Applications: - - 'advanced/quickstart.md' - 'advanced/community.md' - 'advanced/gnn.md' - For professionals: - 'tips/citations.md' - 'tips/big.md' + - 'tips/license.md' - API: - 'generated/graph_filters.md' - 'generated/postprocessors.md'