Topological metrics #74

Merged
merged 28 commits into from
Mar 11, 2024

Conversation

barneydobson
Collaborator

@barneydobson barneydobson commented Mar 6, 2024

Description

Place to implement topological metrics

Fixes #50

Summary of changes:

  • Removed some unused dependencies
  • Forked netcomp to make it compatible with networkx 3. This package implements the various graph similarity metrics from the paper suggested by @cheginit.
  • Implemented and tested these metrics in metric_utilities.
  • Added debug_topology to study whether these metrics are behaving sensibly.

Summary of debug_topology:

  • Download four street network graphs (using prepare_data.py/download_street): a small graph, a slightly larger graph that contains the small one, a slightly larger one again that contains both, and finally another larger graph that does not overlap the first three.
  • Test all graphs for all topology metrics and plot the distances between them.
  • What I would expect is that the small graph is closest to the medium one, then the large one, and then the separate one.
  • Here are the heatmaps of the various metrics: [image]

Observations:

  • In all cases, the medium graph and the smallest graph are closest to each other - good!
  • Under nc_laplacian_dist, nc_laplacian_norm_dist, nc_adjacency_dist, and kstest_betweenness, the large and separate graphs are very close - not good! Additionally, these four seem to behave somewhat similarly.
  • nc_deltacon0 and nc_resistance_distance seem to behave similarly.
  • nc_vertex_edge_distance seems to have very large values in general, but qualitatively I would say it is closer to the laplacian-type results.

Thus, provided we are happy that these are implemented properly, I think this covers the different measures of topological distance quite well. In my tests these metrics are quick to calculate, but if they become prohibitive, we have evidence here to select maybe only 2 or 3 that cover the broad categories of behaviour.
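The comparison described above (compute a distance between every pair of graphs, then inspect the matrix) can be sketched roughly as follows. This is a minimal illustration, not the debug_topology script or netcomp's implementation: the function name `laplacian_spectrum_distance`, the parameter `k`, and the synthetic grid and tree graphs standing in for downloaded street networks are all assumptions made for the example.

```python
import networkx as nx
import numpy as np


def laplacian_spectrum_distance(G1, G2, k=10):
    """Euclidean distance between the k largest Laplacian eigenvalues.

    A simplified stand-in for a spectral graph distance (hypothetical helper).
    """
    # Graphs differ in size, so compare only as many eigenvalues as both have
    k_eff = min(k, G1.number_of_nodes(), G2.number_of_nodes())

    def top_eigenvalues(G):
        A = nx.to_numpy_array(G, weight=None)  # unweighted adjacency matrix
        L = np.diag(A.sum(axis=1)) - A         # combinatorial Laplacian
        return np.linalg.eigvalsh(L)[-k_eff:]  # eigvalsh returns ascending order

    return float(np.linalg.norm(top_eigenvalues(G1) - top_eigenvalues(G2)))


# Synthetic stand-ins for the small, medium, large and separate street networks
graphs = {
    "small": nx.grid_2d_graph(4, 4),
    "medium": nx.grid_2d_graph(6, 6),
    "large": nx.grid_2d_graph(8, 8),
    "separate": nx.balanced_tree(2, 5),
}

names = list(graphs)
dist = np.zeros((len(names), len(names)))
for i, a in enumerate(names):
    for j, b in enumerate(names):
        dist[i, j] = laplacian_spectrum_distance(graphs[a], graphs[b])

print(names)
print(np.round(dist, 3))
```

The resulting matrix is what a heatmap would display; swapping in another metric only requires replacing the distance function.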

@barneydobson barneydobson added the sa_paper Sensitivity analysis paper label Mar 7, 2024
@barneydobson barneydobson self-assigned this Mar 7, 2024
@cheginit
Collaborator

cheginit commented Mar 8, 2024

Very nice comparison!

Perhaps you could compute and compare these metrics for the GRIP and OSM datasets over the same bounding box.

@cheginit
Collaborator

cheginit commented Mar 9, 2024

For one of my projects, I wrote this function for computing betweenness centrality (BC) in parallel. It can speed up the computations by a factor of 3 or 4, depending on the complexity of the network:

import joblib
import networkx as nx
import cytoolz.curried as tlz
from collections import defaultdict


def edge_betweenness_centrality(G: nx.Graph, normalized: bool = True, weight: str = "weight", njobs: int = -1):
    """Parallel edge betweenness centrality function"""
    njobs = joblib.cpu_count(True) if njobs == -1 else njobs
    # Split the source nodes into chunks, one batch per worker; keep the chunk
    # size at least 1 so graphs with fewer nodes than workers are still processed
    chunk_size = max(1, G.order() // njobs)
    node_chunks = tlz.partition_all(chunk_size, G.nodes())
    bt_func = tlz.partial(nx.edge_betweenness_centrality_subset, G=G, normalized=normalized, weight=weight)
    # Each worker computes the contribution of its own sources to all targets
    bt_sc = joblib.Parallel(n_jobs=njobs)(
        joblib.delayed(bt_func)(sources=nodes, targets=G.nodes()) for nodes in node_chunks
    )

    # Merge the betweenness centrality results by summing per-edge contributions
    bt_c = defaultdict(float)
    for bt in bt_sc:
        for n, v in bt.items():
            bt_c[n] += v
    return bt_c

Also, there's this library called graph-tool that is freakishly fast, but the caveat is that it's not available on PyPI and can only be installed from conda-forge.

In one of my tests, the BC computation with networkx took 20 min, the parallel version took 4 min, networkit took about 18 sec, and graph-tool finished in less than a second!
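The chunk-and-merge approach relies on the per-source contributions to edge betweenness adding up. A minimal serial sketch of that decomposition, without joblib, is given here as a sanity check. The helper name `chunked_edge_betweenness` and the `n_chunks` parameter are hypothetical, and the check uses `normalized=False` on a small unweighted grid graph so that any scaling applied per call is a plain constant.

```python
from collections import defaultdict

import networkx as nx


def chunked_edge_betweenness(G, n_chunks=4, weight=None):
    """Sum edge betweenness contributions computed per chunk of source nodes."""
    nodes = list(G.nodes())
    # Ceiling division so every node lands in a chunk; size is never below 1
    size = max(1, -(-len(nodes) // n_chunks))
    chunks = [nodes[i:i + size] for i in range(0, len(nodes), size)]

    merged = defaultdict(float)
    for sources in chunks:
        partial = nx.edge_betweenness_centrality_subset(
            G, sources=sources, targets=nodes, normalized=False, weight=weight
        )
        for edge, value in partial.items():
            merged[edge] += value
    return dict(merged)


G = nx.grid_2d_graph(4, 5)
chunked = chunked_edge_betweenness(G, n_chunks=3)

# Reference: one call with every node as a source
single = nx.edge_betweenness_centrality_subset(
    G, sources=list(G), targets=list(G), normalized=False
)

max_diff = max(abs(chunked[e] - single[e]) for e in single)
print(f"largest per-edge difference: {max_diff:.2e}")
```

Because each chunk is independent, the loop over chunks is the part that a parallel backend would distribute across workers.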

@barneydobson
Collaborator Author

barneydobson commented Mar 11, 2024

OK, I will make an issue, #80, for graph-tool. I swapped your function above in for the networkx.betweenness_centrality function, and the results in the tests are slightly different (your function = 0.38995, networkx = 0.2862). Is that a problem?

@barneydobson
Collaborator Author

@cheginit Oh, I think I have used betweenness_centrality instead of edge_betweenness_centrality. I didn't realise there was a difference; I guess that explains why the values are different.

@cheginit
Collaborator

Note that this is for edge BC; for node BC, you need to change the function. You should therefore compare it with nx.edge_betweenness_centrality. I have a test suite for this, so there shouldn't be an issue.
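To illustrate the distinction, here is a small sketch on a four-node path graph (chosen only for the example). The node version returns one value per node, while the edge version returns one value per edge, so the two results are keyed differently and their values are not directly comparable.

```python
import networkx as nx

G = nx.path_graph(4)  # nodes 0-1-2-3 connected in a line

node_bc = nx.betweenness_centrality(G)       # keyed by node
edge_bc = nx.edge_betweenness_centrality(G)  # keyed by edge (pair of nodes)

print("node BC keys:", sorted(node_bc))
print("edge BC keys:", sorted(edge_bc))
```

Any summary statistic taken over these two dictionaries is computed over different populations (nodes versus edges), which is one reason the numbers differ.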

@barneydobson
Collaborator Author

Yep yep!

- Use Taher's new function for edge betweenness
- Introduce new metric for edge betweenness (in contrast to node betweenness)
- Update requirements
@barneydobson barneydobson merged commit 2fe0f62 into main Mar 11, 2024
10 checks passed
@barneydobson barneydobson deleted the topological_metrics branch March 11, 2024 14:31
Labels
sa_paper Sensitivity analysis paper
Development

Successfully merging this pull request may close these issues.

Toplogical metrics
2 participants