Skip to content

Latest commit

 

History

History
21 lines (14 loc) · 2.42 KB

README.md

File metadata and controls

21 lines (14 loc) · 2.42 KB

Identifying overlapping functional modules of protein interactions graph

Introduction

In this project, I tried implementing one of the most efficient algorithms for large graphs clustering, the SR-MCL, in python and checked how it works on both big and small graphs. Due to this, the implemented code was then run on a real-world dataset, weighted yeast proteins interactome, to check how long it takes to reach an end and converges with a graph of size 1e3 nodes and 1e4 edges!

Methodology

The SR-MCL algorithm is an improvement to the MCL (Markov Clustering Algorithm). The MCL is a practical algorithm for clustering biological networks, for instance, clustering protein-protein interaction (PPI) networks to identify functional modules. But it has some limitations and problems with bridge nodes.

A few years later, the R-MCL (Regularized MCL) introduced by Yu-Keng Shih and Srinivasan Parthasarathy in 2010 solved some of the MCL problems and improved execution time. However, still, the problem with bridge nodes remains. Two years later, the same people came up with the new idea of SR-MCL. The main idea behind it was to make the R-MCL algorithm not focus on bridges by softening the weights of the canonical flow matrix after each step.

Result on a small random graph

  • After testing its performance to see how it works, I created a random graph with 50 nodes and some edges between them.

RandomGraph

  • Then by running the algorithm on them, I colored each cluster a different color, But the bridge nodes colored yellow to be identified!

ClusteredRandomGraph

Reference