Link_prediction

Link prediction using proximity-based methods

This project was completed for COMP90051 (Statistical Machine Learning), taken in Semester 2, 2020 at the University of Melbourne.

Features

We tried numerous approaches; this README describes our final one. We implemented methods to compute the following features:

  1. Jaccard distance
  2. Cosine distance
  3. Adamic-Adar index
  4. Preferential attachment
  5. Resource allocation
  6. Other features: the number of followers and followees of the source and of the sink, and the number of followers and followees they have in common
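As a rough illustration of features 1 to 5, here is a minimal sketch of the standard textbook formulas computed on neighbour sets. The function name `proximity_features` and its arguments are hypothetical and not taken from the project code. Which neighbour set (followers, followees, or both) feeds each score is an assumption. The sketch computes similarity scores; the corresponding distance is one minus the similarity.

```python
import math


def proximity_features(nbrs_u, nbrs_v, degree):
    """Proximity scores for a candidate edge (u, v).

    nbrs_u, nbrs_v: sets of neighbours of u and v (hypothetical inputs).
    degree: dict mapping each node to its neighbour count.
    """
    common = nbrs_u & nbrs_v
    union = nbrs_u | nbrs_v

    jaccard = len(common) / len(union) if union else 0.0
    denom = math.sqrt(len(nbrs_u) * len(nbrs_v))
    cosine = len(common) / denom if denom else 0.0

    # Adamic-Adar: common neighbours weighted by 1 / log(degree).
    # Degree-1 nodes are skipped because log(1) = 0.
    adamic_adar = sum(1.0 / math.log(degree[z]) for z in common if degree[z] > 1)

    # Resource allocation: same idea, weighted by 1 / degree.
    resource_allocation = sum(1.0 / degree[z] for z in common if degree[z] > 0)

    # Preferential attachment: product of the two neighbourhood sizes.
    preferential_attachment = len(nbrs_u) * len(nbrs_v)

    return {
        "jaccard": jaccard,
        "jaccard_distance": 1.0 - jaccard,
        "cosine": cosine,
        "adamic_adar": adamic_adar,
        "resource_allocation": resource_allocation,
        "preferential_attachment": preferential_attachment,
    }


# Toy graph: a and b share the neighbours c and d.
nbrs = {"a": {"c", "d"}, "b": {"c", "d", "e"}, "c": {"a", "b"}, "d": {"a", "b"}, "e": {"b"}}
degree = {node: len(s) for node, s in nbrs.items()}
features = proximity_features(nbrs["a"], nbrs["b"], degree)
```
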

[Figure: feature importance]

Implementation:

We consulted some existing implementations on GitHub, but most features were easy to implement directly from their formulas using Python dictionaries. We used two main dictionaries: 1) one that maps each node to the nodes it follows, and 2) one that maps each node to the nodes that follow it.
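The two dictionaries could be built roughly as follows. This is a sketch under assumptions: the names `build_adjacency` and `degree_features`, and the input format of `(source, sink)` pairs, are hypothetical and may differ from the actual project code.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build both lookup dictionaries from directed (source, sink) pairs."""
    followees = defaultdict(set)  # node -> nodes that it follows
    followers = defaultdict(set)  # node -> nodes that follow it
    for src, dst in edges:
        followees[src].add(dst)
        followers[dst].add(src)
    return followees, followers


def degree_features(source, sink, followees, followers):
    """Follower/followee counts for each endpoint and their overlaps."""
    # .get() avoids inserting empty entries for nodes not seen before.
    s_out = followees.get(source, set())
    s_in = followers.get(source, set())
    k_out = followees.get(sink, set())
    k_in = followers.get(sink, set())
    return {
        "source_followees": len(s_out),
        "source_followers": len(s_in),
        "sink_followees": len(k_out),
        "sink_followers": len(k_in),
        "common_followees": len(s_out & k_out),
        "common_followers": len(s_in & k_in),
    }


edges = [(1, 2), (1, 3), (4, 2), (4, 3), (2, 1)]
followees, followers = build_adjacency(edges)
feats = degree_features(1, 4, followees, followers)
```

Keeping both directions in memory trades extra space for constant-time neighbour lookups, which is what makes the set-based features cheap to compute per candidate edge.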

Model:

  • XGBoost: well suited to classification problems. With the objective set to 'binary:logistic', it directly outputs the probability of the positive label.
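For reference, with the 'binary:logistic' objective the boosted trees produce a raw margin that is passed through the logistic function, and training minimises the log loss. The notation below ($f_k$ for the $k$-th tree, $b$ for the base margin, $y_i$ for the 0/1 edge label) is introduced here for illustration only.

```latex
\hat{p}(x) = \sigma\!\left(b + \sum_{k=1}^{K} f_k(x)\right)
           = \frac{1}{1 + \exp\!\left(-\,b - \sum_{k=1}^{K} f_k(x)\right)},
\qquad
\mathcal{L} = -\sum_{i} \Big[\, y_i \log \hat{p}(x_i) + (1 - y_i) \log\big(1 - \hat{p}(x_i)\big) \Big]
```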

Sampling data:

  • Random sampling of 50k positive and 50k negative examples
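One common way to draw such a balanced sample is sketched below. It assumes positives are existing edges and negatives are random node pairs with no edge between them; the original text does not say how the negatives were actually drawn, and the function name `sample_pairs` is hypothetical.

```python
import random


def sample_pairs(edges, n_pos, n_neg, seed=0):
    """Sample n_pos existing edges and n_neg non-existing (source, sink) pairs."""
    rng = random.Random(seed)
    edge_set = set(edges)
    nodes = sorted({node for edge in edge_set for node in edge})

    # Positives: a random subset of the observed edges.
    positives = rng.sample(sorted(edge_set), n_pos)

    # Negatives: rejection sampling of node pairs that are not edges.
    # Assumes n_neg is well below the number of available non-edges,
    # otherwise this loop will not terminate.
    negatives = set()
    while len(negatives) < n_neg:
        u, v = rng.choice(nodes), rng.choice(nodes)
        if u != v and (u, v) not in edge_set:
            negatives.add((u, v))
    return positives, sorted(negatives)


edges = [(1, 2), (2, 3), (3, 1), (1, 4)]
pos, neg = sample_pairs(edges, n_pos=2, n_neg=3)
```
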

RUN

To run quickly, change the parameters in get_trained() to smaller sizes:

e.g. get_trainset(500, 500)  # instead of (50000, 50000)

Result:

  • Final (Private leaderboard) score: 0.89480