This is an attempt to predict the result of a game of chess between any 2 player given their previous match history
This repository contains:
The dataset used in this repo is the Chess network, part of the Koblenz Network Collection. They are results of chess games. Each node is a chess player, and a directed edge represents a game, with the white player having an outgoing edge and the black player having an ingoing edge. The weight of the edge represents the outcome (+1 white won, 0 draw, −1 black won). The dataset is anonymous: the identity of the players is unknown, and timestamps are approximate. Timestamps are given to a one month precision, and may have been shifted towards the future by an unknown amount.
- Total number of nodes (players): 7301
- Total number of edges (matches): 65053
- Average node degree: 17.82
- Percentage of white wins: 32.53 %
- Percentage of black wins: 23.40 %
- Percentage of draws: 44.06 %
Consider the below example, where the 5 players form the nodes, and the edges representing a game played between them.
The red directed line between 2 players represents a game of chess, where the player having an outgoing edge is white and the player having an incoming edge plays black. The task is, given the results of the played matches, to predict the outcome of the green lines. This is an edge classification task, where an edge has to be classified as either -1 or 0 or 1
The following is the overview of the used model
Optimizer: Adam with learning rate = 0.01
Loss Function: Cross Entropy Loss
Apart from the last layer where Softmax was used, ReLU was used between all layers
Splitting the data into 85-15, a RMSE of 0.6 was achieved on the test set. Even though this is an classification task, it was considered to be a regression task in the Kaggle Competition. The achieved RMSE on the test split easily tops the leader board. (Please not that as on July 08, 2021 the competition's submission was disabled, hence unable to get the RMSE on test data)
But the achieved results are still low. It may be due to to the very less node degree. Intuitively an average chess player can not be judged with 17 games of his, played over a span of 10 years
Download the following files in a same directory
grandmaster.py
cross_validation_dataset.csv
training_data.csv
test_data.csv
and make sure you have the following libraries installed.
torch
torch_geometric
numpy
pandas
Run grandmaster.py
cd directory-where-the-files-are-saved
python3 grandmaster.py
The problem that was encountered while building the graph network was the inability to differentiate the win and loss edges via positive and negative edge weights. This is important because if all the edge weights are positive then there would be no difference between a winning directed edge and a lossing directed edge. To my knowledge, none of the present (Jun 2021) graph neural algorithms / frameworks support negative edge weights, and if in future any algorithm with this ability is introduced, it may be used to experiment on this datasset.
@MISC{konect:2017:chess,
title = {Chess network dataset -- {KONECT}},
month = oct,
year = {2017},
url = {http://konect.cc/networks/chess}
}