This repo contains the code and results for a project in the field of Graph Mining.
This project analyzes three correlation-based clustering methods on S&P 500 stock data: DBSCAN, Hierarchical Clustering, and Correlation Clustering. A Mod-CC-PIVOT algorithm was also developed from the CC-PIVOT algorithm. The project was completed as the Final Project submission for CSCE 689 - Graph Mining and Analysis.
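For readers unfamiliar with CC-PIVOT, the following is a minimal sketch of the standard randomized pivot procedure: pick a random pivot, group it with every remaining node it is "similar" to, remove that group, and repeat. It is not the code from the notebooks and does not show the Mod-CC-PIVOT changes. The function names, the 0.5 threshold, and the toy correlation values are assumptions made for illustration only.

```python
import random


def cc_pivot(nodes, is_similar, seed=None):
    """Cluster nodes with the randomized pivot scheme.

    is_similar(u, v) is a caller-supplied predicate (hypothetical here)
    that says whether two nodes share a "positive" edge.
    """
    rng = random.Random(seed)
    remaining = list(nodes)
    clusters = []
    while remaining:
        pivot = rng.choice(remaining)
        # The pivot's cluster is the pivot plus all remaining similar nodes.
        cluster = [pivot] + [
            v for v in remaining if v != pivot and is_similar(pivot, v)
        ]
        clusters.append(cluster)
        members = set(cluster)
        remaining = [v for v in remaining if v not in members]
    return clusters


# Toy correlation values, made up for illustration (not project data).
corr = {
    ("AAA", "BBB"): 0.9,
    ("AAA", "CCC"): 0.1,
    ("AAA", "DDD"): 0.0,
    ("BBB", "CCC"): 0.2,
    ("BBB", "DDD"): 0.1,
    ("CCC", "DDD"): 0.8,
}


def similar(u, v, threshold=0.5):
    # Assumed rule: an edge is "positive" when correlation >= threshold.
    rho = corr.get((u, v), corr.get((v, u), 0.0))
    return rho >= threshold


clusters = cc_pivot(["AAA", "BBB", "CCC", "DDD"], similar, seed=0)
print(clusters)
```

In this toy input the positive edges form two disjoint pairs, so every pivot order yields the same two groups; on real data the result generally depends on the random pivot order.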
====================================================================================
The code folder contains all the Python code used for this project. The project was implemented in Jupyter Notebook, and the ipynb files are provided. The generated heat maps are stored in the plots folder.
Run get_data.py to get the stock data. The data folder already contains the stock data for 1 year and 3 years in pickle format.
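A saved pickle file can be inspected outside the notebooks with a few lines of Python. This is a minimal sketch using only the standard library; the helper name load_pickle is hypothetical, and the actual file names inside the data folder are not listed here, so substitute the real path. If the files hold pandas objects, pandas must be installed for unpickling to succeed.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load one pickled object from disk.

    Only unpickle files you trust: pickle can execute arbitrary code.
    """
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Example call (replace the placeholder with a real file from the data folder):
# stock_data = load_pickle("data/<file name>")
```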
====================================================================================
Three correlation-based algorithms are used:
- DBSCAN algorithm - To run it, open the "dbscan_Clustering.ipynb" file and run the cells.
- Hierarchical Clustering algorithm - To run it, open the "Hierarchial_Clustering.ipynb" file and run the cells.
- Correlation Clustering algorithm - To run it, open the "correlation_Clustering.ipynb" file and run the cells.
The ipynb files also contain the saved outputs.
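Distance-based methods such as DBSCAN and hierarchical clustering need a distance between stocks rather than a correlation. One common transform, shown here as a sketch and not necessarily the one used in the notebooks, is d = sqrt(2 * (1 - rho)), where rho is the Pearson correlation of daily returns. The function names and the toy prices are assumptions for illustration.

```python
import math


def daily_returns(prices):
    """Simple returns between consecutive prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(prices_by_ticker):
    """Return {(a, b): distance} using d = sqrt(2 * (1 - rho))."""
    tickers = sorted(prices_by_ticker)
    returns = {t: daily_returns(prices_by_ticker[t]) for t in tickers}
    dist = {}
    for a in tickers:
        for b in tickers:
            rho = 1.0 if a == b else pearson(returns[a], returns[b])
            # max() guards against tiny negative values from rounding.
            dist[(a, b)] = math.sqrt(max(0.0, 2.0 * (1.0 - rho)))
    return dist


# Toy prices, made up for illustration (not project data).
toy_prices = {
    "AAA": [10.0, 11.0, 12.0, 11.0, 13.0],
    "BBB": [20.0, 22.0, 24.0, 22.0, 26.0],  # moves exactly with AAA
    "CCC": [30.0, 29.0, 28.0, 29.0, 27.0],  # moves against AAA
}
dist = correlation_distance(toy_prices)
```

With this transform, perfectly correlated stocks are at distance 0 and perfectly anti-correlated stocks are at distance 2, so the resulting matrix can be passed to any clustering routine that accepts precomputed distances.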
====================================================================================
The code is also available on GitHub - https://github.com/natsu1628/Correlation_Based_Clustering
The project report is included in the repo.