
This repository compares two methods of calculating similarity: cosine similarity and soft cosine similarity.
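To make the difference between the two measures concrete, here is a minimal, dependency-free sketch. It is not the code from this repository: the three-word vocabulary, the bag-of-words vectors, and the 0.5 similarity between "play" and "game" are all made-up values for illustration. Soft cosine similarity is computed as a^T S b / (sqrt(a^T S a) * sqrt(b^T S b)), where S holds word-to-word similarities (in practice these would come from word embeddings).

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def soft_cosine_similarity(a, b, sim):
    """Soft cosine similarity; sim[i][j] is the similarity of term i to term j."""
    def bilinear(u, v):
        # Computes u^T * sim * v
        return sum(u[i] * sim[i][j] * v[j]
                   for i in range(len(u)) for j in range(len(v)))

    denom = math.sqrt(bilinear(a, a)) * math.sqrt(bilinear(b, b))
    return bilinear(a, b) / denom if denom else 0.0


# Hypothetical vocabulary: ["play", "game", "tax"]
doc_a = [1, 0, 0]  # contains only "play"
doc_b = [0, 1, 0]  # contains only "game"

# Hypothetical term-similarity matrix: "play" and "game" are assumed related (0.5)
term_sim = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

plain = cosine_similarity(doc_a, doc_b)
soft = soft_cosine_similarity(doc_a, doc_b, term_sim)
print(plain, soft)
```

Because the two documents share no words, plain cosine similarity treats them as unrelated, while soft cosine similarity gives them credit for containing related words. With an identity matrix in place of `term_sim`, the two measures coincide.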

The datasets used to build the word2vec models are:

numberbatch-en.txt
GoogleNews-vectors-negative300.bin
glove-wiki-gigaword-100

These data can be downloaded from here: https://github.com/RaRe-Technologies/gensim-data
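As a rough sketch of one way to obtain such vectors, gensim ships a downloader module that fetches models listed in the gensim-data repository by name. The helper `load_vectors` below is a hypothetical name, not something defined in this repository, and the example assumes gensim is installed and a network connection is available on first use (the download is cached afterwards).

```python
def load_vectors(name="glove-wiki-gigaword-100"):
    """Download (on first use) and load pretrained vectors from gensim-data.

    `load_vectors` is an illustrative helper; `name` must be a model name
    listed in the gensim-data repository.
    """
    # Imported lazily so that defining this helper does not require gensim
    import gensim.downloader as api

    return api.load(name)


if __name__ == "__main__":
    vectors = load_vectors()
    # Cosine similarity between the embeddings of two words
    print(vectors.similarity("play", "game"))
```

The other two files in the list (`numberbatch-en.txt` and `GoogleNews-vectors-negative300.bin`) are local files, so they would be loaded from disk rather than by name.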

Requirements: pandas, nltk, numpy, scikit-learn (imported as sklearn), gensim