This project compares two methods of calculating similarity: cosine similarity and soft cosine similarity.
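As a rough illustration of how the two measures differ, here is a minimal NumPy sketch. It is not the code used in this project: the three-word vocabulary, the bag-of-words vectors, and the values in the term similarity matrix `S` are made up for the example. In practice the entries of `S` would typically come from word-embedding similarities.

```python
import numpy as np

def cosine_similarity(a, b):
    """Plain cosine: treats every vocabulary term as unrelated to the others."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def soft_cosine_similarity(a, b, S):
    """Soft cosine: S[i, j] is the similarity between terms i and j."""
    numerator = a @ S @ b
    denominator = np.sqrt(a @ S @ a) * np.sqrt(b @ S @ b)
    return float(numerator / denominator)

# Hypothetical vocabulary order: ["play", "game", "weather"]
a = np.array([1.0, 0.0, 0.0])  # document containing only "play"
b = np.array([0.0, 1.0, 0.0])  # document containing only "game"

# Illustrative term similarity matrix (assumed values, not measured ones):
# "play" and "game" are treated as related with similarity 0.5.
S = np.array([
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

print(cosine_similarity(a, b))          # no shared terms
print(soft_cosine_similarity(a, b, S))  # related terms still contribute
```

With these toy inputs the plain cosine is 0 because the documents share no terms, while the soft cosine is 0.5 because the matrix marks the two terms as related. When `S` is the identity matrix, the two measures coincide.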
The following datasets are used to build the word2vec models:
- numberbatch-en.txt
- GoogleNews-vectors-negative300.bin
- glove-wiki-gigaword-100

These data can be downloaded from https://github.com/RaRe-Technologies/gensim-data
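One possible way to load these files with gensim is sketched below. The helper names `is_binary_word2vec`, `load_vectors`, and `load_glove` are hypothetical and not part of this project. The sketch assumes the `.txt` and `.bin` files are in word2vec format and sit in the working directory, and that the GloVe vectors are fetched by name through `gensim.downloader`.

```python
from pathlib import Path

def is_binary_word2vec(path):
    """Guess the word2vec file format from the extension (.bin means binary)."""
    return Path(path).suffix.lower() == ".bin"

def load_vectors(path):
    """Load a word2vec-format file (text or binary) as gensim KeyedVectors."""
    # Imported lazily so the helper above can be used without gensim installed.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=is_binary_word2vec(path))

def load_glove(name="glove-wiki-gigaword-100"):
    """Download (on first use) and load a model by its gensim-data name."""
    import gensim.downloader as api
    return api.load(name)

# Example usage (assumed local file names; the files are large, so the calls
# are left commented out):
# numberbatch = load_vectors("numberbatch-en.txt")
# google_news = load_vectors("GoogleNews-vectors-negative300.bin")
# glove = load_glove()
```

Choosing the format from the file extension is only a convenience for this sketch; passing `binary=True` or `binary=False` explicitly works just as well.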
Requirements: pandas, nltk, numpy, scikit-learn (imported as sklearn), gensim