Undergraduate course completion work presented to the Computer Institute of the Federal University of Rio de Janeiro as part of the requirements for obtaining the degree of Bachelor in Computer Science.
A notebook that uses word-embedding models from different languages to translate words. The code solves the Procrustes problem over a subset of words with known translations to find the best translation matrix.
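As a rough illustration of the idea, the sketch below solves the orthogonal variant of the Procrustes problem with NumPy on toy data. It assumes the translation matrix is constrained to be orthogonal, which may differ from what the notebook does; the names `fit_translation_matrix` and `translate` are hypothetical and do not come from this repo.

```python
import numpy as np


def fit_translation_matrix(source_vecs, target_vecs):
    """Return the orthogonal W minimising ||source_vecs @ W - target_vecs||_F.

    Row i of each array is the embedding of the i-th word pair
    with a known translation.
    """
    # Classic closed form: SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(source_vecs.T @ target_vecs)
    return u @ vt


def translate(vec, W, target_vecs, target_words):
    """Map a source vector into the target space and return the closest word."""
    mapped = vec @ W
    # Cosine similarity between the mapped vector and every target vector.
    sims = (target_vecs @ mapped) / (
        np.linalg.norm(target_vecs, axis=1) * np.linalg.norm(mapped)
    )
    return target_words[int(np.argmax(sims))]


# Toy data (assumption): the target space is an exact rotation of the source.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                   # source-language vectors
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # hidden orthogonal map
Y = X @ Q                                     # target-language vectors
words = ["one", "two", "three", "four", "five"]

W = fit_translation_matrix(X, Y)
print(translate(X[2], W, Y, words))
```

With real embeddings the mapping is only approximate, so the nearest neighbour is a candidate translation rather than a guaranteed one.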
- Python 3.9 or higher
- Jupyter notebook: pip install notebook
- Gensim: pip install gensim
- Download this repo: git clone https://github.com/AlexSantoss/WordEmbedding-Translator.git
- Inside the folder "Datasets", create a new one called "FastText".
- Download and extract the following files inside the FastText folder:
- Danish: https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.da.300.vec.gz
- Dutch: https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.nl.300.vec.gz
- English: https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.vec.gz
- French: https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.fr.300.vec.gz
- German: https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.de.300.vec.gz
- Italian: https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.it.300.vec.gz
- Portuguese: https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.pt.300.vec.gz
- Romanian: https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.ro.300.vec.gz
- Spanish: https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.es.300.vec.gz
- Swedish: https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.sv.300.vec.gz
- Run the file Translator.ipynb
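The extraction step above can also be done from Python. This is a minimal sketch using only the standard library; it assumes the `.vec.gz` archives have already been downloaded into `Datasets/FastText`, and the helper name `extract_gz` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def extract_gz(archive_path, dest_dir):
    """Decompress one .gz archive into dest_dir and return the output path."""
    archive_path = Path(archive_path)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Dropping the last suffix turns cc.en.300.vec.gz into cc.en.300.vec.
    out_path = dest_dir / archive_path.stem
    # Stream the data so the large vector files never sit fully in memory.
    with gzip.open(archive_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


if __name__ == "__main__":
    folder = Path("Datasets") / "FastText"  # folder created in the steps above
    if folder.is_dir():
        for archive in sorted(folder.glob("*.vec.gz")):
            print("extracted", extract_gz(archive, folder))
```

The original archives are left in place, so they can be deleted afterwards to save disk space.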
You can convert GloVe models to the Word2Vec format by running:
python -m gensim.scripts.glove2word2vec --input glove.model.txt --output w2v.model.txt
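For context, the two text formats differ only in that Word2Vec files start with a header line giving the number of vectors and their dimension. The sketch below shows what the conversion amounts to using only the standard library; it assumes a space-separated GloVe text file with one word per line, and the function name `glove_to_word2vec` is hypothetical, not part of Gensim.

```python
import shutil


def glove_to_word2vec(glove_path, output_path):
    """Copy a GloVe text file, prepending the '<count> <dim>' header line."""
    num_vectors = 0
    dim = 0
    # First pass: count the vectors and read the dimension off the first line.
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if num_vectors == 0:
                dim = len(line.rstrip().split(" ")) - 1  # first token is the word
            num_vectors += 1
    # Second pass: write the header, then copy the vectors unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        dst.write(f"{num_vectors} {dim}\n")
        shutil.copyfileobj(src, dst)
    return num_vectors, dim
```

A call such as `glove_to_word2vec("glove.model.txt", "w2v.model.txt")` mirrors the file names used in the command above.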