This project is a lexical substitution task using:
- part2: WordNet Frequency Baseline
- part3: Simple Lesk Algorithm
- part4: pre-trained Word2Vec embeddings
- part5: BERT masked language model
- Install NLTK - Natural Language Toolkit
pip install nltk
python
>>> import nltk
>>> nltk.download()
In the Corpora tab, install wordnet and stopwords.
- Install gensim, a vector space modeling package for Python.
- Install Huggingface Transformers, BERT implementation by Huggingface (an NLP company), or more specifically their slightly more compact model DistilBERT
pip install gensim
pip install transformers
python lexsub_main.py lexsub_trial.xml > part5.predict
lexical substitution outputs are stored in .predict files