- Code for pre-training (i.e., corpus construction and pre-training scripts) is located in "eleet_pretrain".
- Code for everything else (e.g. query plans, MMOps, baselines, benchmark) is located in "eleet".
- Some scripts are located in "scripts" and "slurm" as described below.
- Use Python version 3.8
- Install PyTorch: https://pytorch.org/get-started/locally/ (tested using conda)
- Install torch-scatter: https://github.com/rusty1s/pytorch_scatter (tested using conda)
Versions we used:
$ conda list | grep torch
pytorch          1.12.1      py3.8_cuda11.3_cudnn8.3.2_0  pytorch
pytorch-mutex    1.0         cuda                         pytorch
pytorch-scatter  2.0.9       py38_torch_1.12.0_cu113      pyg
torch            1.12.0+cpu  pypi_0                       pypi
torch-scatter    2.0.7       pypi_0                       pypi
torchaudio       0.12.0+cpu  pypi_0                       pypi
torchvision      0.13.0+cpu  pypi_0                       pypi
- Install Cython:
pip install Cython
- Install PyJNIus:
conda install -c conda-forge pyjnius
- Install FastBPE:
conda install -c conda-forge fastbpe
- Install curl:
conda install curl
- Install the remaining dependencies:
pip install -r requirements.txt
- Install this package in editable mode:
pip install -e .
- Install TaBERT:
cd TaBERT/ && pip install -e . && cd ..
- Download the English language model for spaCy:
python -m spacy download en_core_web_sm
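The installation steps above can be sanity-checked with a short script. The following is a minimal sketch, not part of the repository: it compares the running interpreter against Python 3.8 and looks up the installed versions of a few of the packages listed above. The distribution names in `PACKAGES` are assumptions and may differ depending on whether a package was installed through conda or pip.

```python
import sys
from importlib import metadata

# Assumed distribution names; adjust them if your installer registers different ones.
PACKAGES = ["torch", "torch-scatter", "Cython", "pyjnius", "spacy"]


def python_matches(required=(3, 8), current=None):
    """Return True if the major.minor version of the interpreter equals `required`."""
    current = sys.version_info if current is None else current
    return tuple(current[:2]) == tuple(required)


def installed_versions(names):
    """Map each distribution name to its installed version, or None if not found."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python 3.8:", "ok" if python_matches() else "mismatch")
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not found'}")
```

A package reported as "not found" is not necessarily missing; it may only be registered under a different distribution name.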
- Run MongoDB (installation guide: https://www.mongodb.com/docs/manual/tutorial/install-mongodb-on-ubuntu/) and set the environment variables MONGO_USER, MONGO_PASSWORD, MONGO_HOST, MONGO_PORT, and MONGO_DB.
- Start data pre-processing:
python scripts/load_data.py trex-wikidata
--> Preprocessed data will appear in datasets/preprocessed_data/preprocessed_trex-wikidata*
- Use slurm/pretrain.slurm for pre-training (adjust the path in the file first). --> The pretrained model will be stored in models/pretrained
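Before starting pre-processing, it can help to confirm that the MongoDB environment variables listed above are all set. The sketch below is illustrative only. How the repository itself reads these variables is not shown here, and the helper names `missing_mongo_vars` and `mongo_uri` are hypothetical. The URI follows the standard MongoDB connection string layout, with MONGO_DB assumed to be the database in the path component.

```python
import os
from urllib.parse import quote_plus

# Variable names taken from the setup instructions.
REQUIRED = ("MONGO_USER", "MONGO_PASSWORD", "MONGO_HOST", "MONGO_PORT", "MONGO_DB")


def missing_mongo_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


def mongo_uri(env=None):
    """Build a connection string from the variables (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = missing_mongo_vars(env)
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    # Percent-encode credentials so characters such as '@' or ':' stay valid.
    user = quote_plus(env["MONGO_USER"])
    password = quote_plus(env["MONGO_PASSWORD"])
    return f"mongodb://{user}:{password}@{env['MONGO_HOST']}:{env['MONGO_PORT']}/{env['MONGO_DB']}"


if __name__ == "__main__":
    missing = missing_mongo_vars()
    print("All MongoDB variables set." if not missing else f"Missing: {missing}")
```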
- Generate TREx Dataset:
python eleet/datasets/trex/generate.py
- Generate Rotowire Dataset:
python eleet/datasets/rotowire/generate.py
- Run finetuning:
sbatch slurm/rotowire/train-ours.slurm
(Repeat for other datasets and models.) --> The finetuned model will be stored in models/rotowire/ours/finetuned
- Run evaluation:
python eleet/benchmark.py --slurm-mode --use-test-set
- Visualize results using the Jupyter notebooks located in scripts/*.ipynb
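Since each stage above writes its output to a fixed location, a small script can report which stages have already produced results. This is a minimal sketch under the assumption that it is run from the repository root. The paths are the ones named in the steps above, and `pipeline_status` is a hypothetical helper, not part of the codebase.

```python
from pathlib import Path

# Output locations named in the steps above (glob patterns relative to the repo root).
EXPECTED = {
    "pre-processing": "datasets/preprocessed_data/preprocessed_trex-wikidata*",
    "pre-training": "models/pretrained",
    "fine-tuning (rotowire, ours)": "models/rotowire/ours/finetuned",
}


def pipeline_status(root="."):
    """Return {stage: True if at least one path matches the stage's pattern}."""
    root = Path(root)
    return {stage: any(root.glob(pattern)) for stage, pattern in EXPECTED.items()}


if __name__ == "__main__":
    for stage, done in pipeline_status().items():
        print(f"{stage}: {'found' if done else 'missing'}")
```

The check only tests whether a path exists; it does not verify that a stage ran to completion.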