Project Structure

Code for pre-training (i.e., corpus construction and pre-training scripts) is located in "eleet_pretrain".
Code for everything else (e.g., query plans, MMOps, baselines, benchmark) is located in "eleet".
Additional scripts are located in "scripts" and "slurm", as described below.

How to install

Use Python version 3.8
Install PyTorch: https://pytorch.org/get-started/locally/ (tested using conda)

Install torch-scatter: https://github.com/rusty1s/pytorch_scatter (tested using conda)

Versions we used:

$ conda list | grep torch
pytorch                   1.12.1          py3.8_cuda11.3_cudnn8.3.2_0    pytorch
pytorch-mutex             1.0                        cuda    pytorch
pytorch-scatter           2.0.9           py38_torch_1.12.0_cu113    pyg
torch                     1.12.0+cpu               pypi_0    pypi
torch-scatter             2.0.7                    pypi_0    pypi
torchaudio                0.12.0+cpu               pypi_0    pypi
torchvision               0.13.0+cpu               pypi_0    pypi

Install Cython: pip install Cython
Install PyJNIus: conda install -c conda-forge pyjnius
Install fastBPE: conda install -c conda-forge fastbpe
Install curl: conda install curl
Install the remaining dependencies: pip install -r requirements.txt
Install this package in editable mode: pip install -e .
Install TaBERT: cd TaBERT/ && pip install -e . && cd ..
Download the English language model for spaCy: python -m spacy download en_core_web_sm
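
To confirm that the installation steps above took effect, a small check like the following may help. It is only a sketch: the import names in ASSUMED_MODULES (for example "jnius" for PyJNIus and "fastBPE" for FastBPE) are assumptions and are not stated in this README, so adjust them if your environment differs.

```python
import importlib.util
import sys

# Import names are assumptions: the packages installed above are expected to be
# importable under these names, but this README does not state them.
ASSUMED_MODULES = [
    "torch", "torch_scatter", "Cython", "jnius", "fastBPE", "spacy", "en_core_web_sm",
]


def find_missing(modules):
    """Return the module names that cannot be located (nothing is imported)."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_matches(required=(3, 8), version=None):
    """Check that the interpreter's major.minor version equals `required`."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) == tuple(required)


if __name__ == "__main__":
    print("Python 3.8:", python_matches())
    print("Missing modules:", find_missing(ASSUMED_MODULES) or "none")
```

Run it inside the conda environment used for the installation. An empty list of missing modules only means the packages can be found, not that their versions match the ones listed above.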

Pre-training

Run MongoDB and set the environment variables MONGO_USER, MONGO_PASSWORD, MONGO_HOST, MONGO_PORT, and MONGO_DB. Installation guide: https://www.mongodb.com/docs/manual/tutorial/install-mongodb-on-ubuntu/
Start data pre-processing: python scripts/load_data.py trex-wikidata --> The preprocessed data will appear in datasets/preprocessed_data/preprocessed_trex-wikidata*
Use slurm/pretrain.slurm for pre-training (adjust the path in the file first). --> The pretrained model will be stored in models/pretrained
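
Before starting the pre-processing, it can be useful to verify that all five MongoDB variables are set. The sketch below checks them and assembles a standard MongoDB connection URI. How the scripts in this repository actually read these variables is not described here, so the helper name mongo_uri_from_env and the URI layout are illustrative assumptions.

```python
import os
from urllib.parse import quote_plus

REQUIRED_VARS = ("MONGO_USER", "MONGO_PASSWORD", "MONGO_HOST", "MONGO_PORT", "MONGO_DB")


def mongo_uri_from_env(env=None):
    """Build a MongoDB URI from the five variables; raise KeyError if any is unset.

    Hypothetical helper for illustration, not part of this repository.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("unset environment variables: " + ", ".join(missing))
    # Percent-encode credentials so characters such as "@" or ":" stay valid in a URI.
    user = quote_plus(env["MONGO_USER"])
    password = quote_plus(env["MONGO_PASSWORD"])
    host, port, db = env["MONGO_HOST"], int(env["MONGO_PORT"]), env["MONGO_DB"]
    return f"mongodb://{user}:{password}@{host}:{port}/{db}"


if __name__ == "__main__":
    try:
        mongo_uri_from_env()
        print("All MongoDB variables are set.")
    except KeyError as err:
        print(err)
```

The URI is deliberately not printed, since it contains the password.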

Finetuning + Evaluation

Generate the TREx dataset: python eleet/datasets/trex/generate.py
Generate the Rotowire dataset: python eleet/datasets/rotowire/generate.py
Run finetuning: sbatch slurm/rotowire/train-ours.slurm (repeat for other datasets and models). --> The finetuned model will be stored in models/rotowire/ours/finetuned
Run evaluation: python eleet/benchmark.py --slurm-mode --use-test-set
Visualize the results using the Jupyter notebooks located in scripts/*.ipynb
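
Since each stage writes its output to a fixed location, a short script can report which stages have already produced results. This is a sketch that only uses the output paths named in this README; the function name pipeline_status is hypothetical, and the check tests for existence only, not for completeness of the outputs.

```python
from pathlib import Path

# Output locations as named in this README; "*" is a glob wildcard.
EXPECTED_OUTPUTS = {
    "preprocessed data": "datasets/preprocessed_data/preprocessed_trex-wikidata*",
    "pretrained model": "models/pretrained",
    "finetuned model (rotowire, ours)": "models/rotowire/ours/finetuned",
}


def pipeline_status(root="."):
    """Map each stage to True if at least one matching path exists under `root`."""
    root = Path(root)
    return {stage: any(root.glob(pattern)) for stage, pattern in EXPECTED_OUTPUTS.items()}


if __name__ == "__main__":
    for stage, done in pipeline_status().items():
        print(f"{stage}: {'found' if done else 'missing'}")
```

Run it from the repository root so that the relative paths resolve correctly.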
