# SRBedding

## Project setup

Before running the project, set up the environment:

```bash
poetry shell
poetry update
```

## Evaluation Jupyter notebook

Inside the evaluation-pipetine folder, add a datasets folder and a results folder.
To load SQuAD-sr, place squad-sr-lat.json in the datasets folder.
First run make-evaluation-datasets.ipynb; it creates all the required files.
Then run:

```bash
cd evaluation-pipetine/
python evaluation-pipieline.py
```
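To sanity-check the SQuAD-sr file before building the evaluation datasets, a minimal sketch like the one below can help, assuming squad-sr-lat.json follows the standard SQuAD v1.1 layout (the notebook itself may read it differently):

```python
import json
from pathlib import Path

# Assumed location: the datasets folder inside evaluation-pipetine, as described above.
squad_path = Path("datasets/squad-sr-lat.json")

with squad_path.open(encoding="utf-8") as f:
    squad = json.load(f)

# Standard SQuAD v1.1 layout: data -> paragraphs -> qas.
pairs = [
    (qa["question"], paragraph["context"])
    for article in squad["data"]
    for paragraph in article["paragraphs"]
    for qa in paragraph["qas"]
]
print(f"Loaded {len(pairs)} question/context pairs from SQuAD-sr")
```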

## Training dataset creation

Run the following commands to create the training dataset:

- `cd training_dataset`
- `python .\main_training.py`
- `python .\batch_loading.py`

The .parquet files will be saved in the datasets folder.
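To confirm the run produced usable output, the generated files can be inspected with pandas. This is a quick, hypothetical check; the actual column names depend on what the scripts above write out:

```python
from pathlib import Path

import pandas as pd

# Assumed path of the datasets folder relative to the current directory;
# adjust if the training scripts write their output elsewhere.
for parquet_file in sorted(Path("datasets").glob("*.parquet")):
    df = pd.read_parquet(parquet_file)
    # Column names are whatever main_training.py / batch_loading.py produced.
    print(parquet_file.name, df.shape, list(df.columns))
```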

## Translating dataset

The folder translation_pipeline is used for translating ms_marco and natural_questions from English to Serbian. The translated queries and contexts from these datasets will be used for evaluation. Run the following commands:

- `cd translation_pipeline`
- `python .\sending_batch.py`
- `python .\processing_batch.py`
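For reference, the English source data can be pulled from the Hugging Face hub to see what the queries and contexts look like before translation. This is only an illustrative sketch; the scripts above may obtain and batch the data differently:

```python
from datasets import load_dataset

# Streamed sample of MS MARCO; "v2.1" is the commonly used configuration on the hub.
ms_marco = load_dataset("microsoft/ms_marco", "v2.1", split="train", streaming=True)

for i, example in enumerate(ms_marco):
    # Each example carries an English query and its candidate passages (contexts).
    print(example["query"])
    print(example["passages"]["passage_text"][0][:100], "...")
    if i == 2:
        break
```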

The folder translation_sts is used for translating one sentence pair from the STS dataset for the distillation evaluator. Run the following commands:

- `cd translation_sts`
- `python .\sending_batch.py`
- `python .\processing_batch.py`
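The translated pair feeds the distillation evaluator, which in essence compares the similarity an embedding model assigns to the two sentences against a reference STS score. A purely illustrative sketch of that comparison follows; the placeholder vectors stand in for real embeddings, and the evaluator in this repository may be implemented differently:

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Placeholder embeddings for the two Serbian sentences of the translated STS pair.
rng = np.random.default_rng(0)
embedding_a = rng.normal(size=384)
embedding_b = rng.normal(size=384)

predicted = cosine_similarity(embedding_a, embedding_b)
reference = 0.78  # hypothetical gold STS score rescaled to [0, 1]
print(f"predicted={predicted:.3f} reference={reference:.3f} gap={abs(predicted - reference):.3f}")
```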