This repository contains a framework for evaluating Ethel on education-specific benchmark datasets.
Currently, our evaluation pipeline supports the following datasets:

- GSM8K
- MATH
- TutorEval
Use the following command with the `--dataset` and `--model` command-line arguments to run the evaluation:
Required arguments:
- `--dataset`: The dataset to be used for the evaluation, e.g., `GSM8K`
- `--model`: The model type to generate the answer, e.g., `Ethel`
Optional arguments:
- `--model_name`: The exact name of the model, e.g., `swissai/ethel-70b-magpie`
- `--limit`: If you prefer to run the evaluation on a subset of the dataset
- `--n_shot`: If you want to perform n-shot learning
Example:

```bash
python3 -m scripts.run_pipeline --dataset=MATH --model=Ethel
```
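For instance, a run restricted to a small subset with few-shot prompting might look like the sketch below; the `--limit` and `--n_shot` values are illustrative assumptions, not documented defaults:

```bash
# Illustrative run: evaluate 100 examples with 5-shot prompting
# (the specific values 100 and 5 are example assumptions)
python3 -m scripts.run_pipeline --dataset=GSM8K --model=Ethel \
    --model_name=swissai/ethel-70b-magpie --limit=100 --n_shot=5
```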
To evaluate on TutorEval, use `scripts.run_tutoreval` instead.

Required arguments:
- `--dataset`: The dataset to be used for the evaluation, e.g., `TutorEval`
- `--model`: The model type that acts as a tutor to generate the answer, e.g., `Ethel`
- `--grader_model`: The model type used to grade the tutor's answer, e.g., `Ethel`
Optional arguments:
- `--model_name`: The exact name of the tutor model, e.g., `swissai/ethel-70b-magpie`
- `--grader_model_name`: The exact name of the grader model, e.g., `swissai/ethel-70b-magpie`
- `--limit`: If you prefer to run the evaluation on a subset of the dataset
- `--closed_book`: If you want to evaluate in a closed-book setup (instead of open-book)
Example:

```bash
python3 -m scripts.run_tutoreval --dataset=TutorEval --model=Ethel --grader_model=Ethel
```
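As a sketch, a closed-book run that pins the grader model and evaluates only a subset might look like this; it assumes `--closed_book` is a boolean flag and `--limit` takes an example count, neither of which is spelled out above:

```bash
# Illustrative run: closed-book evaluation of 50 examples
# (--closed_book assumed to be a boolean flag; 50 is an example value)
python3 -m scripts.run_tutoreval --dataset=TutorEval --model=Ethel --grader_model=Ethel \
    --grader_model_name=swissai/ethel-70b-magpie --limit=50 --closed_book
```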
This work was done as part of the ML4Science project in the CS-433 course at EPFL.
The main contributors are:
- Kamel Charaf
- Ivan Pavlov
- Michele Smaldone