
Add litgpt evaluate command #1177

Merged: 49 commits merged into main from litgpt-eval on Apr 4, 2024

Conversation

rasbt (Collaborator) commented Mar 21, 2024

This adds the litgpt evaluate command along with a wrapper function that allows someone to evaluate models without needing to convert or copy anything manually. I.e., it's just:

litgpt finetune lora \
  --checkpoint_dir checkpoints/microsoft/phi-2 \
  --out_dir lora_model
litgpt evaluate \
  --checkpoint_dir lora_model/final \
  --out_dir evaluate_model/
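
For reference, the wrapper can presumably also be called from Python. A minimal sketch, assuming the function mirrors the CLI arguments; the convert_and_evaluate name and the litgpt/eval/evaluate.py module path appear later in this thread:

# Hypothetical Python-API equivalent of the CLI call above (signature assumed).
from litgpt.eval.evaluate import convert_and_evaluate

convert_and_evaluate(
    checkpoint_dir="lora_model/final",
    out_dir="evaluate_model/",
)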

Fixes #1168

rasbt (Collaborator, Author) commented Mar 21, 2024

One question is how to add the tests, @carmocca, because I assume it's very expensive. One could probably test it on a very small model like pythia-14m and a small number of examples, e.g.,

litgpt evaluate \
  --checkpoint_dir checkpoints/EleutherAI/pythia-14m/ \
  --out_dir out/ \
  --repo_id EleutherAI/pythia-14m \
  --limit 10 \
  --tasks "hellaswag"

Still, I don't know if the CI will be able to handle it. I am also not sure if there's a good way to mock this.
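
A minimal sketch of how such a test could look, invoking the CLI exactly as above; it assumes pytest, that the pythia-14m weights are already downloaded, and that --limit 10 keeps the runtime CI-friendly:

import subprocess

def test_evaluate_pythia_14m(tmp_path):
    # Run the evaluate command on a tiny model with only 10 examples
    # so the test stays cheap enough for CI.
    result = subprocess.run(
        [
            "litgpt", "evaluate",
            "--checkpoint_dir", "checkpoints/EleutherAI/pythia-14m/",
            "--out_dir", str(tmp_path),
            "--repo_id", "EleutherAI/pythia-14m",
            "--limit", "10",
            "--tasks", "hellaswag",
        ],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stderr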

carmocca (Contributor) commented:

@rasbt I think you should be able to update the tests in https://github.com/Lightning-AI/litgpt/blob/main/tests/test_lm_eval_harness.py#L30. In fact, we might want to remove the existing eval implementation.

rasbt (Collaborator, Author) commented Mar 22, 2024

Thanks, I hope the test I added suffices.

rasbt (Collaborator, Author) commented Mar 25, 2024

Thanks for all the feedback. These were all great suggestions!

The latest updates include:

  1. Made checkpoint_dir a required argument
  2. Set the default output_dir to checkpoint_dir/evaluate (a sketch of this defaulting follows below)
  3. Moved evaluate.py out of litgpt/scripts
  4. Automatically inferred the repo_id from the config file

Question 1: Should we make output_dir the same as checkpoint_dir by default (evaluate will save the converted model + results.json)?

Question 2: I moved it to litgpt/eval/evaluate.py. Since it's a single module, we could also just make it litgpt/evaluate.py and remove the `eval` subfolder. Wdyt?
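
For reference, a rough sketch of the defaulting behavior from point 2; the signature and names are assumptions for illustration, not copied from the PR:

from pathlib import Path
from typing import Optional

def convert_and_evaluate(checkpoint_dir: Path, out_dir: Optional[Path] = None) -> None:
    # Hypothetical illustration: default the output location to a subfolder
    # of the checkpoint so results live next to the model they describe.
    if out_dir is None:
        out_dir = checkpoint_dir / "evaluate"
    out_dir.mkdir(parents=True, exist_ok=True)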

rasbt (Collaborator, Author) commented Mar 30, 2024

Phew, finally got the test to work. It should be ready to review (and hopefully merge). If you have a moment, I'd appreciate your feedback @awaelchli & @carmocca

awaelchli requested a review from carmocca on April 2, 2024 at 18:44
carmocca (Contributor) left a review comment:

I'm trying this out locally but it fails for me on:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/carmocca/git/lit-stablelm/litgpt/__main__.py", line 135, in <module>
    main()
  File "/home/carmocca/git/lit-stablelm/litgpt/__main__.py", line 131, in main
    fn(**kwargs)
  File "/home/carmocca/git/lit-stablelm/litgpt/eval/evaluate.py", line 99, in convert_and_evaluate
    results = evaluator.simple_evaluate(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/evaluator.py", line 192, in simple_evaluate
    task_dict = get_task_dict(tasks, task_manager)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 420, in get_task_dict
    task_name_from_string_dict = task_manager.load_task_or_group(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 270, in load_task_or_group
    collections.ChainMap(*map(self._load_individual_task_or_group, task_list))
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 161, in _load_individual_task_or_group
    return load_task(task_config, task=name_or_config, group=parent_name)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 150, in load_task
    task_object = ConfigurableTask(config=config)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/api/task.py", line 782, in __init__
    self.download(self.config.dataset_kwargs)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/api/task.py", line 871, in download
    self.dataset = datasets.load_dataset(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/load.py", line 2556, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/load.py", line 2265, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/builder.py", line 371, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/builder.py", line 620, in _create_builder_config
    builder_config._resolve_data_files(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/builder.py", line 211, in _resolve_data_files
    self.data_files = self.data_files.resolve(base_path, download_config)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/data_files.py", line 799, in resolve
    out[key] = data_files_patterns_list.resolve(base_path, download_config)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/data_files.py", line 752, in resolve
    resolve_pattern(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/data_files.py", line 393, in resolve_pattern
    raise FileNotFoundError(error_msg)
FileNotFoundError: Unable to find 'hf://datasets/math_qa@fafb9f7ee5b9ec4da9499f9c4177a4c91389f2d6/default/train/0000.parquet' with any supported extension ['.csv', '.tsv', '.json', '.jsonl', '.parquet', '.geoparquet', '.gpq', '.arrow', '.txt', '.tar', '.blp', '.bmp', '.dib', '.bufr', '.cur', '.pcx', '.dcx', '.dds', '.ps', '.eps', '.fit', '.fits', '.fli', '.flc', '.ftc', '.ftu', '.gbr', '.gif', '.grib', '.h5', '.hdf', '.png', '.apng', '.jp2', '.j2k', '.jpc', '.jpf', '.jpx', '.j2c', '.icns', '.ico', '.im', '.iim', '.tif', '.tiff', '.jfif', '.jpe', '.jpg', '.jpeg', '.mpg', '.mpeg', '.msp', '.pcd', '.pxr', '.pbm', '.pgm', '.ppm', '.pnm', '.psd', '.bw', '.rgb', '.rgba', '.sgi', '.ras', '.tga', '.icb', '.vda', '.vst', '.webp', '.wmf', '.emf', '.xbm', '.xpm', '.BLP', '.BMP', '.DIB', '.BUFR', '.CUR', '.PCX', '.DCX', '.DDS', '.PS', '.EPS', '.FIT', '.FITS', '.FLI', '.FLC', '.FTC', '.FTU', '.GBR', '.GIF', '.GRIB', '.H5', '.HDF', '.PNG', '.APNG', '.JP2', '.J2K', '.JPC', '.JPF', '.JPX', '.J2C', '.ICNS', '.ICO', '.IM', '.IIM', '.TIF', '.TIFF', '.JFIF', '.JPE', '.JPG', '.JPEG', '.MPG', '.MPEG', '.MSP', '.PCD', '.PXR', '.PBM', '.PGM', '.PPM', '.PNM', '.PSD', '.BW', '.RGB', '.RGBA', '.SGI', '.RAS', '.TGA', '.ICB', '.VDA', '.VST', '.WEBP', '.WMF', '.EMF', '.XBM', '.XPM', '.aiff', '.au', '.avr', '.caf', '.flac', '.htk', '.svx', '.mat4', '.mat5', '.mpc2k', '.ogg', '.paf', '.pvf', '.raw', '.rf64', '.sd2', '.sds', '.ircam', '.voc', '.w64', '.wav', '.nist', '.wavex', '.wve', '.xi', '.mp3', '.opus', '.AIFF', '.AU', '.AVR', '.CAF', '.FLAC', '.HTK', '.SVX', '.MAT4', '.MAT5', '.MPC2K', '.OGG', '.PAF', '.PVF', '.RAW', '.RF64', '.SD2', '.SDS', '.IRCAM', '.VOC', '.W64', '.WAV', '.NIST', '.WAVEX', '.WVE', '.XI', '.MP3', '.OPUS', '.zip']

I have the same version as in CI. Are you able to run the test?

rasbt (Collaborator, Author) commented Apr 3, 2024

> I'm trying this out locally but it fails for me on:

Hm, weird, it works fine for me locally. Let me try in a fresh environment; maybe it's an lm_eval version issue.

[Screenshot of the local run, 2024-04-03 at 10:26 AM]
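
Since a version mismatch is the suspicion, one quick way to diff the two environments (a diagnostic aside, with distribution names assumed):

import importlib.metadata as metadata

# Print the installed versions of the packages on the failing code path
# so the two environments can be compared.
for pkg in ("lm_eval", "datasets"):
    print(pkg, metadata.version(pkg))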

rasbt force-pushed the litgpt-eval branch 2 times, most recently from b86ddd2 to af7ea68, on April 3, 2024 at 15:51
rasbt (Collaborator, Author) commented Apr 3, 2024

I totally messed up the rebasing, sry

carmocca (Contributor) commented Apr 3, 2024

Do you have a backup of the state? It seems like you overwrote most of the previous changes. If not, I should have one and can force-push it again.

I would suggest not using rebase inside PRs, since merging is easier and the commits will be squashed anyway when the PR lands.

rasbt (Collaborator, Author) commented Apr 3, 2024

If you have a backup, that would be safer and nice. I don't know what happened; I've done this successfully several times before, but I must have totally messed something up.

carmocca (Contributor) commented Apr 3, 2024

Done. You'll need to redo any of the last commits you made before your force push. I think this was only a "cleanup evaluation docs" commit removing one section.

rasbt (Collaborator, Author) commented Apr 3, 2024

> Done. You'll need to redo any of the last commits you made before your force push. I think this was only a "cleanup evaluation docs" commit removing one section.

Thanks a lot!! And done!!

Actually, I rewrote the part regarding --force conversion in the docs, because there's a common gotcha: if users don't know about it, they may update a checkpoint but find that it has the same evaluation performance as before, due to litgpt's cached temp files.
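
To make the gotcha concrete, a hedged sketch of forcing the re-conversion; the exact --force semantics and boolean syntax are assumed from this discussion, not confirmed:

import subprocess

# Re-run evaluation after updating a checkpoint; passing --force should
# redo the conversion instead of reusing the cached temp files.
subprocess.run(
    [
        "litgpt", "evaluate",
        "--checkpoint_dir", "lora_model/final",
        "--out_dir", "evaluate_model/",
        "--force", "true",
    ],
    check=True,
)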

carmocca merged commit 3822e37 into main on Apr 4, 2024
5 of 8 checks passed
carmocca deleted the litgpt-eval branch on April 4, 2024 at 16:34