
Add litgpt evaluate command #1177

Merged: 49 commits merged into main from litgpt-eval on Apr 4, 2024

Conversation

rasbt (Collaborator) commented Mar 21, 2024

This adds the litgpt evaluate command along with a wrapper function that allows someone to evaluate models without needing to convert or copy anything manually. I.e., it's just:

litgpt finetune lora \
  --checkpoint_dir checkpoints/microsoft/phi-2 \
  --out_dir lora_model
litgpt evaluate \
  --checkpoint_dir lora_model/final \
  --out_dir evaluate_model/
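
For reference, the wrapper can presumably also be called from Python. A minimal sketch, assuming the function mirrors the CLI arguments; the convert_and_evaluate name and the litgpt/eval/evaluate.py module path appear later in this thread:

# Hypothetical Python-API equivalent of the CLI call above (signature assumed).
from litgpt.eval.evaluate import convert_and_evaluate

convert_and_evaluate(
    checkpoint_dir="lora_model/final",
    out_dir="evaluate_model/",
)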

Fixes #1168

rasbt (Collaborator, Author) commented Mar 21, 2024

One question is how to add the tests, @carmocca, because I assume it's very expensive. One could probably test it on a very small model like pythia-14m and a small number of examples, e.g.,

litgpt evaluate \
  --checkpoint_dir checkpoints/EleutherAI/pythia-14m/ \
  --out_dir out/ \
  --repo_id EleutherAI/pythia-14m \
  --limit 10 \
  --tasks "hellaswag"

Still, I don't know if the CI will be able to handle it. I am also not sure if there's a good way to mock this.
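
A minimal sketch of how such a test could look, invoking the CLI exactly as above; it assumes pytest, that the pythia-14m weights are already downloaded, and that --limit 10 keeps the runtime CI-friendly:

import subprocess

def test_evaluate_pythia_14m(tmp_path):
    # Run the evaluate command on a tiny model with only 10 examples
    # so the test stays cheap enough for CI.
    result = subprocess.run(
        [
            "litgpt", "evaluate",
            "--checkpoint_dir", "checkpoints/EleutherAI/pythia-14m/",
            "--out_dir", str(tmp_path),
            "--repo_id", "EleutherAI/pythia-14m",
            "--limit", "10",
            "--tasks", "hellaswag",
        ],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stderr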

carmocca (Contributor) commented:

@rasbt I think you should be able to update the tests in https://github.com/Lightning-AI/litgpt/blob/main/tests/test_lm_eval_harness.py#L30. In fact, we might want to remove the existing eval implementation.

rasbt (Collaborator, Author) commented Mar 22, 2024

Thanks, I hope the test I added suffices.

rasbt (Collaborator, Author) commented Mar 25, 2024

Thanks for all the feedback. These were all great suggestions!

The latest updates include:

  1. Made checkpoint_dir a required argument
  2. Set the default output_dir to checkpoint_dir/evaluate (a sketch of this defaulting follows below)
  3. Moved evaluate.py out of litgpt/scripts
  4. Automatically inferred the repo_id from the config file

Question 1: Should we make output_dir the same as checkpoint_dir by default (evaluate will save the converted model + results.json)?

Question 2: I moved it to litgpt/eval/evaluate.py. Since it's a single module, we could also just make it litgpt/evaluate.py and remove the `eval` subfolder. Wdyt?
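
For reference, a rough sketch of the defaulting behavior from point 2; the signature and names are assumptions for illustration, not copied from the PR:

from pathlib import Path
from typing import Optional

def convert_and_evaluate(checkpoint_dir: Path, out_dir: Optional[Path] = None) -> None:
    # Hypothetical illustration: default the output location to a subfolder
    # of the checkpoint so results live next to the model they describe.
    if out_dir is None:
        out_dir = checkpoint_dir / "evaluate"
    out_dir.mkdir(parents=True, exist_ok=True)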

rasbt (Collaborator, Author) commented Mar 30, 2024

Phew, finally got the test to work. It should be ready to review (and hopefully merge). If you have a moment, I'd appreciate your feedback @awaelchli & @carmocca

awaelchli requested a review from carmocca on April 2, 2024 at 18:44
carmocca (Contributor) left a review comment:

I'm trying this out locally but it fails for me on:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/carmocca/git/lit-stablelm/litgpt/__main__.py", line 135, in <module>
    main()
  File "/home/carmocca/git/lit-stablelm/litgpt/__main__.py", line 131, in main
    fn(**kwargs)
  File "/home/carmocca/git/lit-stablelm/litgpt/eval/evaluate.py", line 99, in convert_and_evaluate
    results = evaluator.simple_evaluate(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/evaluator.py", line 192, in simple_evaluate
    task_dict = get_task_dict(tasks, task_manager)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 420, in get_task_dict
    task_name_from_string_dict = task_manager.load_task_or_group(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 270, in load_task_or_group
    collections.ChainMap(*map(self._load_individual_task_or_group, task_list))
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 161, in _load_individual_task_or_group
    return load_task(task_config, task=name_or_config, group=parent_name)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 150, in load_task
    task_object = ConfigurableTask(config=config)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/api/task.py", line 782, in __init__
    self.download(self.config.dataset_kwargs)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/api/task.py", line 871, in download
    self.dataset = datasets.load_dataset(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/load.py", line 2556, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/load.py", line 2265, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/builder.py", line 371, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/builder.py", line 620, in _create_builder_config
    builder_config._resolve_data_files(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/builder.py", line 211, in _resolve_data_files
    self.data_files = self.data_files.resolve(base_path, download_config)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/data_files.py", line 799, in resolve
    out[key] = data_files_patterns_list.resolve(base_path, download_config)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/data_files.py", line 752, in resolve
    resolve_pattern(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/data_files.py", line 393, in resolve_pattern
    raise FileNotFoundError(error_msg)
FileNotFoundError: Unable to find 'hf://datasets/math_qa@fafb9f7ee5b9ec4da9499f9c4177a4c91389f2d6/default/train/0000.parquet' with any supported extension ['.csv', '.tsv', '.json', '.jsonl', '.parquet', '.geoparquet', '.gpq', '.arrow', '.txt', '.tar', '.blp', '.bmp', '.dib', '.bufr', '.cur', '.pcx', '.dcx', '.dds', '.ps', '.eps', '.fit', '.fits', '.fli', '.flc', '.ftc', '.ftu', '.gbr', '.gif', '.grib', '.h5', '.hdf', '.png', '.apng', '.jp2', '.j2k', '.jpc', '.jpf', '.jpx', '.j2c', '.icns', '.ico', '.im', '.iim', '.tif', '.tiff', '.jfif', '.jpe', '.jpg', '.jpeg', '.mpg', '.mpeg', '.msp', '.pcd', '.pxr', '.pbm', '.pgm', '.ppm', '.pnm', '.psd', '.bw', '.rgb', '.rgba', '.sgi', '.ras', '.tga', '.icb', '.vda', '.vst', '.webp', '.wmf', '.emf', '.xbm', '.xpm', '.BLP', '.BMP', '.DIB', '.BUFR', '.CUR', '.PCX', '.DCX', '.DDS', '.PS', '.EPS', '.FIT', '.FITS', '.FLI', '.FLC', '.FTC', '.FTU', '.GBR', '.GIF', '.GRIB', '.H5', '.HDF', '.PNG', '.APNG', '.JP2', '.J2K', '.JPC', '.JPF', '.JPX', '.J2C', '.ICNS', '.ICO', '.IM', '.IIM', '.TIF', '.TIFF', '.JFIF', '.JPE', '.JPG', '.JPEG', '.MPG', '.MPEG', '.MSP', '.PCD', '.PXR', '.PBM', '.PGM', '.PPM', '.PNM', '.PSD', '.BW', '.RGB', '.RGBA', '.SGI', '.RAS', '.TGA', '.ICB', '.VDA', '.VST', '.WEBP', '.WMF', '.EMF', '.XBM', '.XPM', '.aiff', '.au', '.avr', '.caf', '.flac', '.htk', '.svx', '.mat4', '.mat5', '.mpc2k', '.ogg', '.paf', '.pvf', '.raw', '.rf64', '.sd2', '.sds', '.ircam', '.voc', '.w64', '.wav', '.nist', '.wavex', '.wve', '.xi', '.mp3', '.opus', '.AIFF', '.AU', '.AVR', '.CAF', '.FLAC', '.HTK', '.SVX', '.MAT4', '.MAT5', '.MPC2K', '.OGG', '.PAF', '.PVF', '.RAW', '.RF64', '.SD2', '.SDS', '.IRCAM', '.VOC', '.W64', '.WAV', '.NIST', '.WAVEX', '.WVE', '.XI', '.MP3', '.OPUS', '.zip']

I have the same version as in CI. Are you able to run the test?

rasbt (Collaborator, Author) commented Apr 3, 2024

> I'm trying this out locally but it fails for me on:

Hm, weird, it works fine for me locally. Let me try in a fresh environment; maybe it's an lm_eval version issue.

[Screenshot of the local run, 2024-04-03 at 10:26 AM]
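
Since a version mismatch is the suspicion, one quick way to diff the two environments (a diagnostic aside, with distribution names assumed):

import importlib.metadata as metadata

# Print the installed versions of the packages on the failing code path
# so the two environments can be compared.
for pkg in ("lm_eval", "datasets"):
    print(pkg, metadata.version(pkg))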

rasbt force-pushed the litgpt-eval branch 2 times, most recently from b86ddd2 to af7ea68, on April 3, 2024 at 15:51
rasbt (Collaborator, Author) commented Apr 3, 2024

I totally messed up the rebasing, sry

carmocca (Contributor) commented Apr 3, 2024

Do you have a backup of the state? It seems like you overwrote most of the previous changes. If not, I should have one and can force-push it again.

I would suggest not using rebase inside PRs, since merging is easier and the commits will be squashed anyway when the PR lands.

rasbt (Collaborator, Author) commented Apr 3, 2024

If you have a backup, that would be safer and nice. I don't know what happened; I've done this successfully several times before, but I must have totally messed something up.

carmocca (Contributor) commented Apr 3, 2024

Done. You'll need to redo any of the last commits you made before your force push. I think this was only a "cleanup evaluation docs" commit removing one section.

rasbt (Collaborator, Author) commented Apr 3, 2024

> Done. You'll need to redo any of the last commits you made before your force push. I think this was only a "cleanup evaluation docs" commit removing one section.

Thanks a lot!! And done!!

Actually, I rewrote the part regarding --force conversion in the docs, because there's a common gotcha: if users don't know about it, they may update a checkpoint but find that it has the same evaluation performance as before, due to litgpt's cached temp files.
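
To make the gotcha concrete, a hedged sketch of forcing the re-conversion; the exact --force semantics and boolean syntax are assumed from this discussion, not confirmed:

import subprocess

# Re-run evaluation after updating a checkpoint; passing --force should
# redo the conversion instead of reusing the cached temp files.
subprocess.run(
    [
        "litgpt", "evaluate",
        "--checkpoint_dir", "lora_model/final",
        "--out_dir", "evaluate_model/",
        "--force", "true",
    ],
    check=True,
)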

carmocca merged commit 3822e37 into main on Apr 4, 2024
5 of 8 checks passed
carmocca deleted the litgpt-eval branch on April 4, 2024 at 16:34