Add litgpt evaluate command #1177
One question is how to add the tests @carmocca because I assume it's very expensive. One could probably test it on the smallest phi-2 model and on a small number of examples e.g.
Still, I don't know if the CI will be able to handle it. I am also not sure if there's a good way to mock this.
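One possible way to keep such a test cheap is to inject a fake in place of the expensive harness call and only check that the wrapper forwards its arguments. The following is a minimal sketch, not the test that was added in this PR: `run_expensive_eval`, `evaluate_wrapper`, and the `evaluate_fn` parameter are hypothetical names made up for illustration, and the result dictionary shape is an assumption.

```python
from unittest import mock


def run_expensive_eval(model_name, tasks, limit=None):
    """Hypothetical stand-in for a costly evaluation-harness call."""
    raise RuntimeError("would download datasets and run a model")


def evaluate_wrapper(model_name, tasks, limit=None, evaluate_fn=None):
    """Hypothetical wrapper: forwards its arguments and unpacks the results."""
    if evaluate_fn is None:
        evaluate_fn = run_expensive_eval
    results = evaluate_fn(model_name, tasks=tasks, limit=limit)
    return results["results"]


def test_wrapper_forwards_arguments():
    # The fake returns a canned result, so no model or dataset is needed.
    fake = mock.Mock(return_value={"results": {"mathqa": {"acc": 0.25}}})
    out = evaluate_wrapper("tiny-model", tasks=["mathqa"], limit=2, evaluate_fn=fake)
    # Verify the wrapper passed everything through unchanged.
    fake.assert_called_once_with("tiny-model", tasks=["mathqa"], limit=2)
    assert out == {"mathqa": {"acc": 0.25}}
    return out
```

A test like this only covers the wiring around the harness; it would not catch failures inside the harness itself, such as a dataset download error.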
@rasbt I think you should be able to update the tests in https://github.com/Lightning-AI/litgpt/blob/main/tests/test_lm_eval_harness.py#L30. In fact, we might want to remove the existing eval implementation.
Thanks, I hope the test I added suffices.
Co-authored-by: awaelchli <[email protected]>
Thanks for all the feedback. These were all great suggestions! The latest updates include
Question 1: Should we make
Question 2: I moved it to
Phew, finally got the test to work. It should be ready to review (and hopefully merge). If you have a moment, I'd appreciate your feedback @awaelchli & @carmocca
I'm trying this out locally but it fails for me on:
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/carmocca/git/lit-stablelm/litgpt/__main__.py", line 135, in <module>
    main()
  File "/home/carmocca/git/lit-stablelm/litgpt/__main__.py", line 131, in main
    fn(**kwargs)
  File "/home/carmocca/git/lit-stablelm/litgpt/eval/evaluate.py", line 99, in convert_and_evaluate
    results = evaluator.simple_evaluate(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/evaluator.py", line 192, in simple_evaluate
    task_dict = get_task_dict(tasks, task_manager)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 420, in get_task_dict
    task_name_from_string_dict = task_manager.load_task_or_group(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 270, in load_task_or_group
    collections.ChainMap(*map(self._load_individual_task_or_group, task_list))
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 161, in _load_individual_task_or_group
    return load_task(task_config, task=name_or_config, group=parent_name)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 150, in load_task
    task_object = ConfigurableTask(config=config)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/api/task.py", line 782, in __init__
    self.download(self.config.dataset_kwargs)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/lm_eval/api/task.py", line 871, in download
    self.dataset = datasets.load_dataset(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/load.py", line 2556, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/load.py", line 2265, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/builder.py", line 371, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/builder.py", line 620, in _create_builder_config
    builder_config._resolve_data_files(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/builder.py", line 211, in _resolve_data_files
    self.data_files = self.data_files.resolve(base_path, download_config)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/data_files.py", line 799, in resolve
    out[key] = data_files_patterns_list.resolve(base_path, download_config)
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/data_files.py", line 752, in resolve
    resolve_pattern(
  File "/home/carmocca/git/nightly-venv/lib/python3.10/site-packages/datasets/data_files.py", line 393, in resolve_pattern
    raise FileNotFoundError(error_msg)
FileNotFoundError: Unable to find 'hf://datasets/math_qa@fafb9f7ee5b9ec4da9499f9c4177a4c91389f2d6/default/train/0000.parquet' with any supported extension ['.csv', '.tsv', '.json', '.jsonl', '.parquet', '.geoparquet', '.gpq', '.arrow', '.txt', '.tar', '.blp', '.bmp', '.dib', '.bufr', '.cur', '.pcx', '.dcx', '.dds', '.ps', '.eps', '.fit', '.fits', '.fli', '.flc', '.ftc', '.ftu', '.gbr', '.gif', '.grib', '.h5', '.hdf', '.png', '.apng', '.jp2', '.j2k', '.jpc', '.jpf', '.jpx', '.j2c', '.icns', '.ico', '.im', '.iim', '.tif', '.tiff', '.jfif', '.jpe', '.jpg', '.jpeg', '.mpg', '.mpeg', '.msp', '.pcd', '.pxr', '.pbm', '.pgm', '.ppm', '.pnm', '.psd', '.bw', '.rgb', '.rgba', '.sgi', '.ras', '.tga', '.icb', '.vda', '.vst', '.webp', '.wmf', '.emf', '.xbm', '.xpm', '.BLP', '.BMP', '.DIB', '.BUFR', '.CUR', '.PCX', '.DCX', '.DDS', '.PS', '.EPS', '.FIT', '.FITS', '.FLI', '.FLC', '.FTC', '.FTU', '.GBR', '.GIF', '.GRIB', '.H5', '.HDF', '.PNG', '.APNG', '.JP2', '.J2K', '.JPC', '.JPF', '.JPX', '.J2C', '.ICNS', '.ICO', '.IM', '.IIM', '.TIF', '.TIFF', '.JFIF', '.JPE', '.JPG', '.JPEG', '.MPG', '.MPEG', '.MSP', '.PCD', '.PXR', '.PBM', '.PGM', '.PPM', '.PNM', '.PSD', '.BW', '.RGB', '.RGBA', '.SGI', '.RAS', '.TGA', '.ICB', '.VDA', '.VST', '.WEBP', '.WMF', '.EMF', '.XBM', '.XPM', '.aiff', '.au', '.avr', '.caf', '.flac', '.htk', '.svx', '.mat4', '.mat5', '.mpc2k', '.ogg', '.paf', '.pvf', '.raw', '.rf64', '.sd2', '.sds', '.ircam', '.voc', '.w64', '.wav', '.nist', '.wavex', '.wve', '.xi', '.mp3', '.opus', '.AIFF', '.AU', '.AVR', '.CAF', '.FLAC', '.HTK', '.SVX', '.MAT4', '.MAT5', '.MPC2K', '.OGG', '.PAF', '.PVF', '.RAW', '.RF64', '.SD2', '.SDS', '.IRCAM', '.VOC', '.W64', '.WAV', '.NIST', '.WAVEX', '.WVE', '.XI', '.MP3', '.OPUS', '.zip']
I have the same version as in CI. Are you able to run the test?
b86ddd2 to af7ea68
I totally messed up the rebasing, sry
Do you have a backup of the state? It seems like you overwrote most of the previous changes. If not, I should have one. I can force push it again. I would suggest not using rebase inside PRs, since it's easier to merge and the commits will be squashed anyway when the PR lands.
If you have a backup, that would be safer and nice. I don't know what happened. I have done this successfully several times before, but I must have totally messed something up.
Done. You'll need to redo any of the last commits you did before your force merge. I think this was only a "cleanup evaluation docs" removing one section.
Thanks a lot!! And done!! Actually, I rewrote that part regarding
This adds the litgpt evaluate command along with a wrapper function that allows someone to evaluate models without needing to convert or copy anything manually. I.e., it's just
Fixes #1168