Release v0.2.4 add `generate_until_multi_round` to support interactive and multi-round evaluations; add models and fix glitches · EvolvingLMMs-Lab/lmms-eval

What's Changed

[Fix] Fix bugs in returning result dict and bring back anls metric by @kcz358 in #221
fix: fix wrong args in wandb logger by @Luodian in #226
[feat] Add check for existence of accelerator before waiting by @Luodian in #227
add more language tasks and fix fewshot evaluation bugs by @Luodian in #228
Remove unnecessary LM object removal in evaluator by @Luodian in #229
[fix] Shallow copy issue by @pufanyi in #231
[Minor] Fix max_new_tokens in video llava by @kcz358 in #237
Update LMMS evaluation tasks for various subjects by @Luodian in #240
[Fix] Fix async append result in different order issue by @kcz358 in #244
Update the version requirement for transformers by @zhijian-liu in #235
Add new LMMS evaluation task for wild vision benchmark by @Luodian in #247
Add raw score to wildvision bench by @Luodian in #250
[Fix] Strict video to be single processing by @kcz358 in #246
Refactor wild_vision_aggregation_raw_scores to calculate average score by @Luodian in #252
[Fix] Bring back process result pbar by @kcz358 in #251
[Minor] Update utils.py by @YangYangGirl in #249
Refactor distributed gathering of logged samples and metrics by @Luodian in #253
Refactor caching module and fix serialization issue by @Luodian in #255
[Minor] Bring back fix for metadata by @kcz358 in #258
[Model] support minimonkey model by @white2018 in #257
[Feat] add regression test and change saving logic related to output_path by @Luodian in #259
[Feat] Add support for llava_hf video, better loading logic for llava_hf ckpt by @kcz358 in #260
[Model] support cogvlm2 model by @white2018 in #261
[Docs] Update and sort current_tasks.md by @pbcong in #262
fix error name with infovqa task by @ZhaoyangLi-nju in #265
[Task] Add MMT and MMT_MI (Multiple Image) Task by @ngquangtrung57 in #270
mme-realworld by @yfzhang114 in #266
[Model] support Qwen2 VL by @abzb1 in #268
Support new task mmworld by @jkooy in #269
Update current tasks.md by @pbcong in #272
[feat] support video evaluation for qwen2-vl and add mix-evals-video2text by @Luodian in #275
[Feat][Task] Add multi-round evaluation in llava-onevision; Add MMSearch Benchmark by @CaraJ7 in #277
[Fix] Model name None in Task manager, mix eval model specific kwargs, claude retrying fix by @kcz358 in #278
[Feat] Add support for evaluation of Oryx models by @dongyh20 in #276
[Fix] Fix the error when running models caused by generate_until_multi_round by @pufanyi in #281
[fix] Refactor GeminiAPI class to add video pooling and freeing by @pufanyi in #287
add jmmmu by @AtsuMiyai in #286
[Feat] Add support for evaluation of InternVideo2-Chat && Fix evaluation for mvbench by @yinanhe in #280

New Contributors

@YangYangGirl made their first contribution in #249
@white2018 made their first contribution in #257
@pbcong made their first contribution in #262
@ZhaoyangLi-nju made their first contribution in #265
@ngquangtrung57 made their first contribution in #270
@yfzhang114 made their first contribution in #266
@jkooy made their first contribution in #269
@dongyh20 made their first contribution in #276
@yinanhe made their first contribution in #280

Full Changelog: v0.2.3...v0.2.4

