v0.2.4 add `generate_until_multi_round` to support interactive and multi-round evaluations; add models and fix glitches
What's Changed
- [Fix] Fix bugs in returning result dict and bring back anls metric by @kcz358 in #221
- fix: fix wrong args in wandb logger by @Luodian in #226
- [feat] Add check for existence of accelerator before waiting by @Luodian in #227
- add more language tasks and fix fewshot evaluation bugs by @Luodian in #228
- Remove unnecessary LM object removal in evaluator by @Luodian in #229
- [fix] Shallow copy issue by @pufanyi in #231
- [Minor] Fix max_new_tokens in video llava by @kcz358 in #237
- Update LMMS evaluation tasks for various subjects by @Luodian in #240
- [Fix] Fix async append result in different order issue by @kcz358 in #244
- Update the version requirement for `transformers` by @zhijian-liu in #235
- Add new LMMS evaluation task for wild vision benchmark by @Luodian in #247
- Add raw score to wildvision bench by @Luodian in #250
- [Fix] Strict video to be single processing by @kcz358 in #246
- Refactor wild_vision_aggregation_raw_scores to calculate average score by @Luodian in #252
- [Fix] Bring back process result pbar by @kcz358 in #251
- [Minor] Update utils.py by @YangYangGirl in #249
- Refactor distributed gathering of logged samples and metrics by @Luodian in #253
- Refactor caching module and fix serialization issue by @Luodian in #255
- [Minor] Bring back fix for metadata by @kcz358 in #258
- [Model] support minimonkey model by @white2018 in #257
- [Feat] add regression test and change saving logic related to `output_path` by @Luodian in #259
- [Feat] Add support for llava_hf video, better loading logic for llava_hf ckpt by @kcz358 in #260
- [Model] support cogvlm2 model by @white2018 in #261
- [Docs] Update and sort current_tasks.md by @pbcong in #262
- fix error name with infovqa task by @ZhaoyangLi-nju in #265
- [Task] Add MMT and MMT_MI (Multiple Image) Task by @ngquangtrung57 in #270
- mme-realworld by @yfzhang114 in #266
- [Model] support Qwen2 VL by @abzb1 in #268
- Support new task mmworld by @jkooy in #269
- Update current tasks.md by @pbcong in #272
- [feat] support video evaluation for qwen2-vl and add mix-evals-video2text by @Luodian in #275
- [Feat][Task] Add multi-round evaluation in llava-onevision; Add MMSearch Benchmark by @CaraJ7 in #277
- [Fix] Model name None in Task manager, mix eval model specific kwargs, claude retrying fix by @kcz358 in #278
- [Feat] Add support for evaluation of Oryx models by @dongyh20 in #276
- [Fix] Fix the error when running models caused by `generate_until_multi_round` by @pufanyi in #281
- [fix] Refactor GeminiAPI class to add video pooling and freeing by @pufanyi in #287
- add jmmmu by @AtsuMiyai in #286
- [Feat] Add support for evaluation of InternVideo2-Chat && Fix evaluation for mvbench by @yinanhe in #280
New Contributors
- @YangYangGirl made their first contribution in #249
- @white2018 made their first contribution in #257
- @pbcong made their first contribution in #262
- @ZhaoyangLi-nju made their first contribution in #265
- @ngquangtrung57 made their first contribution in #270
- @yfzhang114 made their first contribution in #266
- @jkooy made their first contribution in #269
- @dongyh20 made their first contribution in #276
- @yinanhe made their first contribution in #280
Full Changelog: v0.2.3...v0.2.4