OpenCompass v0.1.4
OpenCompass v0.1.4 is here with an array of features, documentation improvements, and key fixes! Dive in to see what's in store:
🆕 Highlights:
More Tools and Features: This release adds the update suffix tool and preds collection tools, support for codellama, qwen, and qwen-chat, and multimodal additions including Otter in the MMBench Evaluation.
Documentation Facelift: We've made several updates to our documentation, ensuring it stays relevant, user-friendly, and aesthetically pleasing.
Essential Bug Fixes: We've fixed several bugs, notably in handling models that are missing both pad and eos tokens, in triviaqa and nq postprocessing, and in the qwen config.
Enhancements: From simplifying execution logic to suppressing warnings, we’re always on the lookout for ways to improve our product.
Dive deeper to learn more:
🌟 New Features:
📦 Tools and Integrations:
- Added and applied the update suffix tool (#280).
- Support for codellama and preds collection tools (#335).
- Addition of qwen & qwen-chat support (#286).
- Introduction of Otter to OpenCompass MMBench Evaluation (#232).
- Support for LLaVA and mPLUG-Owl (#331).
🛠 Utilities and Functionality:
- Added sample count support in prompt_viewer (#273).
- ZeroRetriever errors are now ignored when id_list is provided (#340).
- Increased the default task size (#360).
📝 Documentation:
- Updated communication channels: WeChat and Discord (#328).
- Documentation theme revamped for a fresh look (#332).
- Detailed documentation for the new entry script (#246).
- MMBench documentation updated (#336).
🛠️ Bug Fixes:
- Resolved issue when missing both pad and eos token (#287).
- Addressed triviaqa & nq postprocess glitches (#350).
- Fixed qwen configuration inaccuracies (#358).
- Added a default value for the zero retriever (#361).
⚙ Enhancements and Refactors:
- Simplified execution logic in run.py and used a finally block to clean up temp files (#337).
- Suppressed unnecessary warnings raised by get_logger (#353).
- Added multimodal import checks (#352).
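For readers unfamiliar with the pattern behind #337, here is a minimal sketch of cleaning up a temp file in a `finally` block. It is not the code in run.py: the function name `run_with_temp_config` and its parameters are hypothetical, chosen only to illustrate the idea.

```python
import os
import tempfile


def run_with_temp_config(config_text, task):
    """Write config_text to a temp file, call task(path), always delete the file.

    Hypothetical helper for illustration; not part of OpenCompass.
    """
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        # Hand the open descriptor to a file object so it is closed properly.
        with os.fdopen(fd, "w") as f:
            f.write(config_text)
        return task(path)
    finally:
        # Runs whether task() returned or raised, so the temp file never lingers.
        if os.path.exists(path):
            os.remove(path)
```

Because the removal sits in `finally`, the temp file is deleted both on a normal return and when the task raises, which is the behavior the changelog entry describes.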
🎉 New Contributors:
Thank you to all our contributors for this release, with a special shoutout to our new contributors:
@Luodian (First PR)
@ZhangYuanhan-AI (First PR)
@HAOCHENYE (First PR)
Thank you to the entire community for pushing OpenCompass forward. Make sure to star 🌟 our GitHub repository if OpenCompass aids your endeavors! We treasure your feedback and contributions.
Changelog
- [Feature] Add and apply update suffix tool by @Leymore in #280
- support sample count in prompt_viewer by @cdpath in #273
- docs: update wechat and discord by @vansin in #328
- [Docs] Update doc theme by @gaotongxiao in #332
- [Feat] support codellama and preds collection tools by @yingfhu in #335
- [Feature] Add qwen & qwen-chat support by @Leymore in #286
- [Feat] Add Otter to OpenCompass MMBench Evaluation by @Luodian in #232
- [Docs] Update docs for new entry script by @gaotongxiao in #246
- [Fix] Fix when missing both pad and eos token by @Leymore in #287
- [Doc] Update MMBench.md by @kennymckormick in #336
- [Feat] Support LLaVA and mPLUG-Owl by @ZhangYuanhan-AI in #331
- [Feature] Ignore ZeroRetriever error when id_list provided by @Leymore in #340
- [Enhance] Add import check of multimodal by @fangyixiao18 in #352
- [Sync] [Enhancement] Simplify execution logic in run.py; use finally to clean up temp files by @gaotongxiao in #337
- [Fix] Fix triviaqa & nq postprocess by @Leymore in #350
- [Enhance] Supress warning raised by get_logger by @HAOCHENYE in #353
- [Fix] Update qwen config by @Leymore in #358
- [Fix] zero retriever add default value by @Leymore in #361
- [Enhancement] Increase default task size by @gaotongxiao in #360
- [Fix] Quick lint fix by @Leymore in #362
- [Docs] update code evaluator docs by @yingfhu in #354
- [Feat] support wizardcoder series by @yingfhu in #344
- [Feat] Support Qwen-VL-Chat on MMBench. by @yyk-wew in #312
- [Feature] Update claude2 postprocessor by @gaotongxiao in #365
- [Doc] Update Overview by @tonysy in #242
- [Feat] Update URL by @tonysy in #368
- [Feature] Update llama2 implement by @Leymore in #372
- [Feature] Add open source dataset eval config of instruct-blip by @fangyixiao18 in #370
- [Fix] Update bbh implement & Fix bbh suffix by @Leymore in #371
- [Feaure] Add new models: baichuan2, tigerbot, vicuna v1.5 by @Leymore in #373
- Bump version to 0.1.4 by @gaotongxiao in #367
For an exhaustive list of changes, kindly check our Full Changelog.