BREAKING: v2.0.0 #1433

Draft: wants to merge 19 commits into base: main

KennethEnevoldsen (Contributor) commented Nov 11, 2024

This is a work-in-progress branch which will be the release of MTEB v2.0.0!

Features:

@x-tabdeveloping, @orionw, @isaac-chung, @Samoed, @gowitheflow-1998 etc.: please open PRs against this branch when relevant (MIEB still lives in its own branch, but we will try to merge it in here).

orionw and others added 5 commits November 13, 2024 11:30
* update

* merged retrieval; working

* update tasks; working multilingual

* everything working except instructions

* working instructions; just need cleanup

* add metadata for all but MindSmall

* faster evaluation; mindsmall can compute in reasonable time

* fix bad merge of docs

* lint

* fix test

* qa

* updated mindsmall

* lint

* fix debug

* Update mteb/abstasks/dataloaders.py

Co-authored-by: Roman Solomatin <[email protected]>

* lint

---------

Co-authored-by: Roman Solomatin <[email protected]>
Samoed and others added 13 commits November 14, 2024 21:26
* fix: Count unique texts, data leaks in calculate metrics (#1438)

* add more stat

* add more stat

* update statistics

* fix: update task metadata to allow for null (#1448)

* Update tasks table

* 1.19.5

Automatically generated by python-semantic-release

* base

* sync with main

---------

Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <[email protected]>
* enable codecarbon by default

* lint

* update flag

* add allow_multiple_runs param

* make lint

* add warning

* lint

* negate the flag

---------

Co-authored-by: Isaac Chung <[email protected]>
* run tasks

* remove test script

* lint

* remove cache

* fix sickbrsts

* fix tests

* add datasets
* fix test

* skip mock

* add message to assert

* fix test

* lint

* fix tests

* upd tests

* update descriptive stats files

* add stat to speed
* multilingual loader

* lint
* add citations

* fix typo
* add code for computing number of qrels

* add stats fever hotpotqa msmarco topiocqa

* miracl mrtidy

* multilongdoc miracl reranking

* add multi eurlex

* fix tests for descriptive stats

* fix tests

---------

Co-authored-by: Roman Solomatin <[email protected]>
* add code for computing number of qrels

* BibleNLPBitextMining descriptive stats added

* SwissJudgementClassification descriptive stats added

* VoyageMMarcoReranking descriptive stats added

* WebLINXCandidatesReranking descriptive stats added

* MultiEURLEXMultilabelClassification descriptive stats added

* MIRACLReranking descriptive stats added

* MindSmallReranking descriptive stats added

* updated test_TaskMetadata

* fix test

---------

Co-authored-by: Imene Kerboua <[email protected]>
Co-authored-by: Imene Kerboua <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>
* fix bright loader

* lint

* fix comment
* fix: Count unique texts, data leaks in calculate metrics (#1438)

* add more stat

* add more stat

* update statistics

* fix: update task metadata to allow for null (#1448)

* Update tasks table

* 1.19.5

Automatically generated by python-semantic-release

* Fix: Made data parsing in the leaderboard figure more robust (#1450)

Bugfixes with data parsing in main figure

* Fixed task loading (#1451)

* Fixed task result loading from disk

* Fixed task result loading from disk

* fix: publish (#1452)

* 1.19.6

Automatically generated by python-semantic-release

* fix: Fix load external results with `None` mteb_version (#1453)

* fix

* lint

* 1.19.7

Automatically generated by python-semantic-release

* WIP: Polishing up leaderboard UI (#1461)

* fix: Removed column wrapping on the table, so that it remains readable

* Added disclaimer to figure

* fix: Added links to task info table, switched out license with metric

* fix: loading pre 1.11.0 (#1460)

* small fix

* fix: fix

* 1.19.8

Automatically generated by python-semantic-release

* fix: swap touche2020 to maintain compatibility (#1469)

swap touche2020 for parity

* 1.19.9

Automatically generated by python-semantic-release

* docs: Add sum per language for task counts (#1468)

* add sum per lang

* add sort by sum option

* make lint

* fix: pinned datasets to <3.0.0 (#1470)

* 1.19.10

Automatically generated by python-semantic-release

* feat: add CUREv1 retrieval dataset (#1459)

* feat: add CUREv1 dataset

---------

Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>
Co-authored-by: Daniel Buades Marcos <[email protected]>

* feat: add missing domains to medical tasks

* feat: modify benchmark tasks

* chore: benchmark naming

---------

Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>

* Update tasks table

* 1.20.0

Automatically generated by python-semantic-release

* fix: check if `model` attr of model exists (#1499)

* check if model attr of model exists

* lint

* Fix retrieval evaluator

* 1.20.1

Automatically generated by python-semantic-release

* add cure statistics

---------

Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Napuh <[email protected]>
Co-authored-by: Daniel Buades Marcos <[email protected]>
Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>
* fix bright loader

* lint

* fix comment

* fix stats

* fix retrieval stats

* update stats

* add rest of the stat

* move bach code

* fix docs

* lint
* fix FilipinoHateSpeechClassification

* update tests
6 participants