BREAKING: v2.0.0 #1433

KennethEnevoldsen · 2024-11-11T09:24:01Z

This is a work-in-progress branch that will become the MTEB v2.0.0 release!

Features:

Added evaluation of image embeddings (MIEB; not merged in yet)
Improved handling of seeds (could be improved further; see "Avoid using global seeds" #942)
Major updates to the leaderboard
Evaluators ambiguity: class/module #1124
New benchmark interface #1272
Remove encode_corpus and encode_queries and implement a "document" class #1284
Consolidate Retrieval/Reranking/Instruction Variants #1359

@x-tabdeveloping, @orionw, @isaac-chung, @Samoed, @gowitheflow-1998, etc.: please open PRs against this branch when relevant (MIEB still lives in its own branch, but we will try to merge it in here).

* update
* merged retrieval; working
* update tasks; working multilingual
* everything working except instructions
* working instructions; just need cleanup
* add metadata for all but MindSmall
* faster evaluation; mindsmall can compute in reasonable time
* fix bad merge of docs
* lint
* fix test
* qa
* updated mindsmall
* lint
* fix debug
* Update mteb/abstasks/dataloaders.py
  Co-authored-by: Roman Solomatin <[email protected]>
* lint
---------
Co-authored-by: Roman Solomatin <[email protected]>

…into v2.0.0

* fix: Count unique texts, data leaks in calculate metrics (#1438)
* add more stat
* add more stat
* update statistics
* fix: update task metadata to allow for null (#1448)
* Update tasks table
* 1.19.5 Automatically generated by python-semantic-release
* base
* sync with main
---------
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <[email protected]>

* add code for computing number of qrels
* add stats fever hotpotqa msmarco topiocqa
* miracl mrtidy
* multilongdoc miracl reranking
* add multi eurlex
* fix tests for descriptive stats
* fix tests
---------
Co-authored-by: Roman Solomatin <[email protected]>

* add code for computing number of qrels
* BibleNLPBitextMining descriptive stats added
* SwissJudgementClassification descriptive stats added
* VoyageMMarcoReranking descriptive stats added
* WebLINXCandidatesReranking descriptive stats added
* MultiEURLEXMultilabelClassification descriptive stats added
* MIRACLReranking descriptive stats added
* MindSmallReranking descriptive stats added
* updated test_TaskMetadata
* fix test
---------
Co-authored-by: Imene Kerboua <[email protected]>
Co-authored-by: Imene Kerboua <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>
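Several of these commits add code for computing the number of qrels (relevance judgments) per retrieval task. As a minimal sketch only, assuming qrels are stored BEIR-style as `{query_id: {doc_id: relevance_score}}`, the counts could be computed as below; `qrels_stats` and its output keys are hypothetical and not MTEB's API.

```python
from statistics import mean


def qrels_stats(qrels: dict[str, dict[str, int]]) -> dict[str, float]:
    """Hypothetical helper: summarise qrels shaped {query_id: {doc_id: score}}."""
    # Number of relevance judgments attached to each query
    per_query = [len(docs) for docs in qrels.values()]
    # Only judgments with a positive score count as relevant
    relevant = sum(
        1 for docs in qrels.values() for score in docs.values() if score > 0
    )
    return {
        "num_queries": len(qrels),
        "num_qrels": sum(per_query),
        "num_relevant": relevant,
        "avg_qrels_per_query": mean(per_query) if per_query else 0.0,
    }


qrels = {"q1": {"d1": 1, "d2": 0}, "q2": {"d3": 2}}
print(qrels_stats(qrels))
```

Treating a score of 0 as "judged but not relevant" is an assumption; datasets that omit non-relevant pairs would report identical `num_qrels` and `num_relevant`.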

* fix: Count unique texts, data leaks in calculate metrics (#1438)
* add more stat
* add more stat
* update statistics
* fix: update task metadata to allow for null (#1448)
* Update tasks table
* 1.19.5 Automatically generated by python-semantic-release
* Fix: Made data parsing in the leaderboard figure more robust (#1450)
  Bugfixes with data parsing in main figure
* Fixed task loading (#1451)
* Fixed task result loading from disk
* Fixed task result loading from disk
* fix: publish (#1452)
* 1.19.6 Automatically generated by python-semantic-release
* fix: Fix load external results with `None` mteb_version (#1453)
* fix
* lint
* 1.19.7 Automatically generated by python-semantic-release
* WIP: Polishing up leaderboard UI (#1461)
* fix: Removed column wrapping on the table, so that it remains readable
* Added disclaimer to figure
* fix: Added links to task info table, switched out license with metric
* fix: loading pre 1.11.0 (#1460)
* small fix
* fix: fix
* 1.19.8 Automatically generated by python-semantic-release
* fix: swap touche2020 to maintain compatibility (#1469)
  swap touche2020 for parity
* 1.19.9 Automatically generated by python-semantic-release
* docs: Add sum per language for task counts (#1468)
* add sum per lang
* add sort by sum option
* make lint
* fix: pinned datasets to <3.0.0 (#1470)
* 1.19.10 Automatically generated by python-semantic-release
* feat: add CUREv1 retrieval dataset (#1459)
* feat: add CUREv1 dataset
  ---------
  Co-authored-by: nadshe <[email protected]>
  Co-authored-by: olivierr42 <[email protected]>
  Co-authored-by: Daniel Buades Marcos <[email protected]>
* feat: add missing domains to medical tasks
* feat: modify benchmark tasks
* chore: benchmark naming
  ---------
  Co-authored-by: nadshe <[email protected]>
  Co-authored-by: olivierr42 <[email protected]>
* Update tasks table
* 1.20.0 Automatically generated by python-semantic-release
* fix: check if `model` attr of model exists (#1499)
* check if model attr of model exists
* lint
* Fix retrieval evaluator
* 1.20.1 Automatically generated by python-semantic-release
* add cure statistics
---------
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Napuh <[email protected]>
Co-authored-by: Daniel Buades Marcos <[email protected]>
Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>

fix: Ensure seed is based on RNG State (#1193)

e2520df
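This fix derives seeds from RNG state rather than relying on a fixed global seed (see also "Avoid using global seeds" #942). The sketch below illustrates the general pattern in plain Python, assuming each task owns a `random.Random` instance; `derive_seed` and `sample_subset` are hypothetical names, not MTEB functions, and the actual change may differ.

```python
import random


def derive_seed(rng: random.Random) -> int:
    # Hypothetical helper: draw a 32-bit seed from the caller's RNG.
    # Each call advances rng, so successive seeds differ but stay reproducible.
    return rng.randint(0, 2**32 - 1)


def sample_subset(items: list[str], n: int, rng: random.Random) -> list[str]:
    # Seed a local generator from rng instead of calling random.seed(),
    # which would reset the process-wide global RNG for all other code.
    local = random.Random(derive_seed(rng))
    return local.sample(items, n)


items = [f"doc{i}" for i in range(10)]
subset_a = sample_subset(items, 3, random.Random(42))
subset_b = sample_subset(items, 3, random.Random(42))
print(subset_a == subset_b)  # True: same starting state, same subset
```

Because the global `random` module is never reseeded, other libraries drawing from it are unaffected, while runs that start from the same task seed still reproduce the same samples.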

KennethEnevoldsen added this to the v2.0.0 milestone Nov 11, 2024

isaac-chung marked this pull request as draft November 11, 2024 09:27

KennethEnevoldsen mentioned this pull request Nov 13, 2024

Consolidate Retrieval/Reranking/Instruction Variants #1359

Merged

orionw and others added 5 commits November 13, 2024 11:30

fix: Ensure TaskResults can handle runtime and version being unspecified

2a8a370

Merge branch 'v2.0.0' of https://github.com/embeddings-benchmark/mteb into v2.0.0

dea2b77

fix: remove NaN handling for retrieval

23d6cb2

Merge branch 'main' into v2.0.0

8868cd4

Samoed mentioned this pull request Nov 14, 2024

fix: Count unique texts, data leaks in calculate metrics #1438

Merged
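PR #1438 counts unique texts and data leaks when calculating descriptive metrics. The sketch below is a minimal illustration, not MTEB's implementation: it assumes each split is a plain list of strings and treats a "leak" as a test text that also appears verbatim in the train split. `leak_stats` and its keys are hypothetical.

```python
def leak_stats(train: list[str], test: list[str]) -> dict[str, int]:
    """Hypothetical helper: unique-text and train/test overlap counts."""
    # Deduplicate each split with sets; exact string matching only.
    train_unique = set(train)
    test_unique = set(test)
    return {
        "num_test_samples": len(test),
        "unique_test_texts": len(test_unique),
        # Unique test texts that also appear verbatim in the train split
        "test_texts_in_train": len(test_unique & train_unique),
    }


train = ["a cat", "a dog", "a cat"]
test = ["a dog", "a bird", "a bird"]
print(leak_stats(train, test))
# {'num_test_samples': 3, 'unique_test_texts': 2, 'test_texts_in_train': 1}
```

Counting unique leaked texts rather than every leaked occurrence is a design choice here, and exact matching will miss near-duplicates such as case or whitespace variants.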

Samoed and others added 13 commits November 14, 2024 21:26

feat: enable codecarbon by default (#1428)

70a3ff2

* enable codecarbon by default
* lint
* update flag
* add allow_multiple_runs param
* make lint
* add warning
* lint
* negate the flag
---------
Co-authored-by: Isaac Chung <[email protected]>

Add descriptive stats to almost all datasets (#1466)

0e9b6fd

* run tasks
* remove test script
* lint
* remove cache
* fix sickbrsts
* fix tests
* add datasets
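Commit #1466 adds descriptive statistics to almost all datasets. As a rough illustration of what such statistics typically summarise, here is a sketch computing a sample count and a character-length summary for one split; `text_length_stats` and its field names are assumptions, not the exact keys MTEB stores.

```python
from statistics import mean


def text_length_stats(texts: list[str]) -> dict[str, float]:
    """Hypothetical helper: character-length summary for one split."""
    if not texts:
        raise ValueError("texts must be non-empty")
    lengths = [len(t) for t in texts]  # length in characters, not tokens
    return {
        "num_samples": len(texts),
        "min_text_length": min(lengths),
        "average_text_length": mean(lengths),
        "max_text_length": max(lengths),
    }


print(text_length_stats(["short", "a bit longer", "short"]))
```

Character lengths are model-agnostic; token counts would depend on the tokenizer of the model being evaluated.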

fix: Fix test for empty descriptive tasks (#1413)

0a5bedb

* fix test
* skip mock
* add message to assert
* fix test
* lint
* fix tests
* upd tests
* update descriptive stats files
* add stat to speed

fix: pin datasets version <3.0.0 (#1471)

6da2a1a

feat: Multilingual retrieval loader (#1473)

a27de33

* multilingual loader
* lint

fix: add citations to ModelMeta (#1477)

0df0210

* add citations
* fix typo

fix: Fix BrightRetrieval calculate stats (#1484)

99247b2

* fix bright loader
* lint
* fix comment

Fix: retrieval stats (#1496)

6383950

* fix bright loader
* lint
* fix comment
* fix stats
* fix retrieval stats
* update stats
* add rest of the stat
* move bach code
* fix docs
* lint

fix: hatespeech filipino (#1522)

d54fb75

* fix FilipinoHateSpeechClassification
* update tests

KennethEnevoldsen commented Nov 11, 2024 (edited by orionw)
