Long-Context LLM Benchmarks

🚀 A list of long-context LLM benchmarks. A better view is available here.

| Dataset | Release Date | Type | Domain | Token Length | Language | Data Released? | Answer Released? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ZeroSCROLLS | 2023-05 | Realistic | Novel, Report, Meetings, TV, Wikipedia | Avg ~15k | EN | | |
| L-Eval | 2023-07 | Realistic | Math, Code, Paper, etc. (ACL'24 Outstanding) | Avg ~15k | ZH | | |
| LongBench | 2023-08 | Realistic | Code, Meeting, Wiki, Novel | Avg ~13k | ZH, EN | | |
| BAMBOO | 2023-09 | Realistic | Paper, TV shows, GovReport, Code, Meeting | Only 4k, 16k | EN | | |
| LooGLE | 2023-11 | Realistic | Paper, Wikipedia, TV & Movie | Avg ~24k | EN | | |
| LVEval | 2024-02 | Realistic | Mixup | 16k, 32k, 64k, 128k, 256k | ZH, EN | | |
| InfiniteBench | 2024-02 | Realistic | Code, Novel, Math, Dialogue | > 100k | ZH, EN | | |
| DocFinQA | 2024-02 | Realistic | Finance | > 100k | EN | | |
| Counting-Stars | 2024-03 | Needle | Essay, Novel | Any | ZH, EN | | |
| CLongEval | 2024-03 | Realistic | Story, News, Conversation | < 100k | ZH | | |
| NovelQA | 2024-03 | Realistic | Novel | > 100k | EN | | |
| RULER | 2024-04 | Needle | Essays | Any | EN | | |
| XL2Bench | 2024-04 | Realistic | Novel, Paper, Law | > 100k | ZH, EN | | |
| BABILong | 2024-06 | Needle | Books | Any | EN | | |
| MedOdyssey | 2024-06 | Realistic, Needle | Medical | 40k-180k | ZH, EN | | |
| Loong | 2024-06 | Realistic | Papers, Legal, Finance | 40k-230k | ZH, EN | | |
| LongIns | 2024-06 | Other | Multiple QA | 256-16k | EN | | |
| NOCHA | 2024-07 | Realistic | Novel | > 100k | EN | | |
| [SummaryStack](https://arxiv.org/abs/2407.01370) | 2024-07 | Other | News, Conversations | Avg ~92k | EN | | |
| NeedleBench | 2024-07 | Needle | Essays | Any | ZH, EN | | |
| ML-Needle | 2024-08 | Needle | Wikipedia | 4k-32k | ZH, EN, SP, GR, AR, VT | | |
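
To pick benchmarks from the list above programmatically, one option is to hold the rows as records and filter on type, language, or release date. The snippet below is a minimal sketch, not part of this repository: the `Benchmark` class and the `select` helper are hypothetical names, and only a handful of rows are copied in for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    """One row of the benchmark list (hypothetical record type)."""
    name: str
    released: str      # "YYYY-MM", as written in the list
    types: tuple       # e.g. ("Realistic",) or ("Realistic", "Needle")
    languages: tuple   # e.g. ("ZH", "EN")


# A few rows copied from the list; extend with the remaining rows as needed.
BENCHMARKS = [
    Benchmark("ZeroSCROLLS", "2023-05", ("Realistic",), ("EN",)),
    Benchmark("LongBench", "2023-08", ("Realistic",), ("ZH", "EN")),
    Benchmark("Counting-Stars", "2024-03", ("Needle",), ("ZH", "EN")),
    Benchmark("RULER", "2024-04", ("Needle",), ("EN",)),
    Benchmark("MedOdyssey", "2024-06", ("Realistic", "Needle"), ("ZH", "EN")),
    Benchmark("NeedleBench", "2024-07", ("Needle",), ("ZH", "EN")),
]


def select(benchmarks, type_=None, language=None, since=None):
    """Return the benchmarks matching every filter that is not None.

    `since` is an inclusive "YYYY-MM" lower bound; zero-padded dates in
    this format compare correctly as plain strings.
    """
    result = []
    for b in benchmarks:
        if type_ is not None and type_ not in b.types:
            continue
        if language is not None and language not in b.languages:
            continue
        if since is not None and b.released < since:
            continue
        result.append(b)
    return result


if __name__ == "__main__":
    for b in select(BENCHMARKS, type_="Needle", language="ZH"):
        print(b.name, b.released)
```

Keeping `types` and `languages` as tuples lets a row such as MedOdyssey, which is listed as both Realistic and Needle, match either filter.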