Language-Model Evaluation #21
Labels:
- downstream: changes code wrapping the core model
- engineering: software-engineering problems that don't require ML expertise
- ML: requires machine-learning knowledge (can be built up on the fly)
At the moment, language-modelling loss is the only signal we have when experimenting with different architectures. Unfortunately, many changes, such as extra-gradient methods, different loss functions, different tokenisers, or even different datasets, shift these loss values dramatically, making direct comparison almost impossible. Integrating a dedicated evaluation pipeline such as EleutherAI's eval-harness would give us confidence that one model is genuinely better than another and would let us compare our models against existing ones such as GPT-J and GPT-3.
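A minimal sketch of what such an integration could look like, assuming the harness exposes a `simple_evaluate` entry point; the backend name, model arguments, and task names below are illustrative and may differ between harness versions and our own checkpoints:

```python
# Sketch: run a fixed suite of downstream tasks through lm-evaluation-harness
# so architecture variants can be compared on task metrics rather than raw loss.
# Model/task arguments here are assumptions, not a finalised configuration.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",                                   # Hugging Face backend
    model_args="pretrained=EleutherAI/gpt-j-6b",  # swap in one of our checkpoints here
    tasks=["lambada_openai", "hellaswag", "piqa"],
    num_fewshot=0,
    batch_size=8,
)

# Each task reports standard metrics (accuracy, perplexity, ...), which are
# directly comparable across tokenisers, loss functions, and to published
# GPT-J / GPT-3 numbers.
for task, metrics in results["results"].items():
    print(task, metrics)
```

Running the same task list after every architectural change would give a stable yardstick that is independent of the training loss scale.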