
Language-Model Evaluation #21

Open
ClashLuke opened this issue Apr 30, 2022 · 0 comments
Labels
- downstream: Changes code wrapping the core model
- engineering: Software-engineering problems that don't require ML expertise
- ML: Requires machine-learning knowledge (can be built up on the fly)
Milestone
First Release

Comments

@ClashLuke (Member)

At the moment, language-modelling loss is the only signal we have when experimenting with different architectures. Unfortunately, many changes, such as extra-gradient methods, different loss functions, different tokenisers, or even different datasets, alter these loss values dramatically, making direct comparison almost impossible. Integrating a dedicated evaluation pipeline such as EleutherAI's eval-harness would give us certainty that one model is actually better than another and would let us compare ourselves against existing models such as GPT-J and GPT-3.
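
For concreteness, a minimal sketch of what the integration could look like, assuming the harness's `lm_eval.evaluator.simple_evaluate` Python entry point (argument names may differ between harness versions), with the built-in `gpt2` adapter as a placeholder baseline until our model implements the harness's model interface:

```python
# Minimal sketch, assuming lm-evaluation-harness's simple_evaluate entry
# point; exact argument names may differ between harness versions.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="gpt2",                    # built-in HuggingFace adapter as a stand-in baseline
    model_args="pretrained=gpt2",    # which checkpoint the adapter loads
    tasks=["lambada", "hellaswag"],  # downstream tasks also reported for GPT-J/GPT-3
    num_fewshot=0,                   # zero-shot, matching most published numbers
)
print(results["results"])            # per-task metrics (accuracy, perplexity, ...)
```

Evaluating our own checkpoints this way would presumably come down to wrapping our model in the harness's LM interface (log-likelihood and greedy-generation requests), after which all of its tasks become available for free.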

ClashLuke added the engineering and ML labels on Apr 30, 2022
ClashLuke added the downstream label on May 8, 2022
ClashLuke added this to the First Release milestone on May 9, 2022