Long-Range-Arena Evaluation #49
**Labels**
- `downstream`: Changes code wrapping the core model
- `engineering`: Software-engineering problems that don't require ML expertise
- `ML`: Requires machine-learning knowledge (can be built up on the fly)
Currently, we only know that our model is better than the baseline because it reaches a lower loss in less training time. Running benchmarks such as LRA would show how well our long-context model performs in a more realistic scenario. While LRA doesn't exercise our capabilities ideally (unlike, for example, #5 and #9), it would still give us preliminary evaluation results on a well-known benchmark.

This issue tracks the progress of integrating our model into LRA, even though the integration itself should live in a separate codebase. A rough sketch of what such a wrapper might look like is below.
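To make the scope concrete, here is a minimal sketch of an evaluation wrapper for an LRA-style classification task. It assumes a PyTorch core model that returns per-token hidden states; the `core` module, `d_model`, the pooling choice, the class count, and the data loader are all placeholders, not the actual codebase's API:

```python
# Hypothetical sketch: wrapping the core long-context model with a
# classification head for an LRA-style task (e.g. ListOps). All names
# here are placeholders for whatever our codebase actually exposes.
import torch
import torch.nn as nn

NUM_CLASSES = 10  # e.g. ListOps has 10 output classes


class LRAClassifier(nn.Module):
    """Classification head on top of the (assumed) core model."""

    def __init__(self, core: nn.Module, d_model: int):
        super().__init__()
        self.core = core  # assumed to map tokens -> (batch, seq, d_model)
        self.head = nn.Linear(d_model, NUM_CLASSES)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        hidden = self.core(tokens)   # (batch, seq, d_model)
        pooled = hidden.mean(dim=1)  # mean-pool over the sequence
        return self.head(pooled)     # (batch, NUM_CLASSES)


@torch.no_grad()
def evaluate(model: LRAClassifier, loader) -> float:
    """Plain accuracy over a data loader yielding (tokens, labels)."""
    model.eval()
    correct, total = 0, 0
    for tokens, labels in loader:
        preds = model(tokens).argmax(dim=-1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total  # LRA tasks report accuracy
```

The details (pooling strategy, tokenization, per-task heads) would follow whatever the separate LRA codebase settles on; this only illustrates the shape of the integration.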