Stabilize MoE #16
Labels
core
Improves core model while keeping core idea intact
engineering
Software-engineering problems that don't require ML-Expertise
ML
Requires machine-learning knowledge (can be built up on the fly)
Currently, our MoE implementation leads to exploding losses and, eventually, NaN values.
This issue is about finding the cause behind these problems and fixing it.
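As a possible starting point for narrowing down where things go wrong, here is a minimal sketch that scans a logged loss history for the first step that looks unstable. It is not taken from this repository: the helper name `first_unstable_step` and the `spike_ratio` threshold are hypothetical, and it assumes the losses are positive scalars logged once per step.

```python
import math
from typing import Optional, Sequence


def first_unstable_step(
    losses: Sequence[float], spike_ratio: float = 10.0
) -> Optional[int]:
    """Return the index of the first suspicious step, or None if none is found.

    A step is flagged when its loss is NaN/inf, or when it exceeds
    `spike_ratio` times the lowest loss seen so far (an arbitrary,
    hypothetical threshold; assumes positive losses).
    """
    best = math.inf  # lowest finite loss observed so far
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
        if best < math.inf and loss > spike_ratio * best:
            return step
        best = min(best, loss)
    return None
```

Running this over the logged losses of a diverging run should point at the step where the blow-up starts, which is usually earlier than the first NaN. Checkpoints and router statistics from just before that step are then the natural place to look.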