
Staged batchsize training #80

Open
ClashLuke opened this issue Sep 11, 2022 · 0 comments
Labels: core (Improves core model while keeping core idea intact), engineering (Software-engineering problems that don't require ML-Expertise), research (Creative project that might fail but could give high returns)

Comments

@ClashLuke (Member)

Papers such as "Don't Decay the Learning Rate, Increase the Batch Size" (Smith et al., 2017) have shown that training with progressively larger batch sizes instead of progressively lower learning rates helps models find a better local minimum by improving stability in the final stages of training. It also speeds up training, since throughput (in tokens/s) grows with batch size.
Intuitively, this lets the model take many small updates early on, when the per-sample gradients within a batch point in similar directions. In later stages of training, the gradients increasingly disagree, so larger batches (or lower learning rates) are needed to average out the noise.

ClashLuke added the research, engineering, and core labels on Sep 11, 2022