Hi, I am not sure I understand the logic behind `accumulate_gradient_steps`.
I have these 3 configurations:
batch_size=1, accumulate_gradient_steps=1 -> blue
batch_size=2, accumulate_gradient_steps=1 -> red
batch_size=2, accumulate_gradient_steps=2 -> green
My initial understanding is that when doing gradient accumulation, the training loop runs `accumulate_gradient_steps` forward + backward passes and only then the optimizer takes a single step.
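To make that explicit, here is a minimal, self-contained optax sketch of the loop I have in mind. This is *not* the actual `llama_train.py` code; the toy loss, `micro_batches`, and all hyperparameters are made up purely for illustration:

```python
import jax
import jax.numpy as jnp
import optax

# Toy problem, purely for illustration: fit w so that w * x ≈ y.
def loss_fn(params, batch):
    return jnp.mean((params["w"] * batch["x"] - batch["y"]) ** 2)

params = {"w": jnp.ones(())}
optimizer = optax.adam(1e-3)
opt_state = optimizer.init(params)

accumulate_gradient_steps = 2
micro_batches = [
    {"x": jnp.array([1.0, 2.0]), "y": jnp.array([2.0, 4.0])}
    for _ in range(4)
]

grad_accum = jax.tree_util.tree_map(jnp.zeros_like, params)
for i, batch in enumerate(micro_batches):
    # Forward + backward: compute gradients for this micro-batch only.
    grads = jax.grad(loss_fn)(params, batch)
    grad_accum = jax.tree_util.tree_map(lambda a, g: a + g, grad_accum, grads)

    # Only every accumulate_gradient_steps micro-batches does the optimizer
    # actually update the parameters, using the averaged gradients.
    if (i + 1) % accumulate_gradient_steps == 0:
        mean_grads = jax.tree_util.tree_map(
            lambda a: a / accumulate_gradient_steps, grad_accum
        )
        updates, opt_state = optimizer.update(mean_grads, opt_state, params)
        params = optax.apply_updates(params, updates)
        grad_accum = jax.tree_util.tree_map(jnp.zeros_like, params)
```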
1/ I don't see where that logic is handled in `llama_train.py`. It looks like the `train_step` method has no counter for `accumulate_gradient_steps`, and an optimizer step is taken after every forward pass?
2/ The logging is confusing: I would have expected the red and blue lines to overlap, not the blue and green ones.
Is it possible that `step` counts forward + backward operations, rather than counting full (forward + backward) × grad_acc + optimizer_step cycles?
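For what it's worth, one way the counter could be "invisible" in `train_step` is if it lives inside the optimizer itself. I have not verified that this repo does this, but optax, for example, provides a wrapper for exactly that pattern; the base optimizer and the `every_k_schedule` value below are just placeholders:

```python
import optax

# Hypothetical construction, not taken from the repo: wrapping the base
# optimizer in optax.MultiSteps makes update() accumulate gradients
# internally and only apply a real parameter update every k-th call,
# returning all-zero updates on the intermediate calls.
base_optimizer = optax.adamw(learning_rate=1e-4)
optimizer = optax.MultiSteps(base_optimizer, every_k_schedule=2)
```

If something like this is going on, then the `step` in the logs could indeed just be counting calls to `train_step` (i.e. forward + backward micro-steps) rather than actual optimizer updates.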