Why does ZeRO-Offload update parameters only after the model's backward pass? Can the two be pipelined? #5478
Replies: 1 comment 2 replies
Hi @Zijie-Tian, what is U1234 here? I guess it is the step (parameter update) on the CPU side.
The main reason is that CPU compute is much slower than GPU compute. In your pipelined schedule, the parameters needed first (F1/P1) would be the last ones updated on the CPU (they have to wait until U4321 have all finished), so they incur the longest delay. With such a pipeline, the CPU would therefore become the bottleneck of the whole training pipeline.
For this reason we also implemented an optimization that delays parameter updates by one iteration, as described in Section 5 of the paper: https://arxiv.org/pdf/2101.06840
Hope it helps.
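To make the bottleneck argument concrete, here is a rough toy timing model in Python. It is not DeepSpeed code: the function name `next_forward_start` and the per-layer times `t_bwd` and `t_upd` are hypothetical, and the model ignores CPU/GPU transfer time. It assumes backward visits layers from last to first, that the CPU updates one layer at a time as soon as its gradient is ready, and that the next forward pass cannot start until layer 1 has been updated.

```python
def next_forward_start(n_layers, t_bwd, t_upd):
    """Toy schedule: return (backward_end, time when layer-1 params are updated).

    Assumptions (illustrative only): each layer takes t_bwd on the GPU for
    backward and t_upd on the CPU for its update; the CPU is serial.
    """
    cpu_free = 0.0
    for i in range(n_layers):  # backward order: layer n, n-1, ..., 1
        grad_ready = (i + 1) * t_bwd       # gradient of this layer is available
        start = max(cpu_free, grad_ready)  # wait for both the CPU and the gradient
        cpu_free = start + t_upd           # CPU is busy until this update finishes
    backward_end = n_layers * t_bwd
    return backward_end, cpu_free


# Slow CPU (update 3x slower than backward): the GPU stalls waiting for layer 1.
bwd_end, p1_ready = next_forward_start(n_layers=4, t_bwd=1.0, t_upd=3.0)
stall_slow = p1_ready - bwd_end

# Fast CPU (update faster than backward): the stall shrinks to a single update.
bwd_end_fast, p1_ready_fast = next_forward_start(n_layers=4, t_bwd=1.0, t_upd=0.5)
stall_fast = p1_ready_fast - bwd_end_fast

print(stall_slow, stall_fast)
```

In this toy model, the layer-1 update always finishes last, so whenever the CPU update is slower than the GPU backward step the stall before the next forward pass grows with the number of layers.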