I tried training on multiple nodes with multiple GPUs and found that the runtime increases substantially compared to a single node. Under the same training setup, it is three times slower than DeepSpeed Ulysses, yet this problem does not appear on a single node. What could be causing this?
Is it because every communication step crosses machines, and inter-machine communication is slow?
Most likely, yes. Being somewhat slower than DeepSpeed Ulysses is also expected (though 3x does feel like a lot...), because DeepSpeed Ulysses has more balanced computation and more regular communication. On the other hand, DeepSpeed Ulysses is limited by the model's number of attention heads, so it cannot scale to arbitrarily long context lengths.
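To make that head-count limitation concrete, here is a minimal sketch; the config values are hypothetical, not taken from this issue. Ulysses-style sequence parallelism uses an all-to-all to redistribute sequence shards into head shards, so each rank must end up with at least one whole head, while ring attention circulates K/V blocks and has no such cap.

```python
num_heads = 32   # attention heads in the model (hypothetical)
sp_degree = 64   # desired sequence-parallel degree (hypothetical)

# DeepSpeed Ulysses' all-to-all turns sequence shards into head shards,
# so the parallel degree cannot exceed the number of heads.
assert sp_degree <= num_heads, (
    "Ulysses cannot split finer than one head per rank; "
    "ring attention avoids this cap by circulating K/V blocks instead."
)
```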
Marking this thread. Does the current version support running zig_zag_ring_attention within a single machine (i.e., per node), with only gradients communicated across machines, to mitigate the high cost of inter-node communication? The maximum supported sequence length would of course become shorter. A sketch of what I mean follows below.
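For illustration, a minimal sketch of that setup, assuming `zigzag_ring_flash_attn_func` accepts a process `group` argument; the helper below is a hypothetical construction, not an existing API in this repo:

```python
import torch.distributed as dist

def build_intra_node_group(gpus_per_node: int = 8):
    """Create one subgroup per node so the attention ring never crosses machines.

    Gradients can still be averaged across all ranks via the default (world)
    group, e.g. by DDP, while attention communication stays intra-node.
    """
    world_size = dist.get_world_size()
    rank = dist.get_rank()
    my_group = None
    # Every rank must call new_group() for every subgroup, in the same order.
    for node in range(world_size // gpus_per_node):
        ranks = list(range(node * gpus_per_node, (node + 1) * gpus_per_node))
        group = dist.new_group(ranks)
        if rank in ranks:
            my_group = group
    return my_group

# Usage (hypothetical): confine the ring to one node; the per-ring sequence
# length, and hence the maximum context length, shrinks accordingly.
# group = build_intra_node_group()
# out = zigzag_ring_flash_attn_func(q, k, v, causal=True, group=group)
```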