[Question] Does TorchRec support distributed checkpointing (DCP)? #2534
Comments
Thanks for your reply! Closing it as completed.
Sorry @iamzainhuda, I have to reopen this because I encountered another issue regarding the Adam optimizer. Say I have an EmbeddingCollection whose optimizer is fused with the backward pass. optimizer.state_dict() returns nothing but the "momentum1/2" tensors; other state such as lr and weight decay is gone. I think the problem is here.
The model above gives me an optimizer state like:
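For comparison, here is a minimal sketch using a plain (non-fused) torch.optim.Adam, unrelated to the TorchRec model above: in a regular optimizer state_dict, lr and weight_decay live under param_groups, while only the momentum-style buffers (exp_avg / exp_avg_sq) live under state. The single parameter below is a stand-in for illustration, not one of the embedding tables.

```python
# Sketch only: plain torch.optim.Adam, NOT the fused TorchRec/FBGEMM path.
import torch

param = torch.nn.Parameter(torch.randn(4, 8))
opt = torch.optim.Adam([param], lr=1e-3, weight_decay=1e-2)

param.sum().backward()
opt.step()

sd = opt.state_dict()
# Hyperparameters live in param_groups ...
print(sd["param_groups"][0]["lr"], sd["param_groups"][0]["weight_decay"])
# ... while only the momentum buffers live in state.
print(list(sd["state"][0].keys()))  # ['step', 'exp_avg', 'exp_avg_sq']
```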
Hi team, I would like to know how to save and load a sharded embedding collection via state_dict. Basically:
How many files should I save? Should each rank save its own shard to an exclusive file, or should a single rank gather the whole embedding table and store it as one file? How should I handle the case where both DP and MP are applied?
If each rank maintains its own shard file, how can I load and re-shard it in a new distributed environment where the number of GPUs differs from the one the model was saved with?
If there is a single saved file, how should I load and re-shard it, especially in a multi-node environment?
It would be even more helpful if anyone could provide sample code! Thanks!
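A minimal sketch of the save/load flow with torch.distributed.checkpoint (DCP) might look like the following. This is not official TorchRec guidance: it assumes a torchrun launch, a recent PyTorch where torch.distributed.checkpoint exposes save/load, and a checkpoint directory on a filesystem visible to every rank; the table config, sizes, and path are made-up placeholders.

```python
# Sketch: checkpoint a sharded EmbeddingBagCollection with DCP.
import os

import torch
import torch.distributed as dist
import torch.distributed.checkpoint as dcp
from torchrec.distributed.model_parallel import DistributedModelParallel
from torchrec.modules.embedding_configs import EmbeddingBagConfig
from torchrec.modules.embedding_modules import EmbeddingBagCollection

CKPT_DIR = "/tmp/ebc_ckpt"  # placeholder; must be shared storage in multi-node setups


def build_sharded_model(device: torch.device) -> DistributedModelParallel:
    ebc = EmbeddingBagCollection(
        tables=[
            EmbeddingBagConfig(
                name="t1",
                embedding_dim=64,
                num_embeddings=1_000_000,
                feature_names=["f1"],
            )
        ],
        device=torch.device("meta"),
    )
    # DMP shards the tables across the current world size and materializes them.
    return DistributedModelParallel(ebc, device=device)


def save_checkpoint(model: DistributedModelParallel) -> None:
    # Collective call: each rank writes only its own shards; DCP also writes a
    # metadata file describing the global layout.
    dcp.save(model.state_dict(), checkpoint_id=CKPT_DIR)


def load_checkpoint(model: DistributedModelParallel) -> None:
    # Collective call: works even if the world size differs from save time,
    # because DCP reshards the saved shards into this model's current plan.
    state_dict = model.state_dict()
    dcp.load(state_dict, checkpoint_id=CKPT_DIR)
    model.load_state_dict(state_dict)  # DCP loads in place, so this is largely belt-and-braces


if __name__ == "__main__":
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    device = torch.device("cuda", local_rank)

    model = build_sharded_model(device)
    save_checkpoint(model)
    load_checkpoint(model)

    dist.destroy_process_group()
```

Under this pattern, every rank keeps only its own shards on disk and DCP's metadata handles resharding, so there should be no need to gather the full table onto one rank; the same flow should also cover mixed DP/MP, since the state_dict already records how each tensor is sharded or replicated.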