
Distributed Training with PyTorch on Multi-GPU Setup #5

Open
ballin0105 opened this issue Dec 25, 2023 · 3 comments

Comments

@ballin0105

Hi there,

I've been following your work with great interest and appreciate all the effort you've put into it. I encountered an issue when running your code on a remote server with 8 V100 GPUs. After switching the launcher to PyTorch, I ran into an "address already in use" error that seems to prevent multi-GPU utilization, restricting training to a single GPU.

Is there any chance that a distributed training update compatible with PyTorch might be on the horizon? It would greatly benefit those of us working with similar hardware configurations.

Thanks for your continued contributions to the field!

@Jingkang50
Collaborator

Thank you for your interest in our work! The distributed training framework is based on MMDet, so you may be able to find a solution in the MMDet repo or community. If you still cannot solve it, please let me know.
Thanks!
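
For reference, the "address already in use" error in PyTorch distributed training typically means the rendezvous port (29500 by default) is already occupied on the node, e.g. by another job or a leftover process. Below is a minimal sketch of initializing the process group on an explicitly chosen free port, assuming a torchrun / `torch.distributed.launch` style launcher that exports `RANK`, `WORLD_SIZE`, and `LOCAL_RANK`; this is an illustration, not this repository's actual code:

```python
# Minimal sketch, not this repo's code: bind the rendezvous to a port that is
# known to be free instead of the default 29500.
import os

import torch
import torch.distributed as dist


def init_distributed(port: int = 29501, backend: str = "nccl") -> int:
    """Initialize one process of a single-node multi-GPU job.

    Assumes the launcher exports RANK, WORLD_SIZE and LOCAL_RANK for each
    spawned process (torchrun does this).
    """
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    local_rank = int(os.environ["LOCAL_RANK"])

    torch.cuda.set_device(local_rank)
    dist.init_process_group(
        backend=backend,
        # Explicit TCP rendezvous on a free port; 29501 is just an example.
        init_method=f"tcp://127.0.0.1:{port}",
        rank=rank,
        world_size=world_size,
    )
    return local_rank
```

If the training scripts here follow MMDetection's `tools/dist_train.sh` convention, it may be enough to pick a different port at launch time (e.g. `PORT=29501` before the launch command), or to pass `--master_port` to `torch.distributed.launch`.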

@ballin0105
Author

Thank you so much for your helpful response! :)

@leaozhun

Hello, have you resolved the issue with PyTorch distributed training?
