Virtual memory usage is too large #25
I don't know the actual reason for your situation, but in my view there are several points that may cause the problem:
32 GB of memory (more than 2x a 1080 Ti) is enough for that config file (celeba-hq256.yaml), so I am surprised to hear that it raises an OUT OF MEMORY fault. I hope you reproduce the results successfully soon, and you are welcome to share more information or solutions here. I will try my best to help you.
Thank you for such a quick reply; I will check what you mentioned.
I switched to an Ubuntu system with the same hardware configuration, and there were no problems during training. I think the cause of the above problem is the different virtual memory allocation mechanisms of Linux and Windows. Thank you again for your help!
Glad to hear that, and you're always welcome to reach out if there are any further problems.
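For reference, the large virtual-memory commit on Windows is often tied to how PyTorch DataLoader workers are started there: Windows spawns fresh processes instead of forking, so each worker receives and commits its own copy of the dataset object. The snippet below is a minimal sketch of that effect, assuming the training script uses a standard DataLoader; the dataset class and parameter values are illustrative and not taken from this repository.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Illustrative stand-in for the real dataset (hypothetical, not from this repo)."""

    def __init__(self, n=1000):
        # Keep heavy allocations out of __init__: on Windows each spawned worker
        # gets its own pickled copy of the dataset object, so anything allocated
        # here is committed once per worker process.
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # Load/generate the sample lazily instead of preloading everything.
        return torch.randn(3, 256, 256)


if __name__ == "__main__":  # required on Windows, where workers are spawned
    loader = DataLoader(
        ToyDataset(),
        batch_size=8,
        num_workers=0,   # 0 avoids per-worker memory commit on Windows; raise it on Linux
        pin_memory=True,
    )
    batch = next(iter(loader))
    print(batch.shape)
```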
Hello, the model looks very good, but I encountered some problems when trying to train it myself.
My training environment is Windows 10, PyTorch 1.8.0 + CUDA 11, an RTX 3090, and 32 GB of memory.
When I use the default config, it reports CUDA out of memory, but this is a misleading error message, because there is still plenty of free CUDA memory. When I tracked hardware resources during the training process, I found that the amount of committed memory kept increasing before the formal training began, eventually causing the overflow; in other words, a huge amount of virtual memory is requested before training starts. However, when I lower the parameters so that training runs normally, the actual memory usage is very small while the virtual memory usage is still large, although it no longer reaches the upper limit that was hit before training started. I have never seen such huge virtual memory overhead before, so I wonder whether there is a memory leak during preloading or preprocessing, and whether the program could be better optimized.
Thank you!
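One way to narrow down where the commit grows is to log the process's resident and virtual memory alongside CUDA allocations at each setup stage. Below is a rough diagnostic sketch, assuming psutil is installed; the commented-out stage calls (build_model, build_dataloader) are placeholders, not functions from this repository.

```python
import psutil
import torch


def log_memory(stage):
    """Print host RSS/VMS and CUDA allocations at a given point (hypothetical diagnostic helper)."""
    mem = psutil.Process().memory_info()
    cuda_alloc = torch.cuda.memory_allocated() / 2**20 if torch.cuda.is_available() else 0
    cuda_reserved = torch.cuda.memory_reserved() / 2**20 if torch.cuda.is_available() else 0
    print(f"[{stage}] rss={mem.rss / 2**20:.0f} MiB  vms={mem.vms / 2**20:.0f} MiB  "
          f"cuda_alloc={cuda_alloc:.0f} MiB  cuda_reserved={cuda_reserved:.0f} MiB")


# Example usage: call between setup steps to see where virtual memory jumps.
log_memory("start")
# model = build_model(cfg)          # placeholder for the actual model construction
log_memory("after model build")
# loader = build_dataloader(cfg)    # placeholder for the actual dataloader setup
log_memory("after dataloader")
```

If the vms column jumps sharply during data loading or preprocessing while rss and the CUDA columns stay small, that would point to the virtual-memory commit described above rather than a genuine GPU memory shortage.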