Time for getting batch #2

Open
zdz1997 opened this issue Mar 17, 2021 · 3 comments

Comments

zdz1997 commented Mar 17, 2021

Thank you for your great work! When I train a new model, the speed is very slow, so I printed the time taken by every step and found that most of the time is spent getting a batch.

In the training part, the following call costs most of the time:

```python
image_tensors, labels = train_dataset.get_batch()
```

Because of this, the utilization rate of my GPUs (2× 2080 Ti) is often 0%. I think the GPUs are waiting for data.
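
A minimal timing sketch for splitting each step into data-loading time and compute time (only `train_dataset.get_batch()` comes from this repo; the surrounding loop and variable names are assumptions):

```python
import time
import torch

data_time, compute_time = 0.0, 0.0
t0 = time.time()
for step in range(100):
    # time the batch fetch separately from the forward/backward pass
    image_tensors, labels = train_dataset.get_batch()
    t1 = time.time()
    data_time += t1 - t0

    # ... forward pass, loss, backward, optimizer.step() go here ...

    torch.cuda.synchronize()  # flush queued GPU work so timings are real
    t0 = time.time()
    compute_time += t0 - t1

print(f"data: {data_time:.1f}s  compute: {compute_time:.1f}s")
```

If `data` dominates, the GPUs really are starved by the input pipeline.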

phantrdat (Owner) commented

@zdz1997 Thank you for your great feedback. I will explore this carefully and respond to you soon.
Btw, if you manage to improve the training speed, please let me know. Thanks a lot.

w867066886 commented

I think it is because of the operation that empties the torch cache. When I deleted that code and set the number of workers to 8, the GPU utilization (also 2× 2080 Ti) stayed above 70%. In fact, we can just overwrite the image tensors and not worry about a memory leak.
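
A sketch of the change, assuming the batch loader wraps a standard `torch.utils.data.DataLoader` and that the loop previously called `torch.cuda.empty_cache()` every step (both are assumptions; parameter values are illustrative):

```python
import torch
from torch.utils.data import DataLoader

# Before (assumed): every iteration ended with
#     torch.cuda.empty_cache()
# which forces a synchronization and stalls training each step.
# After: drop that call and let worker processes prefetch batches.
train_loader = DataLoader(
    train_dataset,      # existing dataset object (assumed name)
    batch_size=192,     # whatever the config uses (illustrative)
    shuffle=True,
    num_workers=8,      # parallel workers keep both GPUs fed
    pin_memory=True,    # faster host-to-device copies
)
```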

Mountchicken commented

I suffered from slow training speed before, and it turned out to be the shuffle in the DataLoader. There are millions of pictures in MJ+ST, and random reads are slow, so PyTorch spends most of its time reading pictures, which causes the low GPU utilization. You can either buy an SSD, use the randomSequentialSampler from MORAN v2, or simply turn off shuffle in the DataLoader.
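
A re-sketch of the idea behind MORAN v2's randomSequentialSampler (not the original code): it draws random start offsets and then reads runs of consecutive indices, so disk access stays mostly sequential while keeping some randomness.

```python
import random
import torch
from torch.utils.data import Sampler

class RandomSequentialSampler(Sampler):
    """Yield runs of consecutive indices starting at random offsets,
    so reads from a huge on-disk dataset stay mostly sequential."""

    def __init__(self, data_source, batch_size):
        self.num_samples = len(data_source)
        self.batch_size = batch_size

    def __iter__(self):
        n_batches = self.num_samples // self.batch_size
        index = torch.zeros(self.num_samples, dtype=torch.long)
        for i in range(n_batches):
            start = random.randint(0, self.num_samples - self.batch_size)
            index[i * self.batch_size:(i + 1) * self.batch_size] = \
                torch.arange(start, start + self.batch_size)
        tail = self.num_samples % self.batch_size
        if tail:  # fill leftover slots with one shorter sequential run
            start = random.randint(0, self.num_samples - tail)
            index[-tail:] = torch.arange(start, start + tail)
        return iter(index.tolist())

    def __len__(self):
        return self.num_samples
```

Pass it as `sampler=RandomSequentialSampler(dataset, batch_size)` with `shuffle=False` in the DataLoader: you lose full shuffling, but each batch still starts at a random position, and the sequential reads are much faster on spinning disks.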
