Time for getting batch #2

Open
zdz1997 opened this issue Mar 17, 2021 · 3 comments

Comments

zdz1997 commented Mar 17, 2021

Thank you for your great work! When I train a new model, the speed is very slow, so I printed the time taken by every step and found that most of the time is spent getting a batch.

In the training part, the following call costs most of the time:

```python
image_tensors, labels = train_dataset.get_batch()
```

Because of this, the utilization rate of my GPUs (2× 2080 Ti) is often 0%. I think the GPUs are waiting for data.
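
A minimal timing sketch for splitting each step into data-loading time and compute time (only `train_dataset.get_batch()` comes from this repo; the surrounding loop and variable names are assumptions):

```python
import time
import torch

data_time, compute_time = 0.0, 0.0
t0 = time.time()
for step in range(100):
    # time the batch fetch separately from the forward/backward pass
    image_tensors, labels = train_dataset.get_batch()
    t1 = time.time()
    data_time += t1 - t0

    # ... forward pass, loss, backward, optimizer.step() go here ...

    torch.cuda.synchronize()  # flush queued GPU work so timings are real
    t0 = time.time()
    compute_time += t0 - t1

print(f"data: {data_time:.1f}s  compute: {compute_time:.1f}s")
```

If `data` dominates, the GPUs really are starved by the input pipeline.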

phantrdat (Owner) commented

@zdz1997 Thank you for your great feedback. I will explore this carefully and respond to you soon.
Btw, if you manage to improve the training speed, please let me know. Thanks a lot.

w867066886 commented

I think it is because of the operation that empties the torch cache. When I deleted that code and set the number of workers to 8, the GPU utilization (also 2× 2080 Ti) stayed above 70%. In fact, we can just overwrite the image tensors and not worry about a memory leak.
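
A sketch of the change, assuming the batch loader wraps a standard `torch.utils.data.DataLoader` and that the loop previously called `torch.cuda.empty_cache()` every step (both are assumptions; parameter values are illustrative):

```python
import torch
from torch.utils.data import DataLoader

# Before (assumed): every iteration ended with
#     torch.cuda.empty_cache()
# which forces a synchronization and stalls training each step.
# After: drop that call and let worker processes prefetch batches.
train_loader = DataLoader(
    train_dataset,      # existing dataset object (assumed name)
    batch_size=192,     # whatever the config uses (illustrative)
    shuffle=True,
    num_workers=8,      # parallel workers keep both GPUs fed
    pin_memory=True,    # faster host-to-device copies
)
```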

Mountchicken commented

I suffered from slow training speed before, and it turned out to be the shuffle in the DataLoader. There are millions of pictures in MJ+ST, and random reads are slow, so PyTorch spends most of its time reading pictures, which causes the low GPU utilization. You can either buy an SSD, use the randomSequentialSampler from MORAN v2, or simply turn off shuffle in the DataLoader.
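
A re-sketch of the idea behind MORAN v2's randomSequentialSampler (not the original code): it draws random start offsets and then reads runs of consecutive indices, so disk access stays mostly sequential while keeping some randomness.

```python
import random
import torch
from torch.utils.data import Sampler

class RandomSequentialSampler(Sampler):
    """Yield runs of consecutive indices starting at random offsets,
    so reads from a huge on-disk dataset stay mostly sequential."""

    def __init__(self, data_source, batch_size):
        self.num_samples = len(data_source)
        self.batch_size = batch_size

    def __iter__(self):
        n_batches = self.num_samples // self.batch_size
        index = torch.zeros(self.num_samples, dtype=torch.long)
        for i in range(n_batches):
            start = random.randint(0, self.num_samples - self.batch_size)
            index[i * self.batch_size:(i + 1) * self.batch_size] = \
                torch.arange(start, start + self.batch_size)
        tail = self.num_samples % self.batch_size
        if tail:  # fill leftover slots with one shorter sequential run
            start = random.randint(0, self.num_samples - tail)
            index[-tail:] = torch.arange(start, start + tail)
        return iter(index.tolist())

    def __len__(self):
        return self.num_samples
```

Pass it as `sampler=RandomSequentialSampler(dataset, batch_size)` with `shuffle=False` in the DataLoader: you lose full shuffling, but each batch still starts at a random position, and the sequential reads are much faster on spinning disks.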
