CUDA out of memory error. #135

Open
eyalol opened this issue Feb 1, 2021 · 14 comments

@eyalol commented Feb 1, 2021

Cheers Hugues,
I've managed to modify the DataLoader to fit the DALES dataset; however, when training the network, I'm getting:

RuntimeError: CUDA out of memory. Tried to allocate 4.73 GiB (GPU 0; 15.78 GiB total capacity; 7.98 GiB already allocated; 3.54 GiB free; 11.08 GiB reserved in total by PyTorch)

I'm using a 16GB GPU, as required for Arjun's KPConv on DALES. As the one who built this network, could you tell me which hyperparameters I might change to solve this issue?

Thank you, and good day.
Eyal

@HuguesTHOMAS (Owner)

You can reduce the batch size a little, but I would not recommend using a value smaller than 4 or 5.
The other parameter that you can reduce is the input radius; equivalently, you can increase the first subsampling dl. The number of points in each input sphere is directly controlled by the ratio between the input radius and the subsampling dl.
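To make the knobs concrete, here is a minimal sketch, assuming a configuration object with the attribute names used in this thread (batch_num, in_radius, first_subsampling_dl); the starting values are illustrative, not the repository's defaults:

```python
# Illustrative configuration sketch: attribute names come from this
# thread; the numbers are example values, not official defaults.
class Config:
    batch_num = 6                # target batch size; not recommended below 4-5
    in_radius = 20.0             # radius of each input sphere (meters)
    first_subsampling_dl = 0.25  # grid size of the first subsampling (meters)

# To reduce memory, either shrink the input spheres...
Config.in_radius = 10.0
# ...or coarsen the first subsampling. Both lower the
# in_radius / first_subsampling_dl ratio, which bounds the
# number of points per input sphere.
Config.first_subsampling_dl = 0.5
```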

@eyalol (Author) commented Feb 1, 2021

> You can reduce the batch size a little, but I would not recommend using a value smaller than 4 or 5.
> The other parameter that you can reduce is the input radius; equivalently, you can increase the first subsampling dl. The number of points in each input sphere is directly controlled by the ratio between the input radius and the subsampling dl.

I don't think that's a good idea; I should probably modify the data loader to load the point cloud in tiles instead. Any advice on this topic would be wonderful.

Thanks for everything, Hugues.

@HuguesTHOMAS (Owner)

Why wouldn't it be a good idea? I don't see how tiles would solve your issue. Are you using my data loader, which picks random spheres in the point clouds? If so, I can definitely tell you that reducing the sphere radius is a good idea; I often found it improved performance.

@eyalol (Author) commented Feb 1, 2021

As of today, I've tried training on the entire DALES dataset on my 16GB GPU, which hit the CUDA memory error in the backward call. Afterwards I tried training on just one of the samples and still got the CUDA memory error, so I assume tuning hyperparameters wouldn't really make a difference for loading the entire set (correct me if I'm wrong).
So I can tell I have a severe memory issue, and splitting into tiles would probably solve it, just like patches in images. I plan on loading the tiles one by one to the GPU and running the network on each, instead of loading the whole dataset into memory. If I'm not mistaken, you even mentioned this idea in one of the issues regarding the DALES dataset.
I would like to hear what you think.

@eyalol (Author) commented Feb 3, 2021

Hello again, @HuguesTHOMAS! I've tried running the network on just one ply file of DALES, containing about 1 million points, with:
batch_num = 4
in_radius = 20
num_kernel_points = 15
conv_radius = 2.5
deform_radius = 6

And I still got the CUDA error on my 16GB GPU.
Does that make sense to you, or do you think something's wrong with my GPU?

Really appreciate the help,
Thanks,
Eyal

@HuguesTHOMAS (Owner)

What is the value of first_subsampling_dl?

@eyalol (Author) commented Feb 3, 2021

@HuguesTHOMAS it's 0.250

Best Regards,
Eyal

@HuguesTHOMAS (Owner)

OK, this is a fair value. Can you try reducing in_radius to 10.0 and see if there is still a CUDA OOM error?

Can you also print the error message that you get?
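As a back-of-envelope illustration (my arithmetic, not part of the thread): the number of subsampled points per sphere scales with the in_radius / first_subsampling_dl ratio, roughly squared for mostly-flat aerial scans like DALES and cubed for fully volumetric scenes, so halving in_radius cuts the per-sphere point count by about 4x to 8x:

```python
def points_per_sphere_estimate(in_radius, dl, dims=2):
    """Rough upper bound on subsampled points in one input sphere.
    dims=2 approximates a mostly-flat aerial scan (like DALES),
    dims=3 a fully volumetric scene. Constant factors omitted."""
    return (in_radius / dl) ** dims

before = points_per_sphere_estimate(20.0, 0.25)  # (20 / 0.25)^2 = 6400
after = points_per_sphere_estimate(10.0, 0.25)   # (10 / 0.25)^2 = 1600
print(before / after)  # 4.0 -> roughly 4x fewer points per sphere
```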

@eyalol (Author) commented Feb 12, 2021

@HuguesTHOMAS Sorry for the late response. I ran it now with in_radius = 10.0 as you suggested, and it works.
Now I should probably figure out how I'm going to split the data into tiles for the training process. Do you agree?
After all, I don't want the hardware to dictate which hyperparameters I can use.

@HuguesTHOMAS (Owner)

Hardware always dictates which hyperparameters you can use in deep learning. You can think of in_radius as the zoom of an image if you were doing 2D CNNs, and first_subsampling_dl as the resolution of the image. Obviously, you have to adapt these values to your hardware, or reduce the memory consumption by using a smaller network.

@eyalol (Author) commented Feb 12, 2021

You are absolutely right. I'm just confused: I've seen in Arjun's repo that he could train KPConv on the whole DALES dataset with the same hardware as mine, using the same hyperparameters, while I can't even train on a single example (out of ~30 training examples). Maybe he split the data into tiles, although I've looked into his code and didn't see anything like that.

So what I'm saying is: to overcome this gap, I'll split the data into tiles, as is often done in 2D deep learning, instead of letting the hardware dictate that I can train on just one example, or train a shallow network with a small in_radius that could give worse results.

@HuguesTHOMAS (Owner)

What I have trouble understanding is how you think splitting the data into tiles will help you. The input pipeline already picks small sub-spheres in the big DALES point clouds, which are the same as tiles except that they are not fixed in advance. With tiles you would do exactly the same thing, picking small sub-tiles in the big point clouds, and you would have the same memory issues if the tiles are too big; you would have to reduce the tile size exactly as you have to reduce the sphere radius now. Or maybe there is something I did not understand in what you are planning to do?
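For intuition, a minimal sketch of the random-sphere picking described above; this is a generic reconstruction using scikit-learn's KDTree, not the repository's actual sampler:

```python
import numpy as np
from sklearn.neighbors import KDTree

def random_sphere(points, tree, in_radius, rng):
    """Pick a random center in the cloud and return every point within
    in_radius of it: the 3D analogue of cropping a tile, except the
    'tile' is redrawn at a new location every time."""
    center = points[rng.integers(len(points))]
    idx = tree.query_radius(center[None, :], r=in_radius)[0]
    return points[idx]

# Usage: draw one training input from a big DALES-scale cloud.
rng = np.random.default_rng(0)
cloud = rng.random((1_000_000, 3)) * 100.0  # stand-in point cloud
tree = KDTree(cloud)                        # built once per cloud
sphere = random_sphere(cloud, tree, in_radius=10.0, rng=rng)
```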

@eyalol (Author) commented Feb 16, 2021

OK, I'll explain. If I'm not wrong, in many domains of deep learning (consider 2D vision tasks for a moment), images are cropped into patches and preprocessed on the CPU, then fed to the GPU one by one. Storage-wise, you "fill" the GPU with only one patch at a time; the rest is queued on the CPU.

So take that idea here: you load all the data on the CPU, then each time move one sub-sphere to the device with something like subsphere.to(device), run it through the net, and so forth for a set number of iterations. This way you only fill the GPU a little at a time, avoiding the CUDA error.
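A minimal PyTorch sketch of that loop, with placeholder stand-ins for the model and data (an illustration of the idea above, not code from this repository):

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-ins for illustration only; a real run would use the KPConv
# model and a DataLoader yielding preprocessed sub-spheres.
model = nn.Linear(3, 8).to(device)  # placeholder per-point classifier
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loader = [(torch.rand(1024, 3), torch.randint(8, (1024,)))
          for _ in range(4)]        # fake CPU-side sub-spheres

for points, labels in loader:
    points = points.to(device)  # only the current sub-sphere
    labels = labels.to(device)  # occupies GPU memory at any time
    logits = model(points)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```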

If I'm wrong here, I'd like an explanation; I'm not a deep learning expert yet, but an enthusiastic student :)
Best wishes!
Eyal

@HuguesTHOMAS (Owner)

OK, so if I understand correctly, what you are referring to is close to the concept of a minibatch, but instead of dividing a batch before sending it to the GPU, you divide your input along its spatial dimensions. I have not heard of many methods doing that during training, so I am not an expert, but I would say that this would raise several issues (border effects, backpropagation compatibility, slow computation...). It would also, in my opinion, be very hard to implement in the KPConv framework, as the convolutions are not as easily defined as in images, and the batches have variable size. You could end up spending one or two months implementing something like that and, in the end, find that it does not even improve performance.

For all these reasons, I would not recommend following this idea. That being said, it is your work and your project, so you can do whatever you want.

Best,
Hugues
