CUDA out of memory error. #135
You can reduce the batch size a little, but I would not recommend using a value smaller than 4 or 5.
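For reference, in the KPConv-PyTorch training scripts the batch size is a config attribute. A minimal sketch of the change, assuming the attribute name `batch_num` from the repo's train scripts (verify against your own config class):

```python
# Sketch: where the batch size lives in a KPConv-style training config.
# The attribute name batch_num is assumed from the KPConv-PyTorch train
# scripts; check your own config class before relying on it.
class DALESConfig:
    batch_num = 6  # target number of input spheres per batch

config = DALESConfig()
config.batch_num = 4  # smallest value recommended in this thread
```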
I don't think that's a good idea; I should probably modify the data loader instead, to load the point cloud in tiles. Any advice on this topic would be wonderful. Thanks for everything, Hugues.
Why wouldn't it be a good idea? I don't see how tiles would solve your issues. Are you using my data loader, which picks random spheres in the point clouds? If so, I can definitely tell you that reducing the sphere radius is a good idea; I have often found it improves performance.
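To illustrate the idea: the pipeline repeatedly picks a random center in the cloud and keeps the points within the sphere radius, so a smaller radius directly means fewer points per input. A minimal sketch with a KD-tree, not the repo's actual loader code:

```python
# Minimal sketch of random-sphere sampling from a large point cloud,
# as the KPConv input pipeline does conceptually (not the repo's code).
import numpy as np
from sklearn.neighbors import KDTree

def random_sphere(points, tree, in_radius, rng):
    """Return the points inside a sphere of radius in_radius
    centered on a randomly chosen point of the cloud."""
    center = points[rng.integers(len(points))]
    idx = tree.query_radius(center[None, :], r=in_radius)[0]
    return points[idx]

cloud = np.random.rand(1_000_000, 3).astype(np.float32) * 100.0
tree = KDTree(cloud)  # built once per cloud, reused for every sphere
rng = np.random.default_rng(0)
sub = random_sphere(cloud, tree, in_radius=10.0, rng=rng)
# Smaller in_radius -> fewer points per sphere -> less GPU memory.
```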
As of today, I've tried training on the entire DALES dataset on my 16GB GPU, which ran into a CUDA memory error in the backward call. Afterward, I tried training on just one of the samples and still got the CUDA memory error, so I assume tuning hyperparameters wouldn't really make a difference compared with loading the entire set (correct me if I'm wrong).
Hello again, @HuguesTHOMAS! I've tried running the network on just one PLY file of DALES, containing about 1 million points, with:

And still got the CUDA error on my 16GB GPU. Really appreciate the help,
What is the value of first_subsampling_dl?
@HuguesTHOMAS It's 0.250. Best regards,
OK, this is a fair value. Can you try reducing in_radius to 10.0 and see if there is still a CUDA OOM error? Can you also print the error message that you get?
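In config terms, the suggested change looks like this (attribute names again assumed from the KPConv-PyTorch train scripts):

```python
# Sketch of the change suggested above; attribute names are assumed
# from the KPConv-PyTorch train scripts, so verify against your config.
class DALESConfig:
    in_radius = 10.0             # reduced input sphere radius
    first_subsampling_dl = 0.25  # grid size reported above

# The ratio in_radius / first_subsampling_dl roughly controls how many
# subsampled points end up in each input sphere, hence GPU memory use.
cfg = DALESConfig()
print(cfg.in_radius / cfg.first_subsampling_dl)  # 40.0
```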
@HuguesTHOMAS Sorry for the late response. I ran it now with in_radius = 10.0 as you said, and it works.
Hardware always dictates which hyperparameters you use when you do deep learning.
You are absolutely right. I'm just confused: I've seen in Arjun's repo that he could train KPConv on the whole DALES dataset with the same hardware as mine, and I can't even train on a single example (out of ~30 training examples), even though I'm using the same hyperparameters as he did. Maybe he split the data into tiles, although I've looked into his code and didn't see anything like it. So what I'm saying is: to overcome this gap, I'll split the data into tiles, as is often done in 2D deep learning, instead of letting the hyperparameters dictate that I can train on just one example, or train a shallow network with a small in_radius that could give worse results.
What I have trouble understanding is how you think that splitting the data into tiles will help you... The input pipeline already takes small sub-spheres in the big DALES point clouds, which are the same as tiles except they are not fixed in advance. If you use tiles, you will do exactly the same thing and pick small sub-tiles in the big point clouds, and you will have the same memory issues if the tiles are too big, so you will have to reduce the tile size exactly like you have to reduce the sphere radius right now. Or maybe there is something I did not understand in what you are planning to do?
OK, I'll explain. If I'm not wrong, in many domains of deep learning (let's consider 2D vision tasks for a moment), images are cropped into patches and preprocessed on the CPU, then fed one by one to the GPU. Storage-wise, you only "fill" the GPU with one patch at a time; the rest is queued on the CPU. So take that idea here: you load all the data onto the CPU and each time take one sub_sphere.to(device), run it through the net, and so forth for a given number of iterations. This way you only fill the GPU a little at a time, avoiding the CUDA error. If I'm wrong here, I'd like an explanation; I'm not a deep learning expert yet, but an enthusiastic student :)
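A minimal PyTorch sketch of the pipeline described above, with a hypothetical SphereDataset standing in for the real loader; only one sub-sphere occupies GPU memory at a time:

```python
# Minimal sketch of the proposed pipeline: inputs are prepared on the
# CPU and moved to the GPU one at a time. SphereDataset is hypothetical.
import torch
from torch.utils.data import DataLoader, Dataset

class SphereDataset(Dataset):
    """Hypothetical dataset returning one pre-cut sub-sphere per index."""
    def __init__(self, spheres):       # spheres: list of (N_i, 3) tensors
        self.spheres = spheres
    def __len__(self):
        return len(self.spheres)
    def __getitem__(self, i):
        return self.spheres[i]         # still on the CPU at this point

spheres = [torch.randn(n, 3) for n in (1000, 2000, 1500)]
loader = DataLoader(SphereDataset(spheres), batch_size=None)  # one at a time
device = torch.device("cuda")

for sphere in loader:
    sphere = sphere.to(device)         # only this sphere fills GPU memory
    # ... forward / backward on this sphere ...
```

As the next reply points out, this is essentially what the existing sphere-based loader already does.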
OK, so if I understand well, what you are referring to is close to the concept of a minibatch, but instead of dividing a batch before sending it to the GPU, you divide your input along its spatial dimensions. I have not heard of many methods doing that during training, so I am not an expert, but I would say that this would raise several issues (border effects, backpropagation compatibility, slow computations...). It would also, in my opinion, be very hard to implement in the KPConv framework, as the convolutions are not as easily defined as in images, and the batches have variable sizes. You could end up spending one or two months implementing something like that and, in the end, find that it does not even improve performance. For all these reasons, I would not recommend you follow this idea. That being said, it is your work and your project, so you can do whatever you want. Best,
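For completeness: the standard way to emulate a larger batch without spatially splitting inputs is gradient accumulation over several small batches. A generic PyTorch sketch, not something the KPConv code does out of the box:

```python
# Generic gradient-accumulation sketch (plain PyTorch, not KPConv code):
# several small batches emulate one large batch, trading steps for memory.
import torch

model = torch.nn.Linear(128, 16).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accum_steps = 4                              # 4 small batches ~ 1 big batch

optimizer.zero_grad(set_to_none=True)
for step in range(100):
    x = torch.randn(8, 128, device="cuda")   # small batch fits in memory
    y = torch.randn(8, 16, device="cuda")
    loss = torch.nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()                          # gradients sum into .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```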
Cheers Hugues,
I've managed to modify the DataLoader to fit the DALES dataset; however, when I'm training the network, I'm getting:
RuntimeError: CUDA out of memory. Tried to allocate 4.73 GiB (GPU 0; 15.78 GiB total capacity; 7.98 GiB already allocated; 3.54 GiB free; 11.08 GiB reserved in total by PyTorch)
I'm using a 16GB GPU, as required by Arjun's KPConv for DALES. As the one who built this network, could you tell me which hyperparameters I might change to solve this issue?
Thank you, and good day.
Eyal
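For OOM errors like the one quoted above, PyTorch's memory introspection utilities help show how the 16 GiB are consumed before the failing allocation. A small sketch using standard torch.cuda calls:

```python
# Small sketch: inspecting GPU memory around a suspected OOM point,
# using standard torch.cuda utilities (not KPConv-specific code).
import torch

def report(tag):
    alloc = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"{tag}: {alloc:.2f} GiB allocated, {reserved:.2f} GiB reserved")

report("before forward")
# ... forward pass ...
report("before backward")
# ... loss.backward() ...
report("after backward")
# torch.cuda.memory_summary() prints a detailed breakdown as well.
```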