Unrecoverable cudaMemcpy error #112
Comments
I managed to fix this problem for me:
to that:
In order to avoid the
In addition, I found that the source of the NaN values is in the
This will slow the code a bit, so I only use it if the original |
I get the following output (using `verbosity=3`) when running `kmeans_cuda` from Python on a certain input (attached here):
There are 14641 vectors of dimension 64, and I am trying to get 292 clusters. I'm using the default `yinyang_t=0.1`. If I reduce it to `yinyang_t=0.01`, the function succeeds, with only a single `dist_sum is NaN` error for step 1.
This would have been fine if I could wrap the function call in try-except, but unfortunately after the first failure there is probably some memory error, and running the code again with `yinyang_t=0.01` results in:
And I need to restart Python again.
I'm using Ubuntu 20.04 and an RTX 2080 Ti, and compiled the library with CUDA_ARCH=75.
The errors can be reproduced using the attached file and the following code:
I tried to look at the code and figure out where the NaNs come from (my data has no NaNs in it), but couldn't find the source of the problem. I also didn't find a way to handle this problem in a recoverable way.
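One possible workaround for the unrecoverable state, sketched here and not confirmed by the maintainers, is to run each attempt in a throwaway Python process, so that whatever GPU state the failure corrupts dies with the child instead of with the main interpreter. The helper names (`run_isolated`, `first_success`), the worker script, and the `.npy`/`.npz` file handling are all hypothetical. The sketch also assumes that `kmeans_cuda` returns `(centroids, assignments)` and that a failure ends the child with a non-zero exit code.

```python
import subprocess
import sys

# Hypothetical worker script. It runs in a fresh interpreter, so a broken CUDA
# context cannot leak into the parent. Assumes libKMCUDA and NumPy are
# installed there and that the input was saved beforehand as a .npy array.
KMEANS_WORKER = """
import sys
import numpy as np
from libKMCUDA import kmeans_cuda

in_path, out_path, clusters, yinyang_t = sys.argv[1:5]
samples = np.load(in_path).astype(np.float32)
# Assumption: kmeans_cuda returns (centroids, assignments).
centroids, assignments = kmeans_cuda(
    samples, int(clusters), yinyang_t=float(yinyang_t), verbosity=3)
np.savez(out_path, centroids=centroids, assignments=assignments)
"""


def run_isolated(script, args=(), timeout=None):
    """Run `script` in a new Python process; return True if it exited with 0."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script, *map(str, args)],
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # Uncaught exceptions give a positive code, crashes by signal a negative one.
    return proc.returncode == 0


def first_success(script, attempts, timeout=None):
    """Try each argument tuple in order; return the first that succeeds, else None."""
    for args in attempts:
        if run_isolated(script, args, timeout=timeout):
            return args
    return None
```

For the case described here, something like `first_success(KMEANS_WORKER, [("input.npy", "out.npz", 292, 0.1), ("input.npy", "out.npz", 292, 0.01)])` would try the default `yinyang_t` first and fall back to `0.01`. The parent can then load `out.npz` with NumPy. This only works around the crash and does not address the source of the NaNs.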
kmeans_input.zip