Execution is stuck until termination when running on CUDA in WSL2 #538
Comments
Is this for any program you try to run? |
Yes, all examples from
|
I would also like to add that I have the same issue on my laptop running Ubuntu Linux. I have tried the sorter.bend script from the README on three machines now. It runs on the other two (one has a laptop 3080 GPU and the other has dual RTX A4000s), but it won't run on my laptop with a Quadro P600 (CUDA version 12.2). It does the same thing @wtfil is describing, where it simply hangs. Looking at the processor usage, it seems as though it might be having an issue allocating GPU memory (?). I can see it running at 100% on a single CPU core, but it never loads either the RAM or VRAM, and there is no processing being run on the GPU either. Here are some specs from the machine in question, in case this is a bug that can be fixed in the future. (Already love this programming language BTW, really hoping I can start switching over to it at work when more support is added.) Info: |
I'm getting the same issue with: |
Same issue here on Ubuntu (not WSL). I tried debugging this issue in the official Discord server here. As @keaneflynn observed, it doesn't allocate VRAM correctly, and it hangs for a long time until it crashes with the following message:
I waited 45 minutes, while another Discord server member only waited 30 minutes with his example in a virtual machine. I don't think the duration matters much, since the program crashed shortly after I launched another application (Steam in my case). A quick summary of the debugging we did in Discord: Downgrading from CUDA 12.5 to 12.4 doesn't help. Examples unrelated to bend compiled by
|
This is fascinating, as I have a laptop with nearly identical specs that does manage to use run-cu properly. I have the 5800H on Ubuntu 22.04, except it has a 3080 mobile. I am pretty sure the cache on these two chips is identical, per @nmay231's inquiry. Hoping to see some bug fixes here soon! |
Same issue here
|
Hey, I am also facing the same issue. All the dedicated GPU memory fills up, and the process gets stuck.
|
@Imran-S-heikh I'm not certain if we have the same issue. My Video RAM for the bend/hvm process never went above 100 MiB. Perhaps we should all make sure we are experiencing the same thing. Here's a very simple program that shouldn't need much memory. It hangs with
def main:
  return (1 + 1)
Also, I forgot to mention I did try running
bend run-cu --verbose simple.bend
|
@nmay231 I get the exact same output. I also have the same results- it uses all my CPU but no GPU |
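To make "it hangs" comparable across machines, one option is to run each backend under a time limit and record whether it finished. The following is only a sketch: the helper name `run_with_timeout` and the 60-second limit are assumptions chosen for illustration, not part of bend.

```python
import subprocess

def run_with_timeout(command: list[str], timeout_s: float) -> str:
    """Run command and classify the outcome as 'ok', 'failed', 'hung', or 'missing'."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return "hung"
    except FileNotFoundError:
        # The executable is not on PATH.
        return "missing"
    return "ok" if result.returncode == 0 else "failed"

# Example call (assumes simple.bend is in the current directory; the
# 60-second limit is an arbitrary choice, not a documented threshold):
#   run_with_timeout(["bend", "run-cu", "simple.bend"], timeout_s=60)
```

Running the same call with run and run-c in place of run-cu would show whether only the CUDA backend exceeds the limit on a given machine.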
Can someone with the issue try running |
@TimotejFasiang For me, |
I have a little update on the issue, hope this will help to understand it better. Here are the version of relevant tools for both images
The only major difference is I also noticed that
~/www/bend-examples > time bend run bitonic_sort.bend
Result: 16646144
real 0m35.908s
user 0m33.658s
sys 0m2.250s
~/www/bend-examples > time bend run-c bitonic_sort.bend
Result: 16646144
real 0m10.611s
user 1m3.683s
sys 1m44.188s
~/www/bend-examples > time bend run-cu bitonic_sort.bend
Result: 16646144
real 0m2.410s
user 0m1.996s
sys 0m0.080s |
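For a sense of scale, the wall-clock (real) times above can be turned into speedups relative to bend run. This is a small sketch that only parses the three real lines quoted in the output; the helper name `parse_real_seconds` is made up for illustration.

```python
import re

def parse_real_seconds(time_output: str) -> float:
    """Return wall-clock seconds from a 'real XmY.YYYs' line printed by time."""
    match = re.search(r"real\s+(\d+)m([\d.]+)s", time_output)
    if match is None:
        raise ValueError("no 'real' line found")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Timings copied from the output above.
runs = {
    "bend run": "real 0m35.908s",
    "bend run-c": "real 0m10.611s",
    "bend run-cu": "real 0m2.410s",
}

baseline = parse_real_seconds(runs["bend run"])
speedups = {cmd: baseline / parse_real_seconds(out) for cmd, out in runs.items()}

for cmd, factor in speedups.items():
    print(f"{cmd}: {factor:.1f}x")
```

On these numbers, run-c comes out at roughly 3.4x and run-cu at roughly 14.9x the speed of the interpreter for bitonic_sort.bend, which is the behavior expected when run-cu works.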
I am noticing something similar. When running on WSL2, I noticed that the default parallel_hello_world would not finish (before I got bored of waiting and figured something was wrong). I have a standard RTX 3070, btw. I rewrote things to play around, and found that it ran plenty fast when running gen(13), just not gen(16). My guess is some issue with using too much GPU memory and getting stuck, as some have mentioned here. Moreover, when running gen(16), my GPU kept running at full load after I pressed CTRL+C in the command line and tried to terminate the process. Is this related to having no IO? Info: |
@evbxll Sounds like a different issue. The issue I and others are describing happens even for a very simple program like the one below
def main:
  return (1 + 1)
Also, the issue we're describing results in all our CPU being used, but none of our GPU, and CTRL+C does stop it from using all the CPU for me |
Eh, I feel like the original issue these comments are under is similar to mine. Execution stuck when running CUDA WSL2 |
Yeah, one also gets this issue, so one followed a few steps:
The fix is quite stupid, because it seems to take so long to move data from CPU to GPU that running the
wabinab@...: $ bend run-cu simple.bend -s
Result: 2
- ITRS: 2
- LEAK: 0
- TIME: 5.83s
- MIPS: 0.00
Edit: Anyway, one tried running it a second time and the time seems to decrease, although for the simple example it isn't worth it.
wabinab@...: $ bend run-cu simple.bend -s
Result: 2
- ITRS: 2
- LEAK: 0
- TIME: 0.29s
- MIPS: 0.00
Similarly, if we try to run the
bend run-c parallel_sum.bend -s
Result: 5908768
- ITRS: 45999971
- TIME: 0.69s
- MIPS: 66.89
bend run-cu parallel_sum.bend -s
Result: 5908768
- ITRS: 45983587
- LEAK: 37606783
- TIME: 0.83s
- MIPS: 55.62
There's a lot of LEAK, and calculations are slower compared to a 4-core CPU (i5-7400). |
Is there a fix for this? |
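The two parallel_sum.bend runs above can also be compared programmatically. The sketch below parses the "- KEY: value" lines printed with -s and computes the ratio of the reported MIPS values. The helper name `parse_stats` is hypothetical, and the code makes no assumption about what LEAK measures beyond it being a number in the output.

```python
import re

def parse_stats(output: str) -> dict[str, float]:
    """Collect '- KEY: value' lines (as printed with -s) into a dict of floats."""
    stats = {}
    for key, value in re.findall(r"-\s*([A-Z]+):\s*(\d+(?:\.\d+)?)", output):
        stats[key] = float(value)
    return stats

# Output copied from the two runs above.
cpu_output = """Result: 5908768
- ITRS: 45999971
- TIME: 0.69s
- MIPS: 66.89"""

gpu_output = """Result: 5908768
- ITRS: 45983587
- LEAK: 37606783
- TIME: 0.83s
- MIPS: 55.62"""

cpu = parse_stats(cpu_output)
gpu = parse_stats(gpu_output)

# Ratio of reported throughput: below 1.0 means run-cu was slower than run-c here.
throughput_ratio = gpu["MIPS"] / cpu["MIPS"]
print(f"run-cu / run-c throughput: {throughput_ratio:.2f}")
```

For this run the ratio is about 0.83, which matches the observation that the GPU backend was slower than the 4-core CPU on this machine.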
Reproducing the behavior
The issue
Hi,
I am having issues running code with bend run-cu on CUDA inside WSL2. There are no errors, but the code is not executing either. Execution is frozen, similarly to while (true) {}.
Code executes without any issues when using bend run or bend run-c.
Compiling with bend gen-cu and nvcc has the same result.
I've tried both 12.4 and 12.5 and the result is the same.
What I attempted
I tried different code examples from the repo, but the result is always the same. Since bend allows compiling code to CUDA with gen-cu, I tried to find what is broken inside the generated file (assuming bend run-cu uses the same code). The issue happens inside the gnet_normalize function, where the code never exits the for loop. This break is never reached (rlen always has the same value).
CUDA verification
Just to rule CUDA out, I have successfully installed CUDA and can confirm it is recognised by compiling and running this code.
cuda-test.cu:
output
nvidia-smi
Calling from wsl
System Settings
Example:
Additional context
No response