-
I was trying the GPU example script:
The CPU version
However, the GPU version
My CPU is Intel Xeon Gold 6133 * 80. My GPU is an NVIDIA GeForce RTX 4080, with CUDA 11.8 and NVIDIA HPC SDK 22.11, which work normally for other programs (e.g. PyTorch). Any idea what is going on here? Thank you in advance!
-
This is a small 2D problem, so I don't expect the GPU backend to shine here, but the difference is much larger than expected (see below for what I get on my machine). Could you run it with
Allocating host memory for u(200, 104, 104) [8 MB]
Operator `Kernel` generated in 0.31 s
* lowering.Clusters: 0.14 s (46.3 %)
* specializing.Clusters: 0.08 s (26.5 %)
* lowering.IET: 0.11 s (36.4 %)
* specializing.IET: 0.08 s (26.5 %)
Flops reduction after symbolic optimization: [24 --> 21]
gcc-10 -march=native -O3 -g -fPIC -Wall -std=c99 -Wno-unused-result -Wno-unused-variable -Wno-unused-but-set-variable -ffast-math -mprefer-vector-width=512 -shared -fopenmp /tmp/devito-jitcache-uid1001/bd3e867f29ab08e72e3d3169364a882ba0e51811.c -lm -o /tmp/devito-jitcache-uid1001/bd3e867f29ab08e72e3d3169364a882ba0e51811.so
Operator `Kernel` jit-compiled `/tmp/devito-jitcache-uid1001/bd3e867f29ab08e72e3d3169364a882ba0e51811.c` in 0.33 s with `GNUCompiler`
Operator `Kernel` ran in 0.06 s
Global performance: [OI=1.92, 0.54 GFlops/s, 0.04 GPts/s]
Global performance <w/o setup>: [0.06 s, 0.04 GPts/s]
Local performance:
* section0 ran in 0.06 s [OI=1.92, 0.62 GFlops/s, 0.04 GPts/s]
Performance[mode=advanced] arguments: {'nthreads': 8, 'pthreads': 0}
Allocating host memory for u(200, 104, 104) [8 MB]
Operator `Kernel` generated in 0.40 s
* lowering.IET: 0.18 s (45.9 %)
* specializing.IET: 0.13 s (33.2 %)
* lowering.Clusters: 0.16 s (40.8 %)
* specializing.Clusters: 0.10 s (25.5 %)
Flops reduction after symbolic optimization: [24 --> 21]
nvc++ -g -fPIC -std=c++11 -gpu=pinned -mp -acc:gpu -fast -shared /tmp/devito-jitcache-uid1001/e8fc0bff922d151c2636b5617581110b21c4c2ab.cpp -lm -o /tmp/devito-jitcache-uid1001/e8fc0bff922d151c2636b5617581110b21c4c2ab.so
Operator `Kernel` jit-compiled `/tmp/devito-jitcache-uid1001/e8fc0bff922d151c2636b5617581110b21c4c2ab.cpp` in 1.85 s with `NvidiaCompiler`
Operator `Kernel` ran in 0.20 s
Global performance: [OI=1.92, 0.16 GFlops/s, 0.01 GPts/s]
Global performance <w/o setup>: [0.01 s, 0.88 GPts/s]
Local performance:
* section0 ran in 0.01 s
Performance[mode=advanced] arguments: {'deviceid': -1, 'devicerm': 1, 'pthreads': 0}
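To make the two logs above easier to compare, here is a minimal sketch that pulls the total run time and the "w/o setup" time out of each log and reports the difference. The helper name `timing_breakdown` and the two log constants are hypothetical, introduced only for illustration; the numbers are copied from the logs above. The log itself labels the gap only as "setup"; on a GPU it plausibly includes host-device data movement, but that is an assumption.

```python
import re

# Timing lines copied from the two logs above (CPU run first, GPU run second).
CPU_LOG = """Operator `Kernel` ran in 0.06 s
Global performance <w/o setup>: [0.06 s, 0.04 GPts/s]"""

GPU_LOG = """Operator `Kernel` ran in 0.20 s
Global performance <w/o setup>: [0.01 s, 0.88 GPts/s]"""


def timing_breakdown(log):
    """Return (total, without_setup, setup) in seconds for one log.

    Hypothetical helper: it only parses the two line formats shown above.
    """
    total = float(re.search(r"Operator `Kernel` ran in ([\d.]+) s", log).group(1))
    without_setup = float(re.search(r"<w/o setup>: \[([\d.]+) s", log).group(1))
    # The logged values are rounded to two decimals, so round the difference too.
    return total, without_setup, round(total - without_setup, 2)


for name, log in (("CPU", CPU_LOG), ("GPU", GPU_LOG)):
    total, without_setup, setup = timing_breakdown(log)
    print(f"{name}: total={total:.2f} s, w/o setup={without_setup:.2f} s, setup={setup:.2f} s")
```

On these numbers the GPU run spends 0.19 s of its 0.20 s in what the log calls setup, while the compute section itself takes 0.01 s versus 0.06 s on the CPU.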
-
The compilation errors were resolved after I downgraded from Ubuntu 24.04 to 22.04. The highest Ubuntu version that NVIDIA HPC SDK 22.11 supports is 22.04.