Why is GPU implementation significantly slower than CPU? #2421

jinshanmu · 2024-07-23T00:53:40Z

I was trying the GPU example script:

from devito import *
import numpy as np
import matplotlib.pyplot as plt

nx, ny = 100, 100
grid = Grid(shape=(nx, ny))

u = TimeFunction(name='u', grid=grid, space_order=2, save=200)
c = Constant(name='c')

eqn = Eq(u.dt, c * u.laplace)

step = Eq(u.forward, solve(eqn, u.forward))

xx, yy = np.meshgrid(np.linspace(0., 1., nx, dtype=np.float32),
                     np.linspace(0., 1., ny, dtype=np.float32))
r = (xx - .5) ** 2. + (yy - .5) ** 2.
u.data[0, np.logical_and(.05 <= r, r <= .1)] = 1.

op = Operator([step])

stats = op.apply(dt=5e-05, c=.5)

plt.rcParams['figure.figsize'] = (20, 20)
for i in range(1, 6):
    plt.subplot(1, 6, i)
    plt.imshow(u.data[(i - 1) * 40])
plt.show()

The CPU version op = Operator([step]) returned

Operator Kernel ran in 0.01 s

However, the GPU version op = Operator([step], platform='nvidiaX', opt=('advanced', {'gpu-fit': u})) returned

Operator Kernel ran in 4.74 s

My CPU is an Intel Xeon Gold 6133 * 80. My GPU is an NVIDIA GeForce RTX 4080 with CUDA 11.8 and NVIDIA HPC SDK 22.11; the GPU works normally with other programs (e.g. PyTorch).

Any idea on what is going on here?

Thank you in advance!

Answered by jinshanmu

Jul 24, 2024

The compilation errors were resolved after I downgraded from Ubuntu 24.04 to 22.04. The highest Ubuntu version that NVIDIA HPC SDK 22.11 supports is 22.04.


mloubout · 2024-07-23T00:58:31Z

Maintainer

This is a small 2D problem, so I don't expect the GPU backend to shine here, but the difference is much larger than expected (see below for what I get on my machine).

Could you run it with DEVITO_LOGGING=DEBUG? Also, what configuration are you using for the GPU? It's recommended to use OpenACC.
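For reference, one way to pin the configuration explicitly is to set the environment variables from Python before Devito is imported. This is only a minimal sketch: the helper name `devito_env` is hypothetical, the variable names and values are the ones quoted in this thread, and it assumes Devito picks the variables up when it is first imported.

```python
import os

def devito_env(language, platform, arch, logging="DEBUG"):
    """Build the DEVITO_* settings for one backend (hypothetical helper)."""
    return {
        "DEVITO_LANGUAGE": language,
        "DEVITO_PLATFORM": platform,
        "DEVITO_ARCH": arch,
        "DEVITO_LOGGING": logging,
    }

# OpenACC configuration for an NVIDIA GPU, using the values from this thread
gpu_openacc = devito_env("openacc", "nvidiaX", "nvc")

# Apply the settings; this must happen before `from devito import *`
os.environ.update(gpu_openacc)

# Print the equivalent shell commands
for name, value in sorted(gpu_openacc.items()):
    print(f"export {name}={value}")
```

Keeping the settings in one place makes it easier to state exactly which language, platform and compiler combination produced a given timing.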

Allocating host memory for u(200, 104, 104) [8 MB]
Operator `Kernel` generated in 0.31 s
  * lowering.Clusters: 0.14 s (46.3 %)
     * specializing.Clusters: 0.08 s (26.5 %)
  * lowering.IET: 0.11 s (36.4 %)
     * specializing.IET: 0.08 s (26.5 %)
Flops reduction after symbolic optimization: [24 --> 21]
gcc-10 -march=native -O3 -g -fPIC -Wall -std=c99 -Wno-unused-result -Wno-unused-variable -Wno-unused-but-set-variable -ffast-math -mprefer-vector-width=512 -shared -fopenmp /tmp/devito-jitcache-uid1001/bd3e867f29ab08e72e3d3169364a882ba0e51811.c -lm -o /tmp/devito-jitcache-uid1001/bd3e867f29ab08e72e3d3169364a882ba0e51811.so
Operator `Kernel` jit-compiled `/tmp/devito-jitcache-uid1001/bd3e867f29ab08e72e3d3169364a882ba0e51811.c` in 0.33 s with `GNUCompiler`
Operator `Kernel` ran in 0.06 s
Global performance: [OI=1.92, 0.54 GFlops/s, 0.04 GPts/s]
Global performance <w/o setup>: [0.06 s, 0.04 GPts/s]
Local performance:
  * section0 ran in 0.06 s [OI=1.92, 0.62 GFlops/s, 0.04 GPts/s]
Performance[mode=advanced] arguments: {'nthreads': 8, 'pthreads': 0}

Allocating host memory for u(200, 104, 104) [8 MB]
Operator `Kernel` generated in 0.40 s
  * lowering.IET: 0.18 s (45.9 %)
     * specializing.IET: 0.13 s (33.2 %)
  * lowering.Clusters: 0.16 s (40.8 %)
     * specializing.Clusters: 0.10 s (25.5 %)
Flops reduction after symbolic optimization: [24 --> 21]
nvc++ -g -fPIC -std=c++11 -gpu=pinned -mp -acc:gpu -fast -shared /tmp/devito-jitcache-uid1001/e8fc0bff922d151c2636b5617581110b21c4c2ab.cpp -lm -o /tmp/devito-jitcache-uid1001/e8fc0bff922d151c2636b5617581110b21c4c2ab.so
Operator `Kernel` jit-compiled `/tmp/devito-jitcache-uid1001/e8fc0bff922d151c2636b5617581110b21c4c2ab.cpp` in 1.85 s with `NvidiaCompiler`
Operator `Kernel` ran in 0.20 s
Global performance: [OI=1.92, 0.16 GFlops/s, 0.01 GPts/s]
Global performance <w/o setup>: [0.01 s, 0.88 GPts/s]
Local performance:
  * section0 ran in 0.01 s
Performance[mode=advanced] arguments: {'deviceid': -1, 'devicerm': 1, 'pthreads': 0}
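
The GPU log above reports two different times: 0.20 s for the whole run and 0.01 s without setup. The sketch below shows how both numbers could be pulled out of the DEBUG output and compared. The function name `setup_overhead` and the regular expressions are illustrative and not part of Devito, and reading the difference as setup cost (for example host-device transfers) is an interpretation of the `<w/o setup>` label.

```python
import re

def setup_overhead(log):
    """Return (total, without_setup, difference) in seconds from DEBUG output."""
    # First "ran in X s" line is the whole-operator time
    total = float(re.search(r"ran in ([\d.]+) s", log).group(1))
    # Time reported on the "<w/o setup>" line
    kernel = float(re.search(r"<w/o setup>: \[([\d.]+) s", log).group(1))
    return total, kernel, round(total - kernel, 2)

gpu_log = """Operator `Kernel` ran in 0.20 s
Global performance: [OI=1.92, 0.16 GFlops/s, 0.01 GPts/s]
Global performance <w/o setup>: [0.01 s, 0.88 GPts/s]"""

total, kernel, overhead = setup_overhead(gpu_log)
print(f"total {total} s, without setup {kernel} s, difference {overhead} s")
```

Applied to the GPU log posted in the reply below, both numbers are 6.30 s, which suggests the time there is spent in the compute section itself rather than in setup.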

3 replies

jinshanmu Jul 23, 2024
Author

After export DEVITO_LOGGING=DEBUG, the results are:

CPU

Allocating host memory for u(200, 104, 104) [8 MB]
Operator `Kernel` generated in 0.31 s
  * lowering.Clusters: 0.14 s (46.7 %)
     * specializing.Clusters: 0.08 s (26.7 %)
  * lowering.IET: 0.12 s (40.0 %)
     * specializing.IET: 0.08 s (26.7 %)
Flops reduction after symbolic optimization: [24 --> 21]
Operator `Kernel` fetched `/tmp/devito-jitcache-uid1000/51e941b5e7ab3894383af0a9a734ec0095719df1.c` in 0.03 s from jit-cache
Operator `Kernel` ran in 0.01 s
Global performance: [OI=1.92, 3.19 GFlops/s, 0.20 GPts/s]
Global performance <w/o setup>: [0.01 s, 0.93 GPts/s]
Local performance:
  * section0 ran in 0.01 s
Performance[mode=advanced] arguments: {'pthreads': 0}

GPU

Allocating host memory for u(200, 104, 104) [8 MB]
Operator `Kernel` generated in 0.32 s
  * lowering.IET: 0.17 s (54.7 %)
     * specializing.IET: 0.11 s (35.4 %)
  * lowering.Clusters: 0.10 s (32.2 %)
Flops reduction after symbolic optimization: [24 --> 21]
Operator `Kernel` fetched `/tmp/devito-jitcache-uid1000/2e4fe8ef39cba334059dca71bed11e2bbf18b934.c` in 0.02 s from jit-cache
Operator `Kernel` ran in 6.30 s
Global performance: [OI=1.92, 0.01 GFlops/s, 0.01 GPts/s]
Global performance <w/o setup>: [6.30 s, 0.01 GPts/s]
Local performance:
  * section0 ran in 6.30 s [OI=1.92, 0.01 GFlops/s, 0.01 GPts/s]
Performance[mode=advanced] arguments: {'deviceid': -1, 'devicerm': 1, 'pthreads': 0}

Regarding OpenACC, I got numerous compilation errors after setting the environment variables:

export DEVITO_LANGUAGE=openacc
export DEVITO_PLATFORM=nvidiaX
export DEVITO_ARCH=nvc

Allocating host memory for u(200, 104, 104) [8 MB]
TimeFunction [u(time, x, y)] assumed to fit the GPU memory
[the line above is repeated 66 more times]
Operator `Kernel` generated in 0.26 s
  * lowering.IET: 0.12 s (46.4 %)
     * specializing.IET: 0.08 s (30.9 %)
  * lowering.Clusters: 0.10 s (38.6 %)
     * specializing.Clusters: 0.06 s (23.2 %)
Flops reduction after symbolic optimization: [24 --> 21]
nvc++ -g -fPIC -std=c++11 -gpu=pinned -mp -acc:gpu -fast -shared /tmp/devito-jitcache-uid1000/e8fc0bff922d151c2636b5617581110b21c4c2ab.cpp -lm -o /tmp/devito-jitcache-uid1000/e8fc0bff922d151c2636b5617581110b21c4c2ab.so
"/usr/include/stdlib.h", line 141: error: identifier "_Float32" is undefined
extern _Float32 strtof32 (const char *__restrict __nptr,
^

"/usr/include/stdlib.h", line 147: error: identifier "_Float64" is undefined
extern _Float64 strtof64 (const char *__restrict __nptr,
^

"/usr/include/stdlib.h", line 153: error: identifier "_Float128" is undefined
extern _Float128 strtof128 (const char *__restrict __nptr,
^

"/usr/include/stdlib.h", line 159: error: identifier "_Float32x" is undefined
extern _Float32x strtof32x (const char *__restrict __nptr,
^

"/usr/include/stdlib.h", line 165: error: identifier "_Float64x" is undefined
extern _Float64x strtof64x (const char *__restrict __nptr,
^

"/usr/include/stdlib.h", line 299: error: identifier "_Float32" is undefined
_Float32 __f)
^

"/usr/include/stdlib.h", line 305: error: identifier "_Float64" is undefined
_Float64 __f)
^

"/usr/include/stdlib.h", line 311: error: identifier "_Float128" is undefined
_Float128 __f)
^

"/usr/include/stdlib.h", line 317: error: identifier "_Float32x" is undefined
_Float32x __f)
^

"/usr/include/stdlib.h", line 323: error: identifier "_Float64x" is undefined
_Float64x __f)
^

"/usr/include/stdlib.h", line 436: error: identifier "_Float32" is undefined
extern _Float32 strtof32_l (const char *__restrict __nptr,
^

"/usr/include/stdlib.h", line 443: error: identifier "_Float64" is undefined
extern _Float64 strtof64_l (const char *__restrict __nptr,
^

"/usr/include/stdlib.h", line 450: error: identifier "_Float128" is undefined
extern _Float128 strtof128_l (const char *__restrict __nptr,
^

"/usr/include/stdlib.h", line 457: error: identifier "_Float32x" is undefined
extern _Float32x strtof32x_l (const char *__restrict __nptr,
^

"/usr/include/stdlib.h", line 464: error: identifier "_Float64x" is undefined
extern _Float64x strtof64x_l (const char *__restrict __nptr,
^

"/usr/include/c++/13/bits/std_abs.h", line 142: error: identifier "__builtin_fabsf128" is undefined
return __builtin_fabsf128(__x);
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 53: error: identifier "_Float32" is undefined
__MATHCALL_VEC (acos,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 55: error: identifier "_Float32" is undefined
__MATHCALL_VEC (asin,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 57: error: identifier "_Float32" is undefined
__MATHCALL_VEC (atan,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 59: error: identifier "_Float32" is undefined
__MATHCALL_VEC (atan2,, (Mdouble __y, Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 62: error: identifier "_Float32" is undefined
__MATHCALL_VEC (cos,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 64: error: identifier "_Float32" is undefined
__MATHCALL_VEC (sin,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 66: error: identifier "_Float32" is undefined
__MATHCALL_VEC (tan,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 71: error: identifier "_Float32" is undefined
__MATHCALL_VEC (cosh,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 73: error: identifier "_Float32" is undefined
__MATHCALL_VEC (sinh,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 75: error: identifier "_Float32" is undefined
__MATHCALL_VEC (tanh,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 79: error: identifier "_Float32" is undefined
__MATHDECL_VEC (void,sincos,,
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 85: error: identifier "_Float32" is undefined
__MATHCALL_VEC (acosh,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 87: error: identifier "_Float32" is undefined
__MATHCALL_VEC (asinh,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 89: error: identifier "_Float32" is undefined
__MATHCALL_VEC (atanh,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 95: error: identifier "_Float32" is undefined
__MATHCALL_VEC (exp,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 98: error: identifier "_Float32" is undefined
__MATHCALL (frexp,, (Mdouble __x, int *__exponent));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 101: error: identifier "_Float32" is undefined
__MATHCALL (ldexp,, (Mdouble __x, int __exponent));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 104: error: identifier "_Float32" is undefined
__MATHCALL_VEC (log,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 107: error: identifier "_Float32" is undefined
__MATHCALL_VEC (log10,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 110: error: identifier "_Float32" is undefined
__MATHCALL (modf,, (Mdouble __x, Mdouble *__iptr)) __nonnull ((2));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 114: error: identifier "_Float32" is undefined
__MATHCALL_VEC (exp10,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 119: error: identifier "_Float32" is undefined
__MATHCALL_VEC (expm1,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 122: error: identifier "_Float32" is undefined
__MATHCALL_VEC (log1p,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 125: error: identifier "_Float32" is undefined
__MATHCALL (logb,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 130: error: identifier "_Float32" is undefined
__MATHCALL_VEC (exp2,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 133: error: identifier "_Float32" is undefined
__MATHCALL_VEC (log2,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 140: error: identifier "_Float32" is undefined
__MATHCALL_VEC (pow,, (Mdouble __x, Mdouble __y));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 143: error: identifier "_Float32" is undefined
__MATHCALL (sqrt,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 147: error: identifier "_Float32" is undefined
__MATHCALL_VEC (hypot,, (Mdouble __x, Mdouble __y));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 152: error: identifier "_Float32" is undefined
__MATHCALL_VEC (cbrt,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 159: error: identifier "_Float32" is undefined
__MATHCALLX (ceil,, (Mdouble __x), (const));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 162: error: identifier "_Float32" is undefined
__MATHCALLX (fabs,, (Mdouble __x), (const));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 165: error: identifier "_Float32" is undefined
__MATHCALLX (floor,, (Mdouble __x), (const));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 168: error: identifier "_Float32" is undefined
__MATHCALL (fmod,, (Mdouble __x, Mdouble __y));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 198: error: identifier "_Float32" is undefined
__MATHCALLX (copysign,, (Mdouble __x, Mdouble __y), (const));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 203: error: identifier "_Float32" is undefined
__MATHCALL (nan,, (const char *__tagb));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 220: error: identifier "_Float32" is undefined
__MATHCALL (j0,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 220: error: expected a ";"
__MATHCALL (j0,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 220: error: "_Float32" is not a type name
__MATHCALL (j0,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 220: error: expected a ";"
__MATHCALL (j0,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 221: error: "_Float32" is not a type name
__MATHCALL (j1,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 221: error: expected a ";"
__MATHCALL (j1,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 221: error: expected a ";"
__MATHCALL (j1,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 222: error: "_Float32" is not a type name
__MATHCALL (jn,, (int, Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 223: error: "_Float32" is not a type name
__MATHCALL (y0,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 223: error: expected a ";"
__MATHCALL (y0,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 223: error: expected a ";"
__MATHCALL (y0,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 224: error: "_Float32" is not a type name
__MATHCALL (y1,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 224: error: expected a ";"
__MATHCALL (y1,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 224: error: expected a ";"
__MATHCALL (y1,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 225: error: "_Float32" is not a type name
__MATHCALL (yn,, (int, Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 231: error: "_Float32" is not a type name
__MATHCALL_VEC (erf,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 231: error: expected a ";"
__MATHCALL_VEC (erf,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 231: error: expected a ";"
__MATHCALL_VEC (erf,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 232: error: "_Float32" is not a type name
__MATHCALL_VEC (erfc,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 232: error: expected a ";"
__MATHCALL_VEC (erfc,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 232: error: expected a ";"
__MATHCALL_VEC (erfc,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 233: error: "_Float32" is not a type name
__MATHCALL (lgamma,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 233: error: expected a ";"
__MATHCALL (lgamma,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 233: error: expected a ";"
__MATHCALL (lgamma,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 238: error: "_Float32" is not a type name
__MATHCALL (tgamma,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 238: error: expected a ";"
__MATHCALL (tgamma,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 238: error: expected a ";"
__MATHCALL (tgamma,, (Mdouble));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 252: error: "_Float32" is not a type name
__MATHCALL (lgamma,_r, (Mdouble, int *__signgamp));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 252: error: type name is not allowed
__MATHCALL (lgamma,_r, (Mdouble, int *__signgamp));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 252: error: identifier "__signgamp" is undefined
__MATHCALL (lgamma,_r, (Mdouble, int *__signgamp));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 252: error: expected a ";"
__MATHCALL (lgamma,_r, (Mdouble, int *__signgamp));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 252: error: type name is not allowed
__MATHCALL (lgamma,_r, (Mdouble, int *__signgamp));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 252: error: expected a ";"
__MATHCALL (lgamma,_r, (Mdouble, int *__signgamp));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 259: error: "_Float32" is not a type name
__MATHCALL (rint,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 259: error: expected a ")"
__MATHCALL (rint,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 259: error: expected a ";"
__MATHCALL (rint,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 259: error: expected a ")"
__MATHCALL (rint,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 259: error: expected a ";"
__MATHCALL (rint,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 262: error: "_Float32" is not a type name
__MATHCALL (nextafter,, (Mdouble __x, Mdouble __y));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 262: error: expected a ")"
__MATHCALL (nextafter,, (Mdouble __x, Mdouble __y));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 262: error: expected a ";"
__MATHCALL (nextafter,, (Mdouble __x, Mdouble __y));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 262: error: expected a ")"
__MATHCALL (nextafter,, (Mdouble __x, Mdouble __y));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 262: error: expected a ";"
__MATHCALL (nextafter,, (Mdouble __x, Mdouble __y));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 269: error: "_Float32" is not a type name
__MATHCALL (nextdown,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 269: error: expected a ")"
__MATHCALL (nextdown,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 269: error: expected a ";"
__MATHCALL (nextdown,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 269: error: expected a ")"
__MATHCALL (nextdown,, (Mdouble __x));
^

"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 269: error: expected a ";"
__MATHCALL (nextdown,, (Mdouble __x));
^

Error limit reached.
100 errors detected in the compilation of "/tmp/devito-jitcache-uid1000/e8fc0bff922d151c2636b5617581110b21c4c2ab.cpp".
Compilation terminated.
FAILED compiler invocation: nvc++ -g -fPIC -std=c++11 -gpu=pinned -mp -acc:gpu -fast -shared /tmp/devito-jitcache-uid1000/e8fc0bff922d151c2636b5617581110b21c4c2ab.cpp -lm -o /tmp/devito-jitcache-uid1000/e8fc0bff922d151c2636b5617581110b21c4c2ab.so
Traceback (most recent call last):
  File "/home/server/stride/test0.py", line 26, in <module>
    stats = op.apply(dt=5e-05, c=.5)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/server/anaconda3/envs/stride/lib/python3.11/site-packages/devito/operator/operator.py", line 868, in apply
    cfunction = self.cfunction
                ^^^^^^^^^^^^^^
  File "/home/server/anaconda3/envs/stride/lib/python3.11/site-packages/devito/operator/operator.py", line 750, in cfunction
    self._jit_compile()
  File "/home/server/anaconda3/envs/stride/lib/python3.11/site-packages/devito/operator/operator.py", line 736, in _jit_compile
    recompiled, src_file = self._compiler.jit_compile(self._soname, str(self))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/server/anaconda3/envs/stride/lib/python3.11/site-packages/devito/arch/compiler.py", line 376, in jit_compile
    _, _, _, recompiled = compile_from_string(self, target, code, src_file,
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/server/anaconda3/envs/stride/lib/python3.11/site-packages/codepy/jit.py", line 439, in compile_from_string
    toolchain.build_extension(ext_file, source_paths, debug=debug)
  File "/home/server/anaconda3/envs/stride/lib/python3.11/site-packages/codepy/toolchain.py", line 211, in build_extension
    raise CompileError("module compilation failed")
codepy.CompileError: module compilation failed
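
Most of the compiler output above repeats a small number of distinct messages. A quick way to see which ones dominate is to count them while ignoring file and line. The helper `summarize_errors` below is a hypothetical sketch, and the sample contains just four lines copied from the log.

```python
import re
from collections import Counter

def summarize_errors(compiler_output):
    """Count distinct 'error: ...' messages, ignoring file name and line number."""
    messages = re.findall(r"error: (.+)", compiler_output)
    return Counter(messages)

# Four lines copied from the nvc++ output above
sample = '''"/usr/include/stdlib.h", line 141: error: identifier "_Float32" is undefined
"/usr/include/stdlib.h", line 147: error: identifier "_Float64" is undefined
"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 53: error: identifier "_Float32" is undefined
"/usr/include/x86_64-linux-gnu/bits/mathcalls.h", line 220: error: expected a ";"'''

# Most frequent message first
for message, count in summarize_errors(sample).most_common():
    print(count, message)
```

Run on the full log, a summary like this points at the system headers (types such as `_Float32` that the compiler does not recognize) rather than at the generated Devito code.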

Setting the environment variables for the CPU works without problems:

export DEVITO_LANGUAGE=openmp
export DEVITO_PLATFORM=skx
export DEVITO_ARCH=gcc

Allocating host memory for u(200, 104, 104) [8 MB]
Operator `Kernel` generated in 0.33 s
  * lowering.Clusters: 0.14 s (43.2 %)
     * specializing.Clusters: 0.08 s (24.7 %)
  * lowering.IET: 0.14 s (43.2 %)
     * specializing.IET: 0.10 s (30.9 %)
Flops reduction after symbolic optimization: [24 --> 21]
gcc -march=native -O3 -g -fPIC -Wall -std=c99 -Wno-unused-result -Wno-unused-variable -Wno-unused-but-set-variable -ffast-math -mprefer-vector-width=512 -shared -fopenmp /tmp/devito-jitcache-uid1000/06d2770144d3c30a7af468ca6a55b37de402994e.c -lm -o /tmp/devito-jitcache-uid1000/06d2770144d3c30a7af468ca6a55b37de402994e.so
Operator `Kernel` jit-compiled `/tmp/devito-jitcache-uid1000/06d2770144d3c30a7af468ca6a55b37de402994e.c` in 0.28 s with `GNUCompiler`
Operator `Kernel` ran in 0.02 s
Global performance: [OI=1.92, 1.60 GFlops/s, 0.10 GPts/s]
Global performance <w/o setup>: [0.02 s, 0.15 GPts/s]
Local performance:
  * section0 ran in 0.02 s [OI=1.92, 2.30 GFlops/s, 0.15 GPts/s]
Performance[mode=advanced] arguments: {'nthreads': 40, 'pthreads': 0}

mloubout Jul 23, 2024
Maintainer

Sorry, so which configuration are you using for the GPU: openmp+nvidiaX?

jinshanmu Jul 23, 2024
Author

I reproduced the slow ~5 s run time after explicitly specifying openmp+nvidiaX+gcc (which I guess is also the automatic configuration). When openacc+nvidiaX+nvc is specified, I get the compilation errors shown above.

jinshanmu · 2024-07-24T16:52:35Z

Author

The compilation errors were resolved after I downgraded from Ubuntu 24.04 to 22.04. The highest Ubuntu version that NVIDIA HPC SDK 22.11 supports is 22.04.
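
A pre-flight check along these lines could catch such a mismatch before the JIT compiler fails. This is a sketch only: the helper names are hypothetical, the 22.04 limit for NVIDIA HPC SDK 22.11 is taken from the statement above, and the sample text mimics the standard `/etc/os-release` format.

```python
def parse_os_release(text):
    """Parse KEY=VALUE lines in /etc/os-release format into a dict."""
    info = {}
    for line in text.splitlines():
        if "=" in line and not line.startswith("#"):
            key, _, value = line.partition("=")
            info[key.strip()] = value.strip().strip('"')
    return info

def version_tuple(version):
    """Turn '22.04' into (22, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def ubuntu_supported(info, max_supported="22.04"):
    """True for Ubuntu at or below max_supported (limit assumed from this thread)."""
    return (info.get("ID") == "ubuntu"
            and version_tuple(info["VERSION_ID"]) <= version_tuple(max_supported))

# Sample contents for an Ubuntu 24.04 machine
sample = 'NAME="Ubuntu"\nVERSION_ID="24.04"\nID=ubuntu\n'
print(ubuntu_supported(parse_os_release(sample)))  # False: 24.04 is newer than 22.04
```

On a real system the text would come from reading `/etc/os-release`, and the limit would need adjusting for other SDK releases.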

0 replies
