NVBLAS #23

maleadt · 2021-02-16T17:34:35Z

Might be interesting to experiment with NVBLAS: https://docs.nvidia.com/cuda/nvblas/index.html

The NVBLAS Library is a GPU-accelerated library that implements BLAS (Basic Linear Algebra Subprograms). It can accelerate most BLAS Level-3 routines by dynamically routing BLAS calls to one or more NVIDIA GPUs present in the system, when the characteristics of the call make it speed up on a GPU.

Part of CUDA_jll: https://github.com/JuliaBinaryWrappers/CUDA_jll.jl/blob/44445f650547dd14db177336e488460e56d4f354/src/wrappers/x86_64-linux-gnu.jl#L164-L168

ViralBShah · 2021-02-16T17:47:05Z

@staticfloat I suppose this is going to be the same as MKL: forwarding the 64_-suffixed BLAS functions to the non-suffixed ones.

@maleadt Are there any NVBLAS-specific init/threading APIs that need calling? Those will need to be added here, like we did for MKL in #19.
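
For illustration, the forwarding idea above can be sketched at the level of symbol names only. This is a minimal Python sketch, not how libblastrampoline implements it; `forwarding_table` is a hypothetical helper, and the `64_` suffix convention (for example `dgemm_64_`) is assumed from the comment above. Whether the integer argument widths of the two sides actually match is a separate question that this sketch does not address.

```python
def forwarding_table(exported, suffix="64_"):
    """Map each suffixed name (e.g. 'dgemm_64_') to the non-suffixed
    symbol it would forward to (e.g. 'dgemm_').

    `exported` is the set of non-suffixed symbols a backend provides.
    This is a name-level illustration only (hypothetical helper).
    """
    return {name + suffix: name for name in sorted(exported)}


# Two symbols taken from the kind of backend discussed in this thread.
table = forwarding_table({"dgemm_", "sgemm_"})
for suffixed, target in table.items():
    print(f"{suffixed} -> {target}")
```

A lookup for a suffixed name that the backend does not cover (say `daxpy_64_`) simply has no entry in the table, which is where a fallback BLAS would have to take over.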

maleadt · 2021-02-17T12:23:18Z

No specific APIs to call. One problem is that this BLAS only supports a limited number of functions, and itself forwards to another BLAS (configurable via environment variables and a configuration file):

000000000000bfb0 g    DF .text  0000000000000282  libnvblas.so.11 chemm_
000000000000ca10 g    DF .text  0000000000000282  libnvblas.so.11 csyr2k_
0000000000009670 g    DF .text  00000000000002bd  libnvblas.so.11 cgemm_
00000000000090f0 g    DF .text  00000000000002bd  libnvblas.so.11 sgemm_
000000000000cf70 g    DF .text  0000000000000282  libnvblas.so.11 cher2k_
000000000000afb0 g    DF .text  000000000000029c  libnvblas.so.11 ctrsm_
000000000000aa70 g    DF .text  000000000000029c  libnvblas.so.11 strsm_
000000000000a320 g    DF .text  0000000000000250  libnvblas.so.11 zsyrk_
0000000000009e80 g    DF .text  0000000000000250  libnvblas.so.11 dsyrk_
000000000000c240 g    DF .text  0000000000000282  libnvblas.so.11 zhemm_
0000000000009930 g    DF .text  00000000000002bd  libnvblas.so.11 zgemm_
000000000000c4f0 g    DF .text  0000000000000282  libnvblas.so.11 ssyr2k_
000000000000a5b0 g    DF .text  0000000000000250  libnvblas.so.11 cherk_
00000000000093b0 g    DF .text  00000000000002bd  libnvblas.so.11 dgemm_
000000000000b250 g    DF .text  000000000000029c  libnvblas.so.11 ztrsm_
000000000000ad10 g    DF .text  000000000000029c  libnvblas.so.11 dtrsm_
000000000000b530 g    DF .text  0000000000000282  libnvblas.so.11 ssymm_
000000000000ba50 g    DF .text  0000000000000282  libnvblas.so.11 csymm_
000000000000da10 g    DF .text  00000000000002ac  libnvblas.so.11 ctrmm_
000000000000d4b0 g    DF .text  00000000000002ac  libnvblas.so.11 strmm_
000000000000c780 g    DF .text  0000000000000282  libnvblas.so.11 dsyr2k_
000000000000a800 g    DF .text  0000000000000250  libnvblas.so.11 zherk_
000000000000bce0 g    DF .text  0000000000000282  libnvblas.so.11 zsymm_
000000000000dcc0 g    DF .text  00000000000002ac  libnvblas.so.11 ztrmm_
000000000000b7c0 g    DF .text  0000000000000282  libnvblas.so.11 dsymm_
000000000000d760 g    DF .text  00000000000002ac  libnvblas.so.11 dtrmm_
000000000000a0d0 g    DF .text  0000000000000250  libnvblas.so.11 csyrk_
000000000000cca0 g    DF .text  0000000000000282  libnvblas.so.11 zsyr2k_
0000000000009c30 g    DF .text  0000000000000250  libnvblas.so.11 ssyrk_
000000000000d200 g    DF .text  0000000000000282  libnvblas.so.11 zher2k_

This breaks autodetection. Adding one of these symbols to the list of probed symbols works for suffix detection, but that approach doesn't scale for interface detection.

[NVBLAS] NVBLAS_CONFIG_FILE environment variable is NOT set : relying on default config filename 'nvblas.conf'
[NVBLAS] Cannot open default config file 'nvblas.conf'
[NVBLAS] Config parsed
[NVBLAS] CPU Blas library need to be provided
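
To make the limited coverage concrete, here is a small Python sketch that parses a few lines copied from the symbol listing above and groups them by routine and precision. The helper names (`exported_symbols`, `group_by_routine`) are made up for illustration, and the parsing assumes the symbol name is the last whitespace-separated field of each line, as in the listing.

```python
import re
from collections import defaultdict

# A few lines copied verbatim from the symbol listing above.
LISTING = """\
0000000000009670 g    DF .text  00000000000002bd  libnvblas.so.11 cgemm_
00000000000090f0 g    DF .text  00000000000002bd  libnvblas.so.11 sgemm_
0000000000009930 g    DF .text  00000000000002bd  libnvblas.so.11 zgemm_
00000000000093b0 g    DF .text  00000000000002bd  libnvblas.so.11 dgemm_
000000000000aa70 g    DF .text  000000000000029c  libnvblas.so.11 strsm_
000000000000cf70 g    DF .text  0000000000000282  libnvblas.so.11 cher2k_
"""


def exported_symbols(listing):
    """Return the set of symbol names (last field of each non-empty line)."""
    return {line.split()[-1] for line in listing.splitlines() if line.strip()}


def group_by_routine(symbols):
    """Group names like 'dgemm_' as routine ('gemm') -> set of precision letters."""
    groups = defaultdict(set)
    for sym in symbols:
        m = re.fullmatch(r"([sdcz])([a-z0-9]+)_", sym)
        if m:
            groups[m.group(2)].add(m.group(1))
    return dict(groups)


symbols = exported_symbols(LISTING)
groups = group_by_routine(symbols)
for routine in sorted(groups):
    print(routine, "".join(sorted(groups[routine])))

# A Level-1 routine such as daxpy_ is not in the listing, so a detection
# probe that looks for it would come up empty on this library.
print("daxpy_" in symbols)
```

Running the same parsing over the full listing would show only Level-3 routines, which is why probing for an arbitrary BLAS symbol is not a reliable way to detect this library.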

ViralBShah · 2021-02-17T14:00:52Z

We can make nvblas.conf or the env variable point to the Julia provided openblas.
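
A rough sketch of that idea, in Python for illustration: write a minimal `nvblas.conf` and point `NVBLAS_CONFIG_FILE` at it. `NVBLAS_CPU_BLAS_LIB` and `NVBLAS_GPU_LIST` are configuration keywords from the NVBLAS documentation; the OpenBLAS path below is a placeholder, not the real location of the Julia-provided library, and `write_nvblas_conf` is a hypothetical helper.

```python
import os
import tempfile


def write_nvblas_conf(cpu_blas_path, conf_path, gpu_list="ALL"):
    """Write a minimal NVBLAS config file (hypothetical helper).

    NVBLAS_CPU_BLAS_LIB names the CPU BLAS that NVBLAS falls back to;
    NVBLAS_GPU_LIST selects the GPUs to use.
    """
    lines = [
        f"NVBLAS_CPU_BLAS_LIB {cpu_blas_path}",
        f"NVBLAS_GPU_LIST {gpu_list}",
    ]
    with open(conf_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return conf_path


# Placeholder path: substitute the actual location of the OpenBLAS library.
openblas_path = "/path/to/libopenblas.so"

conf_path = os.path.join(tempfile.mkdtemp(), "nvblas.conf")
write_nvblas_conf(openblas_path, conf_path)

# Only affects this process and its children; it has to be set before
# libnvblas is loaded for the library to pick it up.
os.environ["NVBLAS_CONFIG_FILE"] = conf_path
```

With the variable set before the library is loaded, the "CPU Blas library need to be provided" message from the log above should no longer apply, assuming the path points at a usable BLAS.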

ViralBShah · 2021-03-04T03:22:04Z

@maleadt - Ideally something like this is what we need to try out NVBLAS: https://github.com/JuliaLinearAlgebra/MKL.jl/blob/master/src/MKL.jl#L38

Of course, we'll then find things that don't quite work and perhaps LBT may need to be taught about NVBLAS. I suppose CUDA_jll does not include LAPACK.

maleadt · 2021-03-04T06:51:51Z

> I suppose CUDA_jll does not include LAPACK.

Not a drop-in version like NVBLAS at least.

This was referenced Feb 17, 2021

Generic APIs #22

Closed

Feature request: use libblastrampoline (LBT) to select Octavian as the BLAS JuliaLinearAlgebra/Octavian.jl#68

Open
