Standard triply nested for-loop on the CPU.
- On Dell XPS 13: FLOPs: 2147483648; Execution time: 5.25 seconds; GFLOPS: 0.4090;
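A minimal sketch of this baseline, assuming square row-major `float` matrices (the reported FLOP count, 2147483648 = 2·1024³, suggests N = 1024; the function name is illustrative):

```cpp
// Naive O(N^3) multiply: C = A * B for square N x N row-major matrices.
void matmul_naive(const float* A, const float* B, float* C, int N) {
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < N; ++k)
                acc += A[i * N + k] * B[k * N + j];  // B is walked column-wise: poor locality
            C[i * N + j] = acc;
        }
    }
}
```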
Using the CPU with blocking for better temporal and spatial locality; the tiles are sized to fit in the L1 cache. More details here.
- On Dell XPS 13: FLOPs: 2147483648; Execution time: 3.35 seconds; GFLOPS: 0.6402;
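A sketch of the blocking idea, assuming row-major matrices, a zero-initialized `C`, `N` divisible by the block size, and a block size `BS` chosen so three `BS`×`BS` tiles fit in the L1 cache:

```cpp
// Blocked multiply: iterate over BS x BS tiles so each tile's working set
// stays resident in L1 while it is reused. C must be zeroed by the caller.
void matmul_blocked(const float* A, const float* B, float* C, int N, int BS) {
    for (int ii = 0; ii < N; ii += BS)
        for (int kk = 0; kk < N; kk += BS)
            for (int jj = 0; jj < N; jj += BS)
                for (int i = ii; i < ii + BS; ++i)
                    for (int k = kk; k < kk + BS; ++k) {
                        float a = A[i * N + k];  // reused across the whole j loop
                        for (int j = jj; j < jj + BS; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```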
Using the GPU with a basic OpenCL kernel.
- Platform: NVIDIA CUDA / Device: NVIDIA TITAN Xp: FLOPs: 2147483648; Execution time: 0.04 seconds; GFLOPS: 51.8588;
- Platform: Intel(R) OpenCL / Device: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz: FLOPs: 2147483648; Execution time: 0.10 seconds; GFLOPS: 20.7495;
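A plausible shape for such a kernel, with one work-item per element of `C` and every operand read straight from global memory (a sketch; the repo's actual kernel may differ):

```cpp
// OpenCL C kernel source, embedded as a C++ raw string literal.
const char* kMatmulSource = R"CLC(
__kernel void matmul(__global const float* A,
                     __global const float* B,
                     __global float* C,
                     const int N) {
    int row = get_global_id(0);
    int col = get_global_id(1);
    float acc = 0.0f;
    for (int k = 0; k < N; ++k)
        acc += A[row * N + k] * B[k * N + col];  // 2N global reads per element
    C[row * N + col] = acc;
}
)CLC";
```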
Using cuBLAS.
- Platform: Tesla K80: FLOPs: 2147483648; Execution time: 0.01 seconds; GFLOPS: 304.7607;
- Platform: RTX 3090: FLOPs: 2147483648; Execution time: 0.00 seconds; GFLOPS: 773.4415;
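A minimal sketch of the cuBLAS call, assuming `dA`, `dB`, `dC` are device buffers holding N×N row-major single-precision matrices (buffer names are illustrative). cuBLAS uses column-major storage, so the standard trick is to swap the operands:

```cuda
#include <cublas_v2.h>

// Computes row-major C = A * B by asking cuBLAS for column-major B * A:
// a row-major matrix reinterpreted as column-major is its transpose, and
// (A*B)^T = B^T * A^T.
void matmul_cublas(const float* dA, const float* dB, float* dC, int N) {
    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                N, N, N, &alpha, dB, N, dA, N, &beta, dC, N);
    cublasDestroy(handle);
}
```

In practice the handle would be created once and reused; creating it per call as above is only for brevity.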
Load blocks into GPU shared memory to reduce global memory accesses. Explained in detail here.
- Platform: RTX 3090: FLOPs: 2147483648; Execution time: 0.00 seconds; GFLOPS: 951.0809;
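The tiling idea, sketched here as a CUDA kernel for concreteness (OpenCL's `__local` memory works the same way); assumes N is a multiple of TILE:

```cuda
#define TILE 16

// Each thread block stages one TILE x TILE tile of A and B in shared memory,
// so every global element is read N/TILE times instead of N times.
__global__ void matmul_tiled(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < N / TILE; ++t) {
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();                 // wait until the tile is fully loaded
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();                 // finish reading before the next load
    }
    C[row * N + col] = acc;
}
```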
Similar to BlockMatrixMultiplier, but loads matrix B into memory transposed and uses SIMD instructions to perform the block dot products.
- TODO
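One way this could look, sketched with AVX2/FMA intrinsics and with the outer blocking omitted for brevity (assumes N divisible by 8, `Bt` holding B transposed, and compilation with `-mavx2 -mfma`):

```cpp
#include <immintrin.h>

// With B pre-transposed (Bt[j*N + k] == B[k*N + j]), each C[i][j] is a dot
// product of two contiguous rows, which maps directly onto SIMD loads.
void matmul_bt_simd(const float* A, const float* Bt, float* C, int N) {
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            __m256 acc = _mm256_setzero_ps();
            for (int k = 0; k < N; k += 8) {
                __m256 a = _mm256_loadu_ps(&A[i * N + k]);
                __m256 b = _mm256_loadu_ps(&Bt[j * N + k]);
                acc = _mm256_fmadd_ps(a, b, acc);  // acc += a * b, elementwise
            }
            float lanes[8];
            _mm256_storeu_ps(lanes, acc);          // horizontal sum of 8 lanes
            C[i * N + j] = lanes[0] + lanes[1] + lanes[2] + lanes[3]
                         + lanes[4] + lanes[5] + lanes[6] + lanes[7];
        }
    }
}
```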
Multiply the matrices in Python using NumPy, for comparison.
- TODO
Multiply with a GPU shader.
- TODO