
Implemented the max pool filter in CUDA using the built-in cuDNN library and using shared memory


praveen-oak/max-pool-cuda


cuda

Implements the max pool filter used in convolutional neural networks in two different ways:

  1. Using the built-in, closed-source cuDNN library provided by NVIDIA.
  2. From scratch, using shared memory.
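The cuDNN path boils down to describing the input and output tensors and the pooling window, then calling `cudnnPoolingForward`. The sketch below is illustrative, not the repository's actual code; the NCHW float layout, zero padding, and square window/stride are assumptions.

```cuda
// Hypothetical sketch of approach 1: max pooling through the cuDNN API.
// Error checking is trimmed for brevity.
#include <cudnn.h>

void max_pool_cudnn(const float *d_in, float *d_out,
                    int n, int c, int h, int w,
                    int win, int stride) {
    cudnnHandle_t handle;
    cudnnCreate(&handle);

    cudnnTensorDescriptor_t in_desc, out_desc;
    cudnnCreateTensorDescriptor(&in_desc);
    cudnnSetTensor4dDescriptor(in_desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               n, c, h, w);

    cudnnPoolingDescriptor_t pool_desc;
    cudnnCreatePoolingDescriptor(&pool_desc);
    cudnnSetPooling2dDescriptor(pool_desc, CUDNN_POOLING_MAX,
                                CUDNN_NOT_PROPAGATE_NAN,
                                win, win,        /* window height, width */
                                0, 0,            /* padding (assumed zero) */
                                stride, stride); /* strides */

    // Let cuDNN compute the output shape for the given window and stride.
    int on, oc, oh, ow;
    cudnnGetPooling2dForwardOutputDim(pool_desc, in_desc, &on, &oc, &oh, &ow);

    cudnnCreateTensorDescriptor(&out_desc);
    cudnnSetTensor4dDescriptor(out_desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               on, oc, oh, ow);

    const float alpha = 1.0f, beta = 0.0f;
    cudnnPoolingForward(handle, pool_desc, &alpha, in_desc, d_in,
                        &beta, out_desc, d_out);

    cudnnDestroyPoolingDescriptor(pool_desc);
    cudnnDestroyTensorDescriptor(in_desc);
    cudnnDestroyTensorDescriptor(out_desc);
    cudnnDestroy(handle);
}
```

The library picks the pooling algorithm internally; the caller only supplies descriptors, which is what makes this path generic rather than tuned to a particular shape.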

The intention was to see how the performance of the generic cuDNN library compares with a hand-tuned, GPU-specific implementation. It turns out that building the filter with shared memory and tailoring the solution to the requirements makes the code run 2x faster!
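The shared-memory approach can be sketched roughly as follows. This is an illustrative kernel, not the repository's implementation: the tile size, the 2x2 non-overlapping window, and the single-channel input are all assumptions. Each block cooperatively stages its input tile in shared memory, then each thread reduces its own window.

```cuda
// Hypothetical sketch of approach 2: a hand-written max-pool kernel that
// stages the input tile in fast on-chip shared memory.
#include <cuda_runtime.h>
#include <float.h>

#define TILE 16  // output elements per block edge (assumed)
#define WIN  2   // pooling window == stride, i.e. non-overlapping (assumed)

__global__ void max_pool_shared(const float *in, float *out, int h, int w) {
    // Each block produces a TILE x TILE patch of the output; the input
    // region it reads (TILE*WIN on a side) is cached in shared memory.
    __shared__ float tile[TILE * WIN][TILE * WIN];

    int out_x = blockIdx.x * TILE + threadIdx.x;
    int out_y = blockIdx.y * TILE + threadIdx.y;

    // Load the WIN x WIN input patch for this output element,
    // padding out-of-bounds reads with -FLT_MAX (the max identity).
    for (int dy = 0; dy < WIN; ++dy)
        for (int dx = 0; dx < WIN; ++dx) {
            int ix = out_x * WIN + dx;
            int iy = out_y * WIN + dy;
            tile[threadIdx.y * WIN + dy][threadIdx.x * WIN + dx] =
                (ix < w && iy < h) ? in[iy * w + ix] : -FLT_MAX;
        }
    __syncthreads();  // tile fully populated before the reduction

    if (out_x < w / WIN && out_y < h / WIN) {
        float m = -FLT_MAX;
        for (int dy = 0; dy < WIN; ++dy)
            for (int dx = 0; dx < WIN; ++dx)
                m = fmaxf(m, tile[threadIdx.y * WIN + dy]
                                  [threadIdx.x * WIN + dx]);
        out[out_y * (w / WIN) + out_x] = m;
    }
}
```

Because the window and stride are fixed at compile time here, the compiler can fully unroll the inner loops; giving up cuDNN's generality for this kind of specialization is where the speedup comes from.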

How to run the code

The code has the following dependencies:

  1. nvcc compiler for CUDA code
  2. cuda/9.0.176
  3. cudnn/9.0v7.0.5

For more information about CUDA and these libraries please refer to NVIDIA resources.

Once the requirements have been installed, load the modules into the current shell session:

```
module load cuda/9.0.176
module load cudnn/9.0v7.0.5
```

Then compile and run:

```
nvcc -o max_pool max_pool.cu -lcublas -lcudnn
./max_pool
```
