
Implemented the max pool filter in CUDA using the built-in cuDNN library and using shared memory


praveen-oak/max-pool-cuda


cuda

Implements the max pool filter used in convolutional neural networks in two different ways:

  1. Using the built-in, closed-source cuDNN library provided by NVIDIA.
  2. From scratch, using shared memory.
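The cuDNN path boils down to describing the input and output tensors and the pooling window, then calling `cudnnPoolingForward`. The sketch below is illustrative, not the repository's actual code; the NCHW float layout, zero padding, and square window/stride are assumptions.

```cuda
// Hypothetical sketch of approach 1: max pooling through the cuDNN API.
// Error checking is trimmed for brevity.
#include <cudnn.h>

void max_pool_cudnn(const float *d_in, float *d_out,
                    int n, int c, int h, int w,
                    int win, int stride) {
    cudnnHandle_t handle;
    cudnnCreate(&handle);

    cudnnTensorDescriptor_t in_desc, out_desc;
    cudnnCreateTensorDescriptor(&in_desc);
    cudnnSetTensor4dDescriptor(in_desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               n, c, h, w);

    cudnnPoolingDescriptor_t pool_desc;
    cudnnCreatePoolingDescriptor(&pool_desc);
    cudnnSetPooling2dDescriptor(pool_desc, CUDNN_POOLING_MAX,
                                CUDNN_NOT_PROPAGATE_NAN,
                                win, win,        /* window height, width */
                                0, 0,            /* padding (assumed zero) */
                                stride, stride); /* strides */

    // Let cuDNN compute the output shape for the given window and stride.
    int on, oc, oh, ow;
    cudnnGetPooling2dForwardOutputDim(pool_desc, in_desc, &on, &oc, &oh, &ow);

    cudnnCreateTensorDescriptor(&out_desc);
    cudnnSetTensor4dDescriptor(out_desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               on, oc, oh, ow);

    const float alpha = 1.0f, beta = 0.0f;
    cudnnPoolingForward(handle, pool_desc, &alpha, in_desc, d_in,
                        &beta, out_desc, d_out);

    cudnnDestroyPoolingDescriptor(pool_desc);
    cudnnDestroyTensorDescriptor(in_desc);
    cudnnDestroyTensorDescriptor(out_desc);
    cudnnDestroy(handle);
}
```

The library picks the pooling algorithm internally; the caller only supplies descriptors, which is what makes this path generic rather than tuned to a particular shape.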

The intention was to see how the performance of the generic cuDNN library compares with a hand-tuned, GPU-specific implementation. It turns out that building the filter with shared memory and tailoring the solution to the requirements makes the code run 2x faster!
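The shared-memory approach can be sketched roughly as follows. This is an illustrative kernel, not the repository's implementation: the tile size, the 2x2 non-overlapping window, and the single-channel input are all assumptions. Each block cooperatively stages its input tile in shared memory, then each thread reduces its own window.

```cuda
// Hypothetical sketch of approach 2: a hand-written max-pool kernel that
// stages the input tile in fast on-chip shared memory.
#include <cuda_runtime.h>
#include <float.h>

#define TILE 16  // output elements per block edge (assumed)
#define WIN  2   // pooling window == stride, i.e. non-overlapping (assumed)

__global__ void max_pool_shared(const float *in, float *out, int h, int w) {
    // Each block produces a TILE x TILE patch of the output; the input
    // region it reads (TILE*WIN on a side) is cached in shared memory.
    __shared__ float tile[TILE * WIN][TILE * WIN];

    int out_x = blockIdx.x * TILE + threadIdx.x;
    int out_y = blockIdx.y * TILE + threadIdx.y;

    // Load the WIN x WIN input patch for this output element,
    // padding out-of-bounds reads with -FLT_MAX (the max identity).
    for (int dy = 0; dy < WIN; ++dy)
        for (int dx = 0; dx < WIN; ++dx) {
            int ix = out_x * WIN + dx;
            int iy = out_y * WIN + dy;
            tile[threadIdx.y * WIN + dy][threadIdx.x * WIN + dx] =
                (ix < w && iy < h) ? in[iy * w + ix] : -FLT_MAX;
        }
    __syncthreads();  // tile fully populated before the reduction

    if (out_x < w / WIN && out_y < h / WIN) {
        float m = -FLT_MAX;
        for (int dy = 0; dy < WIN; ++dy)
            for (int dx = 0; dx < WIN; ++dx)
                m = fmaxf(m, tile[threadIdx.y * WIN + dy]
                                  [threadIdx.x * WIN + dx]);
        out[out_y * (w / WIN) + out_x] = m;
    }
}
```

Because the window and stride are fixed at compile time here, the compiler can fully unroll the inner loops; giving up cuDNN's generality for this kind of specialization is where the speedup comes from.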

How to run the code

The code has the following dependencies:

  1. nvcc compiler for CUDA code
  2. cuda/9.0.176
  3. cudnn/9.0v7.0.5

For more information about CUDA and these libraries please refer to NVIDIA resources.

Once the requirements have been installed, load the modules into the current shell session:

```
module load cuda/9.0.176
module load cudnn/9.0v7.0.5
```

Then compile and run:

```
nvcc -o max_pool max_pool.cu -lcublas -lcudnn
./max_pool
```
