GTX1080 CUDA issues #8
Comments
happens with cuda 7.5 on a K80 as well |
I think that the maximum for block sizes is 65535, not 65536. Might be the issue? |
are you sure?
to be more precise:
|
You're right! I think I did so much programming on compute 2.x that I forgot they ever changed it... |
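The grid-size arithmetic discussed above can be checked on the host. This is a minimal sketch, not code from the repository: the helper names `max_grid_dim_x` and `launch_fits` are hypothetical, and the value 1024 for TBSIZE is an assumption chosen only so that array_size/TBSIZE comes out to 65536. The limits used are the documented maximum x-dimension of a grid: 65535 for compute capability 2.x and 2^31 - 1 for 3.0 and newer.

```cpp
#include <cstdint>

// Maximum gridDim.x by compute capability major version:
// 65535 for 2.x, 2^31 - 1 for 3.0 and newer.
constexpr std::uint64_t max_grid_dim_x(int cc_major) {
    return cc_major < 3 ? 65535ULL : 2147483647ULL;
}

// True if a 1D launch with array_size / tb_size blocks stays within the limit.
constexpr bool launch_fits(std::uint64_t array_size, std::uint64_t tb_size,
                           int cc_major) {
    return array_size / tb_size <= max_grid_dim_x(cc_major);
}

// Assumed TBSIZE of 1024 (hypothetical), giving exactly 65536 blocks.
static_assert(!launch_fits(65536ULL * 1024, 1024, 2),
              "65536 blocks exceed the compute 2.x limit");
static_assert(launch_fits(65536ULL * 1024, 1024, 6),
              "65536 blocks fit the compute 3.0+ limit");
```

So 65536 blocks is one over the limit only when the 2.x limit applies, which would fit a launch that fails on newer hardware when the binary was not built for that architecture.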
I took the |
I got it ... the problem is that you do not generate PTX/SASS for the architecture at hand but use the default nvcc options. If I inject architecture specific options in the nvcc compilation step, gpu-stream-cuda runs through alright!
FWIW, this issue can be closed. |
Thanks for highlighting this; useful to know about this behaviour. Closing as not an issue with the code itself. |
are you guys planning to include respective nvcc flags in the cmake file or document this on the wiki/landing page? |
This is probably a change we won't add into the CMake file because it could tune it for specific hardware too; and we want the "vanilla" code to be as neutral as possible. We are looking to add some tuned versions of some of the models into the repo somehow, and we would put this change in with a tuned CUDA version. I'll reopen the issue and mark as won't fix so the bug doesn't get forgotten. This is the same as we did with #1. |
Tom, I am not sure I understand what you mean. I thought that the motivation behind a benchmark is to see what the hardware is really capable of ... not what timings an unoptimized binary is giving you. I am irritated.
|
The motivation behind this code is to explore what performance there is on a variety of architectures across a variety of programming models for simple STREAM, and we're focussing on 'out of the box' performance. We modelled it on STREAM itself, which doesn't do any special tuning. The original STREAM benchmark provides the ability for tuned versions to be added, which is something we are planning on doing. Our results show that on the GPUs there isn't any tuning required to get close to theoretical peak performance. It is surprising that |
Build system has been revised in v.3.1. You can now pass in the architecture flag easily:
|
I wanted to benchmark a GTX 1080 with CUDA 8.0.27 under CentOS 7.2.1511. The gpu-stream-cuda app behaves normally with the default parameters.
Strangely enough though, when I want to provide more than the default number of elements in the array:
the copy kernel dispatch throws a CUDA API error 0xb, which is Invalid Argument. I tracked down the problem to [this line of code](https://github.com/UoB-HPC/GPU-STREAM/blob/master/CUDAStream.cu#L112). Strangely enough, if I look at the values of array_size/TBSIZE, they are in plausible ranges: arraysize/TBSIZE = 65536.
Does anyone have an idea where this is coming from? (As this is an RC CUDA, I see no problem forwarding this issue to NVIDIA.)