show expected and problematic output produced by deviceQuery in GPU docs #139

base: main

@@ -152,10 +152,32 @@ The only scenario where this would be required is if `$LD_LIBRARY_PATH` is modif

### Testing the GPU support {: #gpu_cuda_testing }

The quickest way to test whether software installations included in EESSI can access and use your GPU is to run the
`deviceQuery` executable that is part of the `CUDA-Samples` module:
```{ .bash .copy }
module load CUDA-Samples
deviceQuery
```
If both commands are successful, you should see information about your GPU printed to your terminal, for example:

```
$ deviceQuery
deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA A2"
CUDA Driver Version / Runtime Version 12.2 / 12.1
CUDA Capability Major/Minor version number: 8.6
...
```
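
If you want to record which GPU a job actually ran on, the device name and compute capability can be extracted from this output. The following is only a sketch: it assumes the line layout shown above (a `Device N: "<name>"` line followed by a `CUDA Capability Major/Minor version number:` line), and `summarize_devices` is a hypothetical helper name, not something provided by EESSI or `CUDA-Samples`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise each GPU reported by deviceQuery (read from
# stdin) as "device <id>: <name> (compute capability <major.minor>)".
summarize_devices() {
    awk '
        # Remember the id and quoted name from lines like: Device 0: "NVIDIA A2"
        /^Device [0-9]+: "/ {
            id = $2
            sub(/:$/, "", id)
            name = $0
            sub(/^[^"]*"/, "", name)
            sub(/"$/, "", name)
        }
        # The last field of this line is the compute capability, e.g. 8.6
        /CUDA Capability Major\/Minor version number:/ {
            printf "device %s: %s (compute capability %s)\n", id, name, $NF
        }
    '
}

# Example usage, after "module load CUDA-Samples":
#   deviceQuery | summarize_devices
```

Because the helper only reads standard input, it works equally well on output that was saved to a file earlier.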

If the `deviceQuery` command cannot access your GPU, you will see an error message like:

This shouldn't actually happen though: because of the Lmod guards, the only scenario I can see where you would reach this is when you are using a container and the system drivers are too old.

I triggered it by cleaning out the I agree it's very unlikely that it happens, but we should mention it in the docs regardless, if only to let people easily find this page when searching for error messages.

My concern is that the placement here makes it seem likely that it won't work, whereas reaching this message is actually very unlikely.

Maybe a little box saying

```
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
```
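
In a batch job or other script, it is more convenient to check the outcome automatically than to read the output by eye. `deviceQuery` finishes with a `Result = PASS` or `Result = FAIL` line, so a minimal sketch can key on that line. The function name `check_device_query`, its messages, and its exit codes (0 for pass, 1 for fail, 2 for unrecognised output) are assumptions made for illustration only.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of EESSI or CUDA-Samples): classify the
# output of deviceQuery, read from stdin, by its final "Result = ..." line.
check_device_query() {
    local output count reason
    output=$(cat)

    if grep -q '^Result = PASS' <<< "$output"; then
        # Pull N out of "Detected N CUDA Capable device(s)", if present
        count=$(sed -n 's/^Detected \([0-9][0-9]*\) CUDA Capable device(s)$/\1/p' <<< "$output")
        echo "OK: ${count:-unknown} CUDA device(s) detected"
        return 0
    fi

    if grep -q '^Result = FAIL' <<< "$output"; then
        # The line starting with "-> " holds the CUDA error description
        reason=$(sed -n 's/^-> //p' <<< "$output")
        echo "FAIL: ${reason:-unknown error}"
        return 1
    fi

    echo "UNKNOWN: no Result line found"
    return 2
}

# Example usage on a system with EESSI and a GPU:
#   module load CUDA-Samples
#   deviceQuery 2>&1 | check_device_query
```

Keying on the final status line rather than on the exact wording of the error keeps the check independent of which CUDA error occurred; the error text is only echoed back for the user.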

Comment on lines +177 to +183

Suggested change

Currently, this only covers testing whether you can run CUDA-enabled software from EESSI. Maybe we can also include a small instruction for testing whether building new CUDA software on top of EESSI works properly. Something like this:

First, create a file `hello_cuda.cu` with the contents

Then

And mention they should test this for each version of CUDA they installed in `host_injections`.

Makes sense, but that should be done in a separate PR?

If you want, sure. I won't block this one over it :) Although I would consider it to be an integral part of "Testing the GPU support" to be honest :)

I don't see it as that integral if we are focused on software consumers; it's only integral if you want to do development-type work.