Fixed #776 #777
hfp
commented
Apr 4, 2024
- Citation: "Setting cudaLimitPrintfFifoSize must not be performed after launching any kernel that uses the printf() device system call - in such case cudaErrorInvalidValue will be returned."
- Since DeviceSetLimit is governed by ACC_API_CALL, the symbol NDEBUG must not be defined for reproducing the issue.
Issue #776 was discovered when testing with assertions enabled, which suggests that DBCSR's CUDA tests may run with assertions removed. Perhaps it is valuable to test with assertions enabled. |
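To illustrate why the failure only shows up without NDEBUG, here is a minimal C sketch. It does not use CUDA and it is not DBCSR's code: `CHECK_CALL`, `mock_set_printf_fifo_size`, `mock_launch_printf_kernel`, and `run_demo` are hypothetical names, and the mock merely imitates the documented rule that the printf FIFO size cannot be set after a kernel using printf() was launched. The macro is an assumed stand-in for a check that is compiled out under NDEBUG, not the actual definition of ACC_API_CALL.

```c
#include <stdio.h>

/* Hypothetical check macro (NOT DBCSR's ACC_API_CALL): with NDEBUG defined,
 * the call is still made but its result is ignored; otherwise a nonzero
 * result is counted as a failure. */
#if defined(NDEBUG)
#define CHECK_CALL(call, failures) ((void)(call))
#else
#define CHECK_CALL(call, failures) \
  do { \
    if (0 != (call)) { \
      fprintf(stderr, "check failed: %s\n", #call); \
      ++(failures); \
    } \
  } while (0)
#endif

/* Mock runtime state: set once a kernel that uses printf() was launched. */
static int printf_kernel_launched = 0;

/* Mock of launching a kernel that uses device-side printf(). */
static void mock_launch_printf_kernel(void) { printf_kernel_launched = 1; }

/* Mock of setting the printf FIFO size: returns nonzero (an error) if a
 * printf kernel was launched before, imitating the cited rule. */
static int mock_set_printf_fifo_size(void) {
  return printf_kernel_launched ? 1 : 0;
}

/* Returns the number of detected failures. */
int run_demo(void) {
  int failures = 0;
  CHECK_CALL(mock_set_printf_fifo_size(), failures); /* before any launch: ok */
  mock_launch_printf_kernel();
  CHECK_CALL(mock_set_printf_fifo_size(), failures); /* after launch: error */
  return failures;
}
```

Built without NDEBUG, `run_demo()` reports one failure; built with `-DNDEBUG`, the same sequence reports none, because the error code is silently dropped.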
@hfp for my understanding:
In any case, your change makes sense to me. I think the entire assumption was that the first call to the ACC part was c_dbcsr_acc_set_active_device, assuming we call it only once, which is clearly not the case... I think we can move the call to a more convenient place... |
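One way to drop the "called only once" assumption is to remember, per device, whether the limit was already applied. The following C sketch shows that idea only. It is not what this PR implements, and `sketch_set_active_device`, `mock_set_limit`, `sketch_limit_calls`, and `MAX_DEVICES` are hypothetical names. It assumes the limit is device specific, as raised later in this thread.

```c
#include <stdbool.h>

#define MAX_DEVICES 16 /* hypothetical upper bound for this sketch */

/* Per-device flag: true once the limit was applied for that device. */
static bool limit_applied[MAX_DEVICES];
static int set_limit_calls = 0;

/* Mock of the limit-setting call; counts invocations and always succeeds. */
static int mock_set_limit(int device_id) {
  (void)device_id;
  ++set_limit_calls;
  return 0;
}

/* Hypothetical device-activation routine that tolerates repeated calls:
 * the limit is set at most once per device. Returns 0 on success. */
int sketch_set_active_device(int device_id) {
  if (device_id < 0 || device_id >= MAX_DEVICES) return -1;
  if (!limit_applied[device_id]) {
    if (0 != mock_set_limit(device_id)) return -1;
    limit_applied[device_id] = true;
  }
  return 0;
}

/* Number of times the mock limit call was actually made. */
int sketch_limit_calls(void) { return set_limit_calls; }
```

Activating the same device repeatedly triggers the limit call only once, while a different device gets its own call. Note that this guard only prevents repeated calls; it does not by itself ensure the first call happens before any kernel using printf() is launched, which is why moving or removing the call is discussed further down.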
(BTW, trying to recover the Daint-CI output...) |
CSCS CI seems broken on their side:
but we have budget... Please ignore it for the moment. |
The cp2k regression tests on Piz Daint (https://dashboard.cp2k.org/index.html) are also disabled, because the project g90 has expired. sbatch returns
project "g90" expired on 2024-03-31
@juerghutter could you have a look? |
Project g90 is open again (until 2025-03-31). |
Ok, I just assumed this because the issue came up when I removed
No, we don't. Perhaps someone did so during development and wanted to keep this setting.
ACK; see above (me neither ;-).
OK, this is good to go in principle. However, I will move the call into the init function. |
What do you suggest? Putting it into acc_init may not be the right thing as it is device specific. I wonder if the code in question should be removed entirely? |
I rebased the PR and if it's green (let's hope for Daint-CI), I will merge it. Removing (or moving) the code in question might be another PR. |
ACK. |