BLD: compilation failure at comm_nccl.cu #959
Comments
You can ignore the warning for the The real problem is this:
@tylerjereddy could you please try replacing
Will do. Venado is down for another day or two, I think (this time for a dedicated activity time/reservation).
@tylerjereddy Does the compiler provide any notes after the error?
A few thousand lines of C++ spam follow the error IIRC (sorry C++ devs..), but I can share the full log once Venado comes back up if you want.
@marcinz I was able to access a Venado frontend this morning, reproduce the problem, and then place the full compile error output in a repo at: https://github.com/tylerjereddy/error_messages/blob/main/compile_failures/legate_issue_959_oct_28_2024.txt
* Add the compile error output after applying the legate patch from: nv-legate/legate#959 (comment)
@manopapad I applied your patch--the compilation still failed at roughly the same spot, but the reason for the failure did change. I've placed the full error output in a git repo at: https://github.com/tylerjereddy/error_messages/blob/main/compile_failures/legate_issue_959_oct_28_2024_after_patch.txt
@tylerjereddy maybe this patch will make the compiler happy?
* Add compile failure after applying latest patch from: nv-legate/legate#959 (comment)
@manopapad The compile failure after applying that patch is available here: https://github.com/tylerjereddy/error_messages/blob/main/compile_failures/legate_issue_959_oct_29_2024_after_patch.txt
I tried switching to GNU compiler toolchain Likewise for using an older CUDA toolchain--same compiler errors with and without the latest patch when using Maybe I'll check if my old
@tylerjereddy This might not make any difference, but I wonder if we could try to simplify the build. I am thinking:
The main idea is to modify as little as possible in the Cray env. The above worked for me on Perlmutter, but I started with some modules preloaded. Just in case, here is the module environment: module list
and here is the resulting mamba list
It would be nice to know if using the compiler wrappers with minimal changes to the environment makes any difference. Here is the resulting configuration: configuration
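To make "minimal changes to the environment" easier to compare between machines, it can help to record which compiler executables the shell actually resolves before configuring. The following is only a rough sketch, not part of the build instructions in this thread: the `report_tool` helper is hypothetical, and the tool list (`cc` and `CC` as the Cray compiler wrappers, plus `nvcc` and `cmake`) is an assumption to adjust for your site.

```shell
#!/usr/bin/env bash
# Sketch: report which compiler executables the current environment resolves to.
# The tool list in the loop below is an assumption; adjust it for your site.

report_tool() {
    # Print the resolved location of one tool and its `--version` output
    # (indented), or a "not found" marker if the tool is not on PATH.
    local tool="$1" path
    if path="$(command -v "$tool" 2>/dev/null)"; then
        printf '%s: %s\n' "$tool" "$path"
        "$tool" --version 2>&1 | sed 's/^/    /'
    else
        printf '%s: not found\n' "$tool"
    fi
}

for tool in cc CC nvcc cmake; do
    report_tool "$tool"
done
```

Running this once in the preloaded module environment and once after any module changes gives two short reports that can be diffed and pasted into the issue.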
On the LANL Venado machine, Linux ARM/Grace-Hopper architecture, whether using clang 18 (`Cray clang version 18.0.0`) or gcc-13 (`13.2.1`) compiler toolchain (both with `nvcc` from CUDA 12.5), the same compilation error arises for a recently-provided `legate` release (we only received a tarball--and the only version info I can find is `CMakeLists.txt:set(legate_version 24.09.00)`, but this may be a dev version of that and not a tagged release yet). If you direct me to the appropriate location to grep out an embedded `git` hash I'll go ahead and do that for you, but I don't have a `git bundle`, just a preview release tarball as far as I can tell.

Here are the steps I follow on Venado:
Set up of environment and compilation commands
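The commands collapsed above are not reproduced here. Separately, for the version question raised in the description, here is a minimal sketch of pulling the version token out of a `CMakeLists.txt` that contains a line like `set(legate_version 24.09.00)`. The `extract_legate_version` helper and the default file path are hypothetical, and this does not locate a git hash, since where (or whether) one is embedded in the tarball is not known from this thread.

```shell
#!/usr/bin/env bash
# Sketch: extract the version token from a line such as
#   set(legate_version 24.09.00)
# The helper name and the default path are assumptions for illustration.

extract_legate_version() {
    # $1: path to a CMakeLists.txt; prints the first matching version token.
    sed -n 's/^[[:space:]]*set(legate_version[[:space:]]\{1,\}\([^)[:space:]]*\).*/\1/p' "$1" | head -n 1
}

cmake_file="${1:-CMakeLists.txt}"
if [ -f "$cmake_file" ]; then
    extract_legate_version "$cmake_file"
else
    echo "no such file: $cmake_file" >&2
fi
```

Run it from the top of the unpacked tarball, or pass the path to the file as the first argument.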
And here is the compilation failure (snipped at the end because the C++ compilation spam after the error is a bit much):