enhance archdetect to support detection of NVIDIA GPUs + using that in EESSI init script #767
…n EESSI init script
I think we can give people override capabilities so they can mix and match CPU/GPU stacks (in terms of CPU architectures).
…GPU_SOFTWARE_SUBDIR_OVERRIDE Co-authored-by: ocaisa <[email protected]>
…ailed to run + take that into account in EESSI init script + allow overriding software subdirectory for accel/* via $EESSI_ACCEL_SOFTWARE_SUBDIR_OVERRIDE
…t, must be 'accel/nvidia/cc[0-9][0-9]'
…with 'No devices were found' if no GPUs are available in Slurm job
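The two checks named in the commit messages above can be sketched as follows. This is a minimal illustration and not the code from this PR: the function names `validate_accel_override` and `no_gpus_visible` are hypothetical, and only the pattern `accel/nvidia/cc[0-9][0-9]` and the `No devices were found` message are taken from the commit messages.

```shell
# Hypothetical helper (not from the PR): accept an override value only if it
# has the shape accel/nvidia/cc followed by exactly two digits.
validate_accel_override() {
    case "$1" in
        accel/nvidia/cc[0-9][0-9])
            return 0
            ;;
        *)
            echo "ERROR: invalid accelerator override '$1', must be 'accel/nvidia/cc[0-9][0-9]'" >&2
            return 1
            ;;
    esac
}

# Hypothetical helper (not from the PR): report success when the captured
# nvidia-smi output says that no GPU is visible, e.g. in a Slurm job
# that did not request any GPUs.
no_gpus_visible() {
    case "$1" in
        *"No devices were found"*)
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}
```

An init script could call `validate_accel_override "$EESSI_ACCEL_SOFTWARE_SUBDIR_OVERRIDE"` before using the value, and fall back to a CPU-only setup when `no_gpus_visible` succeeds on the captured `nvidia-smi` output.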
@ocaisa Don't merge this just yet (although it's ready for re-review + testing). We should go all the way here, and also set up some CI for this, by using fake |
Tested extensively: CPU-only system (zen2), no
@ocaisa I've added an extensive GitHub Actions workflow for verifying the NVIDIA GPU accelerator detection implemented in this PR, see 24f0620. There's one issue though: implementing these tests revealed that the EESSI init script now "chokes" when both:
For me, this is reason enough to reconsider your (currently implemented) suggestion to let the Thoughts? |
I would temporarily disable |
Or give archdetect an environment variable or option that suppresses the non-zero exit code when accelerator detection fails.
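A rough sketch of what such an opt-out could look like, assuming a hypothetical environment variable `ARCHDETECT_IGNORE_ACCEL_FAILURE` and a hypothetical helper `accel_exit_code`; neither name exists in archdetect as far as this thread shows.

```shell
# Hypothetical helper (not from the PR): map the status of accelerator
# detection to the exit code that should be reported to the caller.
# When ARCHDETECT_IGNORE_ACCEL_FAILURE (a made-up name) is set to 1,
# a failed detection is reported as success instead of an error.
accel_exit_code() {
    detect_status="$1"
    if [ "$detect_status" -ne 0 ] && [ "${ARCHDETECT_IGNORE_ACCEL_FAILURE:-0}" = "1" ]; then
        return 0
    fi
    return "$detect_status"
}
```

With this shape the default behaviour stays strict, and a caller such as the init script or a CI job has to opt in explicitly to ignore a detection failure.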
…ct to detect accelerator
@ocaisa Anything blocking this now? |
LGTM
bot: build repo:eessi.io-2023.06-software arch:zen2 |
Staging PR merged, good to go! |
PR merged! Moved |
Example output: