Bot-specific SitePackage.lua that solves libfabric issues #531
Comments
The same approach could be used for other problems that are triggered via [...]
@TopRichard also found an issue with our CUDA hook when trying to use it on NESSI: it currently forbids loading dependency modules that have GPU support, even for building purposes. Disabling that hook as part of the bot-specific `SitePackage.lua` seems like a good idea.
To fix a similar kind of MPI issue on our zen4 cluster (see #815), I added the following file to the bot account:
With help from @casparvl, I've added the following to `/project/def-users/bot/shared/host-injections/2023.06/.lmod/SitePackage.lua` on our AWS build cluster, which will be picked up by the bot for builds relying on `libfabric`:

This solves the Haswell OpenMPI issues that we observed in several PRs. I was going to make a PR for it, but I have some doubts about how this should be done:
- [...] `libfabric`?
- [...] `SitePackage.lua` is picked up / copied to the right location? [...] `bot/build.sh`, `EESSI-install-software.sh`, `eessi_container.sh`, ...?
- [...] `SitePackage.lua`, should it already pick up the new version? If so, we should probably prevent it from being copied to the shared directory already; otherwise other builds will also pick it up before it's merged.
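
One possible way to address the last point, sketched here purely as an illustration: stage the file in a per-job directory rather than the shared one, and point Lmod at that directory through `LMOD_PACKAGE_PATH` (the Lmod variable that names the directory containing `SitePackage.lua`). The helper name `stage_site_package` and the paths in the usage note are hypothetical, and whether this interacts well with the way EESSI itself locates `SitePackage.lua` has not been verified.

```shell
# Sketch only: copy a bot-specific SitePackage.lua into a per-job directory,
# so an unmerged version from a PR is seen only by the job that tests it.
#
# stage_site_package SRC JOB_DIR   (hypothetical helper, not part of the bot)
#   Copies SRC to JOB_DIR/.lmod/SitePackage.lua and prints JOB_DIR/.lmod,
#   the directory that LMOD_PACKAGE_PATH could be set to.
stage_site_package() {
    local src="$1"
    local job_dir="$2"
    local lmod_dir="${job_dir}/.lmod"

    # Refuse to continue if the source file is missing; nothing is created.
    if [ ! -f "${src}" ]; then
        echo "ERROR: ${src} not found" >&2
        return 1
    fi

    mkdir -p "${lmod_dir}" || return 1
    cp "${src}" "${lmod_dir}/SitePackage.lua" || return 1
    echo "${lmod_dir}"
}
```

A build script could then run something like `LMOD_PACKAGE_PATH="$(stage_site_package ./SitePackage.lua "${TMPDIR}")"` followed by `export LMOD_PACKAGE_PATH` before loading any modules (both arguments are placeholders). The shared `host-injections` copy would then only be updated once the PR is merged.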