Add accelerator detection to Lmod version of EESSI initialisation #781

Merged on Oct 17, 2024 (35 commits)
Commits
887bc8a
Add accelerator detection to Lmod version of EESSI initialisation
ocaisa Oct 11, 2024
2c7553b
Make a separate PR for the lmod wrapper
ocaisa Oct 11, 2024
f4ee9d4
Merge remote-tracking branch 'upstream/2023.06-software.eessi.io' int…
ocaisa Oct 15, 2024
cf5491b
Add accelerator check to Lmod init script
ocaisa Oct 15, 2024
e7eb879
Match all variables that start with EESSI_
ocaisa Oct 15, 2024
1311055
Lmod init uses archdetect, but init script still uses archspec
ocaisa Oct 15, 2024
d4d5a79
init script is the only one who sets EESSI_ARCHDETECT EESSI_ARCHSPEC
ocaisa Oct 15, 2024
c7d4230
Use the correct override for when GPUs are expected
ocaisa Oct 15, 2024
5fd01f8
Add a load/unload test to make sure we return the environment to it's…
ocaisa Oct 15, 2024
7abcdc3
Try to make module work in load and unload modes
ocaisa Oct 15, 2024
e4ac2fa
Fix lots of small issues
ocaisa Oct 15, 2024
3829791
Fix lots of small issues
ocaisa Oct 15, 2024
443cdef
Fix lots of small issues
ocaisa Oct 15, 2024
78293e6
Use debug feature for module a bit more
ocaisa Oct 15, 2024
fa10c1a
Correct comment in action
ocaisa Oct 15, 2024
9c4897c
Add more debug logging
ocaisa Oct 16, 2024
86e0c4f
Add debug logging to CI
ocaisa Oct 16, 2024
87c6cdf
Add LMOD_RC and LMOD_PACKAGE_PATH to our checks
ocaisa Oct 16, 2024
9132662
Use the same regex everywhere
ocaisa Oct 16, 2024
b8c311d
zen4 is currently an exception, let's explicitly add it to the tests
ocaisa Oct 16, 2024
fe6a7de
Allow overriding to use Zen4
ocaisa Oct 16, 2024
6d64c42
Add comment to loop for clarification
ocaisa Oct 16, 2024
7e4f901
Add comments with examples in lots of places
ocaisa Oct 16, 2024
089158a
Apply suggestions from code review
ocaisa Oct 16, 2024
bea886b
Apply suggestions from code review
ocaisa Oct 16, 2024
6e1ccaa
Add more cases to load/unload tests, and check full env
ocaisa Oct 16, 2024
84d9f1d
Make sure Lmod has run once before storing environment settings
ocaisa Oct 16, 2024
56518ea
Don't expose full environment in CI
ocaisa Oct 16, 2024
9465289
Try harder to initialise Lmod before checking load/unload
ocaisa Oct 16, 2024
e937d5b
Make sure a module reset does not return an error
ocaisa Oct 16, 2024
fb59a55
Try another way to reset Lmod
ocaisa Oct 16, 2024
c02d894
Add a fake module so we can initialise Lmod somehow
ocaisa Oct 16, 2024
6b9d980
Keep trying to figure out how to control Lmod
ocaisa Oct 16, 2024
834ce38
Ignore Lmod tables when doing environment comparison
ocaisa Oct 16, 2024
04c2573
Update 2023.06.lua
ocaisa Oct 16, 2024
2 changes: 2 additions & 0 deletions init/lmod_eessi_archdetect_wrapper_accel.sh
@@ -0,0 +1,2 @@
+# This can be leveraged by the source_sh() feature of Lmod
+export EESSI_ACCEL_SUBDIR=$($(dirname $(readlink -f $BASH_SOURCE))/eessi_archdetect.sh accelpath)
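
The wrapper only has to export one variable, because Lmod's `source_sh()` sources the script and turns the resulting environment changes into module commands. The following is a minimal sketch of that pattern, not the EESSI implementation: the stub script and its output value `accel/nvidia/cc80` are assumptions standing in for the real `eessi_archdetect.sh`, and quoting has been added around the path expansions.

```shell
#!/usr/bin/env bash
# Sketch only: the stub below stands in for the real eessi_archdetect.sh,
# and its output (accel/nvidia/cc80) is an assumed example value.
demo_dir=$(mktemp -d)

# Hypothetical stub: prints an accelerator subdirectory for "accelpath".
cat > "$demo_dir/eessi_archdetect.sh" <<'EOF'
#!/usr/bin/env bash
if [ "$1" = "accelpath" ]; then
    echo "accel/nvidia/cc80"
fi
EOF
chmod +x "$demo_dir/eessi_archdetect.sh"

# Wrapper with the same shape as the one in this PR, with quoting added.
cat > "$demo_dir/wrapper.sh" <<'EOF'
export EESSI_ACCEL_SUBDIR=$("$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")/eessi_archdetect.sh" accelpath)
EOF

# Source the wrapper in a child bash and report what it exported.
detected=$(bash -c 'source "$1" && printf "%s" "$EESSI_ACCEL_SUBDIR"' _ "$demo_dir/wrapper.sh")
echo "EESSI_ACCEL_SUBDIR=$detected"

rm -rf "$demo_dir"
```

Resolving the script location through `BASH_SOURCE` means the wrapper finds `eessi_archdetect.sh` next to itself regardless of the caller's working directory.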
35 changes: 32 additions & 3 deletions init/modules/EESSI/2023.06.lua
@@ -42,14 +42,27 @@ function archdetect_cpu()
     end
     LmodError("Software directory check for the detected architecture failed")
 end
+function archdetect_accel()
+    local script = pathJoin(eessi_prefix, 'init', 'lmod_eessi_archdetect_wrapper_accel.sh')
+    if not os.getenv("EESSI_ACCEL_SUBDIR") then
+        if convertToCanonical(LmodVersion()) < convertToCanonical("8.6") then
+            LmodError("Loading this modulefile requires using Lmod version >= 8.6, but you can export EESSI_ACCEL_SUBDIR to the available accelerator architecture in the form of: accel/nvidia/cc80")
+        end
+        source_sh("bash", script)
+    end
+    local archdetect_accel = os.getenv("EESSI_ACCEL_SUBDIR") or ""
+    return archdetect_accel
+end
 local archdetect = archdetect_cpu()
+local archdetect_accel = archdetect_accel()
 local eessi_cpu_family = archdetect:match("([^/]+)")
 local eessi_software_subdir = archdetect
 local eessi_eprefix = pathJoin(eessi_prefix, "compat", eessi_os_type, eessi_cpu_family)
 local eessi_software_path = pathJoin(eessi_prefix, "software", eessi_os_type, eessi_software_subdir)
-local eessi_module_path = pathJoin(eessi_software_path, "modules", "all")
+local eessi_modules_subdir = pathJoin("modules", "all")
+local eessi_module_path = pathJoin(eessi_software_path, eessi_modules_subdir)
 local eessi_site_software_path = string.gsub(eessi_software_path, "versions", "host_injections")
-local eessi_site_module_path = pathJoin(eessi_site_software_path, "modules", "all")
+local eessi_site_module_path = pathJoin(eessi_site_software_path, eessi_modules_subdir)
 setenv("EPREFIX", eessi_eprefix)
 setenv("EESSI_CPU_FAMILY", eessi_cpu_family)
 setenv("EESSI_SITE_SOFTWARE_PATH", eessi_site_software_path)
@@ -65,8 +78,24 @@ if ( mode() ~= "spider" ) then
     prepend_path("MODULEPATH", eessi_module_path)
 end
 prepend_path("LMOD_RC", pathJoin(eessi_software_path, "/.lmod/lmodrc.lua"))
-prepend_path("MODULEPATH", eessi_site_module_path)
 setenv("LMOD_PACKAGE_PATH", pathJoin(eessi_software_path, ".lmod"))
+
+-- the accelerator may have an empty value and we need to give some flexibility
+-- * construct the path we expect to find
+-- * then check it exists
+-- * then update the modulepath
+if not (archdetect_accel == nil or archdetect_accel == '') then
+    eessi_accel_software_subdir = os.getenv("EESSI_ACCEL_SOFTWARE_SUBDIR_OVERRIDE") or eessi_software_subdir
+    eessi_accel_software_path = pathJoin(eessi_prefix, "software", eessi_os_type, eessi_accel_software_subdir)
+    eessi_module_path_accel = pathJoin(eessi_accel_software_path, archdetect_accel, eessi_modules_subdir)
+    if isDir(eessi_module_path_accel) then
+        setenv("EESSI_MODULEPATH_ACCEL", eessi_module_path_accel)
+        prepend_path("MODULEPATH", eessi_module_path_accel)
+    end
+end
+
+-- prepend the site module path last so it has priority
+prepend_path("MODULEPATH", eessi_site_module_path)
 if mode() == "load" then
     LmodMessage("EESSI/" .. eessi_version .. " loaded successfully")
 end
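
The accelerator branch of the modulefile follows three steps: construct the expected path, check that it exists, then prepend it to `MODULEPATH`. The bash sketch below mirrors those steps outside of Lmod so they can be tried in isolation. It rests on assumptions: the directory tree is created in a temporary location, the values `linux`, `x86_64/amd/zen3` and `accel/nvidia/cc80` are examples only, and the accelerator subdirectory is assumed to sit under the CPU-specific software path.

```shell
#!/usr/bin/env bash
# Sketch of the accelerator MODULEPATH logic; the directory layout is created
# in a temporary location and all values are assumed examples.
unset EESSI_ACCEL_SOFTWARE_SUBDIR_OVERRIDE   # keep the demo deterministic
eessi_prefix=$(mktemp -d)
eessi_os_type="linux"
eessi_software_subdir="x86_64/amd/zen3"   # as returned by CPU detection
accel_subdir="accel/nvidia/cc80"          # empty when no accelerator is found

# Pretend the accelerator module tree exists for this CPU target.
mkdir -p "$eessi_prefix/software/$eessi_os_type/$eessi_software_subdir/$accel_subdir/modules/all"

MODULEPATH="/existing/modules"
if [ -n "$accel_subdir" ]; then
    # The override selects a different CPU tree for accelerator modules.
    accel_software_subdir="${EESSI_ACCEL_SOFTWARE_SUBDIR_OVERRIDE:-$eessi_software_subdir}"
    module_path_accel="$eessi_prefix/software/$eessi_os_type/$accel_software_subdir/$accel_subdir/modules/all"
    # Only extend MODULEPATH when the directory is really there.
    if [ -d "$module_path_accel" ]; then
        MODULEPATH="$module_path_accel:$MODULEPATH"
    fi
fi
echo "$MODULEPATH"
```

Because each prepend puts its entry at the front, whatever is prepended last is searched first, which is why the modulefile prepends the site module path after the accelerator path.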