-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix failing test step on AWS #678
Fix failing test step on AWS #678
Conversation
Instance
|
Instance
|
bot: build repo:eessi.io-2023.06-software arch:zen2 |
Updates by the bot instance
|
Updates by the bot instance
|
New job on instance
|
Interactively, I got the correct output, from file I'll try to manually submit some batch jobs on AWS to figure this out. It's tricky that we don't see this interactively... My bet is, it is due to the replacement with |
Very strange, this small test job just works correctly:
WIth output:
|
bot: build repo:eessi.io-2023.06-software arch:zen2 |
Updates by the bot instance
|
Updates by the bot instance
|
New job on instance
|
…sible that inside the container, the /sys directory is different
The difference is that we are in a container... so we should use the info from the mounted directories of the host, not from the container's |
bot: build repo:eessi.io-2023.06-software arch:zen2 |
Updates by the bot instance
|
Updates by the bot instance
|
New job on instance
|
bot: build repo:eessi.io-2023.06-software arch:zen2 |
Updates by the bot instance
|
Updates by the bot instance
|
New job on instance
|
…it from the containers' /proc is fine. So let's do that
bot: build repo:eessi.io-2023.06-software arch:zen2 |
Updates by the bot instance
|
Updates by the bot instance
|
New job on instance
|
bot: build repo:eessi.io-2023.06-software arch:zen3 |
Updates by the bot instance
|
Updates by the bot instance
|
New job on instance
|
bot: build repo:eessi.io-2023.06-software arch:zen2 |
Updates by the bot instance
|
Updates by the bot instance
|
New job on instance
|
bot: build repo:eessi.io-2023.06-software arch:zen4 |
Updates by the bot instance
|
Updates by the bot instance
|
New job on instance
|
bot: build repo:eessi.io-2023.06-software arch:x86_64/intel/haswell |
Updates by the bot instance
|
Updates by the bot instance
|
New job on instance
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine overall. Added a little question if the script could have made a little 'smarter' (using /sys
or /hostsys
) and a suggestion to explain a bit what this /hostsys
is (and where it comes from).
cgroup_v1_mem_limit="/hostsys/fs/cgroup/memory/$(</proc/self/cpuset)/memory.limit_in_bytes" | ||
cgroup_v2_mem_limit="/hostsys/fs/cgroup/$(</proc/self/cpuset)/memory.max" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This path probably makes only sense if run in a very specific environment (e.g., testing software built for EESSI). While this is fine, how about checking whether /sys
or /hostsys
is available and use that?
If there would be a comment that explains what /hostsys
is and how it is made available, it might make debugging a little easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we bind-mount this additional path in bot/test.sh
. You're absolutely right about the commenting part: I'll make that clear.
Regarding a fallback on /sys
, I'm not sure if we want to do that. If /hostsys
isn't there, it means the bind-mount failed / was not executed. I'd probably prefer there to be a hard error, than a silent success here that maybe extracts the wrong amount of memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, added the description now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. Makes sense that we use /hostsys
and don't fall back to /sys
.
Currently, the test step on AWS fails because we fail to get a memory limit from the cgroup. I'll add some more verbose output as a first step to debugging this.