Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[23.2] Add support for Cgroupsv2 #17169

Merged
merged 3 commits into from
Dec 14, 2023
Merged

Conversation

natefoo
Copy link
Member

@natefoo natefoo commented Dec 11, 2023

Add support for Cgroupsv2 and hopefully fix other cases where the cgroups post-job commands could affect the return code. We should probably still restore the behavior that caused the job script exit code to be the tool script exit code, but I can't find that issue or PR currently.

Couple things for discussion:

  • The values of memory.high and memory.max will probably be the string max unless your DRM sets them, which means that these will appear in job_metric_text, but if suddenly they started being set to non-default values, they would appear in job_metric_numeric. Probably a very unlikely scenario to matter though.

  • The text for display in the UI for things that people probably don't care about and will never be non-default like memory.events.low is awfully verbose. I wonder if the default set of cgroupsv2 metrics ought to just be these:

      "memory.events.oom_kill": "Number of processes belonging to this cgroup killed by any kind of OOM killer",
      "memory.peak": "Max memory usage recorded",
      "cpu.stat.system_usec": "CPU system time (seconds)",
      "cpu.stat.usage_usec": "CPU usage time (seconds)",
      "cpu.stat.user_usec": "CPU user time (seconds)",

    because I am unlikely to care about anything else, but maybe other people have opinions. Here are the options for reference. memory.max and memory.high (currently included) could be interesting if Slurm ever set them, but as far as I know it doesn't. I'd be interested to know if these are anything useful in Kubernetes or HTCondor jobs.

Fixes #15924

When the next galaxy-job-metrics release happens we should also update the pin in Pulsar and release a new Pulsar version.

How to test the changes?

  • This is a refactoring of components with existing test coverage. (hopefully?)

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@github-actions github-actions bot added this to the 23.2 milestone Dec 11, 2023
@nsoranzo
Copy link
Member

Can you run make format to fix the linting please?

@natefoo
Copy link
Member Author

natefoo commented Dec 11, 2023

Can you run make format to fix the linting please?

You're saying we finally managed to implement @nsoranzo as a Makefile target? What are you doing with all your newfound free time?

Copy link
Contributor

@kysrpex kysrpex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @natefoo,

I mainly had a look both at the defaults you propose and at lines 33 to 49 from the diff.

I think what you propose as defaults is the information I am most likely to care about too.

Also from this line, I infer that any metric that cgroups can handle can be recorded by Galaxy, regardless of whether it is in TITLES or CONVERSION.

I am mentioning this because I think that under certain circumstances I can picture myself looking at IO information to try to make sense of why tool execution may be slow (for example, to help diagnosing slow upload jobs, or when taking walltime into the mix to have an intuition of whether a tool is CPU or IO bound). In particular I find io.stat and io.pressure interesting.

If you think it makes sense, we can add a description for io.stat.*, io.pressure, io.max and io.weight. I think we need some conversions as well because CgroupPluginFormatter seems to cover only things ending with "_bytes".

lib/galaxy/config/sample/job_metrics_conf.xml.sample Outdated Show resolved Hide resolved
@mvdbeek
Copy link
Member

mvdbeek commented Dec 12, 2023

Can you target 23.2 please ? This has the potential to break running jobs, and that's too big a change to apply to a stable release IMO.

@natefoo
Copy link
Member Author

natefoo commented Dec 12, 2023

Thanks for the detailed review @kysrpex!

I think what you propose as defaults is the information I am most likely to care about too.

Thanks, I'll adjust the defaults accordingly.

Also from this line, I infer that any metric that cgroups can handle can be recorded by Galaxy, regardless of whether it is in TITLES or CONVERSION.

That's right, with verbose="true" or individually with params="val1,val2".

I am mentioning this because I think that under certain circumstances I can picture myself looking at IO information to try to make sense of why tool execution may be slow (for example, to help diagnosing slow upload jobs, or when taking walltime into the mix to have an intuition of whether a tool is CPU or IO bound). In particular I find io.stat and io.pressure interesting.

If you think it makes sense, we can add a description for io.stat.*, io.pressure, io.max and io.weight. I think we need some conversions as well because CgroupPluginFormatter seems to cover only things ending with "_bytes".

The things in TITLES control the default, but we can make a separate list for some of these so that if you do collect them, we display a nice title, The default case in the formatter just passes back the value unchanged, and there's also a case to change floats into ints if their int value is equivalent to their float value.

@kysrpex
Copy link
Contributor

kysrpex commented Dec 12, 2023

The things in TITLES control the default, but we can make a separate list for some of these so that if you do collect them, we display a nice title, The default case in the formatter just passes back the value unchanged, and there's also a case to change floats into ints if their int value is equivalent to their float value.

I'd say let's then add the separate list and the formatter functions under CONVERSION. Even if you enabled edits to maintainers, I could not do it myself, but I can always open a PR to merge a branch from my own galaxy fork into natefoo:cgroupsv2. Just let me know if you want me to/need me to/it is more convenient when I do it.

@natefoo natefoo changed the base branch from release_23.1 to release_23.2 December 12, 2023 16:03
@natefoo natefoo changed the title [23.1] Add support for Cgroupsv2 [23.2] Add support for Cgroupsv2 Dec 12, 2023
@natefoo natefoo marked this pull request as draft December 12, 2023 16:04
@natefoo
Copy link
Member Author

natefoo commented Dec 12, 2023

It currently won't pull anything from the io controller, and even if we adjusted the shell commands to collect it, it would fail in my case: for example, the io.* interfaces don't exist at that level of the hierarchy for Slurm jobs or even my ssh sessions:

[rocky@js2-quad3 ~]$ cat /proc/$$/cgroup
0::/system.slice/slurmstepd.scope/job_1174091/step_0/user/task_0
[rocky@js2-quad3 ~]$ ls /sys/fs/cgroup/system.slice/slurmstepd.scope/job_1174091/step_0/user/task_0
cgroup.controllers	cgroup.stat		cpu.stat	       cpuset.mems.effective  memory.min	   memory.swap.events
cgroup.events		cgroup.subtree_control	cpu.weight	       memory.current	      memory.numa_stat	   memory.swap.high
cgroup.freeze		cgroup.threads		cpu.weight.nice        memory.events	      memory.oom.group	   memory.swap.max
cgroup.kill		cgroup.type		cpuset.cpus	       memory.events.local    memory.peak	   memory.zswap.current
cgroup.max.depth	cpu.idle		cpuset.cpus.effective  memory.high	      memory.reclaim	   memory.zswap.max
cgroup.max.descendants	cpu.max			cpuset.cpus.partition  memory.low	      memory.stat
cgroup.procs		cpu.max.burst		cpuset.mems	       memory.max	      memory.swap.current

You have to go all the way up to /sys/fs/cgroup/system.slice/slurmstepd.scope (Slurm) or /sys/fs/cgroup/user.slice (ssh) and that's system-wide rather than scoped to something useful. Is the case different for you?

I don't think any formatter conversions are missing - if so, let me know which (or feel free to PR).

@natefoo natefoo marked this pull request as ready for review December 13, 2023 20:28
@kysrpex
Copy link
Contributor

kysrpex commented Dec 14, 2023

It currently won't pull anything from the io controller, and even if we adjusted the shell commands to collect it, it would fail in my case: for example, the io.* interfaces don't exist at that level of the hierarchy for Slurm jobs or even my ssh sessions:

[rocky@js2-quad3 ~]$ cat /proc/$$/cgroup
0::/system.slice/slurmstepd.scope/job_1174091/step_0/user/task_0
[rocky@js2-quad3 ~]$ ls /sys/fs/cgroup/system.slice/slurmstepd.scope/job_1174091/step_0/user/task_0
cgroup.controllers	cgroup.stat		cpu.stat	       cpuset.mems.effective  memory.min	   memory.swap.events
cgroup.events		cgroup.subtree_control	cpu.weight	       memory.current	      memory.numa_stat	   memory.swap.high
cgroup.freeze		cgroup.threads		cpu.weight.nice        memory.events	      memory.oom.group	   memory.swap.max
cgroup.kill		cgroup.type		cpuset.cpus	       memory.events.local    memory.peak	   memory.zswap.current
cgroup.max.depth	cpu.idle		cpuset.cpus.effective  memory.high	      memory.reclaim	   memory.zswap.max
cgroup.max.descendants	cpu.max			cpuset.cpus.partition  memory.low	      memory.stat
cgroup.procs		cpu.max.burst		cpuset.mems	       memory.max	      memory.swap.current

You have to go all the way up to /sys/fs/cgroup/system.slice/slurmstepd.scope (Slurm) or /sys/fs/cgroup/user.slice (ssh) and that's system-wide rather than scoped to something useful. Is the case different for you?

I don't think any formatter conversions are missing - if so, let me know which (or feel free to PR).

I don't think I have a time slot to check this out very soon. If you want to merge without this feature go ahead, I do not want to block you.

@natefoo natefoo merged commit faaff5e into galaxyproject:release_23.2 Dec 14, 2023
44 of 46 checks passed
Copy link

This PR was merged without a "kind/" label, please correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/testing highlight/admin Included in admin/dev release notes kind/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants