fix: #41 cgroup confinement #18
Conversation
…kemake-executor-plugin-slurm-jobstep into fix/#41_cgroup_confinement
# the job can utilize the c-group's resources.
# Note, if a job asks for more threads than cpus_per_task, we need to
# limit the number of cpus to the number of threads.
cpus = min(job.resources.get("cpus_per_task", 1), job.threads)
Shouldn't this be `max` instead of `min`, according to your description?
edit: yes @johanneskoester actually, no. Scenarios:
- the `cpus_per_task` resource setting is set; then it is >= 1, and if `job.threads` is bigger, the job cannot utilize it; hence the minimum must be taken, or else the job is throttled in the c-group.
- `cpus_per_task` is unset, so implicitly 1. The minimum of 1 must be taken (else `job.threads` would mean oversubscription, which does not work either).
- ideally, `cpus_per_task` and `job.threads` are in no conflict, because the slurm-executor supersedes `job.threads` with `cpus_per_task` taken from the resources section.

In essence, it would not matter; the `min()` function was intended as a precaution, as a remedy to the aforementioned scenarios. A minimal sketch of these scenarios follows below.
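To make the three scenarios concrete, here is a small self-contained sketch; `confined_cpus` and the resource values are illustrative stand-ins, not the plugin's actual API:

```python
def confined_cpus(resources: dict, threads: int) -> int:
    # mirrors the line under discussion:
    # cpus = min(job.resources.get("cpus_per_task", 1), job.threads)
    return min(resources.get("cpus_per_task", 1), threads)

# scenario 1: cpus_per_task is set and smaller than job.threads;
# the c-group only grants 4 CPUs, so requesting more would throttle the job
assert confined_cpus({"cpus_per_task": 4}, threads=8) == 4

# scenario 2: cpus_per_task is unset (implicitly 1);
# taking job.threads instead would oversubscribe the allocation
assert confined_cpus({}, threads=8) == 1

# scenario 3: no conflict, because the submit plugin supersedes
# job.threads with cpus_per_task from the resources section
assert confined_cpus({"cpus_per_task": 8}, threads=8) == 8
```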
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope my edit makes sense ...
Doesn't this mean that if `cpus_per_task` is not set, then `cpus` will always be 1? Sorry if this is a stupid question and you answered it before, but that seems wrong to me.
That is correct. However, `threads` will set `cpus_per_task` if `cpus_per_task` is not set. See lines 128 to 138 of the submit plugin.
In fact, this whole check is redundant. Should we rather erase it altogether and leave a note referring to the submit plugin, I wonder?
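For readers without the submit plugin at hand, a hedged sketch of what the referenced lines (128 to 138) boil down to; the function name and signature are illustrative, not the plugin's actual code:

```python
def effective_cpus_per_task(resources: dict, threads: int) -> int:
    # if cpus_per_task is not given as a resource, fall back to the
    # rule's thread count, so the jobstep side never sees it unset
    cpus_per_task = resources.get("cpus_per_task")
    if cpus_per_task is None:
        cpus_per_task = threads
    return cpus_per_task

assert effective_cpus_per_task({}, threads=8) == 8
assert effective_cpus_per_task({"cpus_per_task": 4}, threads=8) == 4
```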
Makes sense to me. Why have code here if it is not needed, given that `cpus_per_task` is always configured in the right way by the slurm plugin?
Not sure why the test breaks now, will look into it ...
if "mpi" in job.resources.keys(): | ||
if job.is_group():
This was deactivated by me because it neglected the fact that resources within a group are mostly unknown beforehand. Furthermore, the topology does not really predict diverging runtimes of jobs properly, so it is unnecessarily inflexible to group jobs by topology levels. However, I have not thought this through entirely, and I might be wrong.
Thing is: right now, group jobs do not work correctly. As far as I see it, there are two places which need editing (see the sketch after this list):
- the submit executor, where the CPU resources are `threads` (or `cpus_per_task`) times the number of jobs, confined to a single node, unless we allow oversubscription
- the jobstep executor, to handle one group job after the other (non-blocking, of course)
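To illustrate the second point, a rough sketch of what "one group job after the other, non-blocking" could look like on the jobstep side; `run_group_jobs`, the `srun --exclusive` invocation, and the error handling are assumptions, not the plugin's implementation:

```python
import shlex
import subprocess

def run_group_jobs(commands: list[str]) -> int:
    """Launch the job steps of one group, one after the other."""
    procs = []
    for cmd in commands:
        # Popen returns immediately, so starting the next step
        # does not block on the previous one finishing
        procs.append(subprocess.Popen(shlex.split(f"srun --exclusive {cmd}")))
    # only now block, to collect the exit codes of all steps;
    # any non-zero code marks the whole group as failed
    return max((proc.wait() for proc in procs), default=0)
```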
ok, let's discuss this in a VC :-)
# SLURM's limits: it is not able to limit the memory if we divide the
# job per CPU by itself.

level_mem = level_job.resources.get("mem_mb")
level_mem = level_job.resources.get("mem_mb")
level_mem = level_job.resources.get("mem_mb", 0)
In addition, first check if mem_per_cpu is set.
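Combining the suggestion with this comment, a minimal sketch of the intended lookup order; `memory_args` and the returned srun flags are illustrative assumptions, not the plugin's code:

```python
def memory_args(resources: dict) -> list[str]:
    # prefer an explicit per-CPU memory request, since SLURM cannot
    # cleanly limit memory if we derive the per-CPU share ourselves
    mem_per_cpu = resources.get("mem_per_cpu")
    if mem_per_cpu:
        return [f"--mem-per-cpu={mem_per_cpu}"]
    # otherwise fall back to total job memory; the suggested default
    # of 0 means "no explicit memory request" here
    level_mem = resources.get("mem_mb", 0)
    if level_mem:
        return [f"--mem={level_mem}"]
    return []

assert memory_args({"mem_per_cpu": 1800}) == ["--mem-per-cpu=1800"]
assert memory_args({"mem_mb": 4000}) == ["--mem=4000"]
assert memory_args({}) == []
```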
This can likely be closed in favor of #23.
see snakemake/snakemake-executor-plugin-slurm#41
Will require adapting the snakemake-executor-plugin-slurm, too.