Merge pull request #910 from dongahn/resrc-conf
Add TOML config support for sched-fluxion-resource
mergify[bot] authored Mar 2, 2022
2 parents 20337b0 + 6880059 commit f8f2064
Showing 43 changed files with 1,769 additions and 281 deletions.
1 change: 1 addition & 0 deletions doc/Makefile.am
@@ -4,6 +4,7 @@ SUBDIRS = . test

MAN5_FILES = \
man5/flux-config-sched-fluxion-qmanager.5
man5/flux-config-sched-fluxion-resource.5

RST_FILES = \
$(MAN5_FILES:.5=.rst)
108 changes: 95 additions & 13 deletions doc/man5/flux-config-sched-fluxion-qmanager.rst
@@ -6,22 +6,97 @@ DESCRIPTION
===========

The ``sched-fluxion-qmanager`` configuration table may be used
to tune the policies and parameters for the Fluxion graph-based
scheduler.
to tune the queuing policies and parameters
for the Fluxion graph-based scheduler.

This table may contain the following keys:

KEYS
====

queue-policy
(optional) String name of queueing policy to use (e.g. fcfs).

queue-params
(optional) Comma separated list of queue parameters.

policy-params
(optional) Comma separated list of policy paramters.
(optional) String name of queuing policy to use. The
supported policies are described in
the :ref:`queue_policies` section. The default is "fcfs".


The following keys in the optional ``[queue-params]`` table can be
used to tune the general queuing parameters.

queue-params.max-queue-depth
(optional) Positive integer value that sets the maximum number of pending
jobs that can be considered per scheduling cycle.
The default is 1000000.

queue-params.queue-depth
(optional) Positive integer value that limits the number of pending
jobs to consider per scheduling cycle. The default is 32.
If it is larger than ``queue-params.max-queue-depth``, it is set to
``queue-params.max-queue-depth`` instead.
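The capping rule can be sketched in Python as follows. This is a minimal illustration. It assumes both values have already been read from the ``[queue-params]`` table. ``effective_queue_depth`` is a hypothetical helper and not a Fluxion API.

```python
def effective_queue_depth(queue_depth=32, max_queue_depth=1000000):
    """Return the per-cycle queue depth after applying the documented cap.

    Hypothetical helper: the defaults mirror the documented defaults of
    queue-params.queue-depth and queue-params.max-queue-depth.
    """
    for name, value in (("queue-depth", queue_depth),
                        ("max-queue-depth", max_queue_depth)):
        # Both keys are documented as positive integers.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    # A queue-depth larger than max-queue-depth is lowered to max-queue-depth.
    return min(queue_depth, max_queue_depth)


print(effective_queue_depth())                  # 32
print(effective_queue_depth(8192))              # 8192
print(effective_queue_depth(2000000, 1000000))  # 1000000
```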


The following keys in the optional ``[policy-params]`` table can be
used to tune the parameters of certain queuing policies.

policy-params.max-reservation-depth
(optional) Applies only to the ``conservative`` and ``hybrid`` policies.
These policies must compute the earliest start time of
higher-priority pending jobs that cannot run yet
because resources are currently insufficient.
Positive integer value that sets the maximum number of
such higher-priority pending jobs to consider
per scheduling cycle. The default is 100000.

policy-params.reservation-depth
(optional) Applies only to the ``hybrid`` policy.
This policy must compute the earliest start time of
higher-priority pending jobs that cannot run yet
because resources are currently insufficient.
Positive integer value that limits the number of
such higher-priority pending jobs to consider
per scheduling cycle. The default is 64.
If it is larger than ``policy-params.max-reservation-depth``,
it is set to ``policy-params.max-reservation-depth`` instead.
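Under the reading above, the number of blocked jobs that receive a reserved start time in each cycle depends on the queuing policy. The hypothetical helper ``reservation_limit`` below sketches that mapping. It is an interpretation of this page and not Fluxion's code.

```python
def reservation_limit(policy, reservation_depth=64, max_reservation_depth=100000):
    """Return how many blocked higher-priority jobs get a reservation per cycle.

    Hypothetical helper based on this page's description. It returns None when
    the policy-params keys do not apply to the policy.
    """
    if policy == "hybrid":
        # reservation-depth is capped at max-reservation-depth.
        return min(reservation_depth, max_reservation_depth)
    if policy == "conservative":
        # conservative considers every blocked job, up to the maximum.
        return max_reservation_depth
    # fcfs and easy do not use these keys.
    return None


print(reservation_limit("hybrid"))        # 64
print(reservation_limit("conservative"))  # 100000
```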


.. _queue_policies:

QUEUING POLICIES
=================

fcfs
First-come, first-served policy. Pending jobs with the same
priority are scheduled and run in submission order.
Pending jobs with different priorities are serviced
in priority order.
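The following is a minimal sketch of the ``fcfs`` ordering. It assumes that a larger ``priority`` value means higher priority and that ``submit_seq`` records submission order. Both field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PendingJob:
    jobid: str
    priority: int    # assumption: a larger value means higher priority
    submit_seq: int  # submission order: a smaller value was submitted earlier


def fcfs_order(jobs):
    """Order pending jobs: higher priority first, then earlier submission."""
    return sorted(jobs, key=lambda j: (-j.priority, j.submit_seq))


jobs = [
    PendingJob("a", priority=16, submit_seq=0),
    PendingJob("b", priority=16, submit_seq=1),
    PendingJob("c", priority=20, submit_seq=2),
]
print([j.jobid for j in fcfs_order(jobs)])  # ['c', 'a', 'b']
```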

easy
EASY-backfilling policy: if the highest-priority
pending job cannot run under ``fcfs`` because
its requested resources are currently unavailable,
one or more of the next lower-priority jobs may be
scheduled and run, as long as doing so does not delay
the start time of the highest-priority job.
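The following Python sketch illustrates EASY backfilling on a pool of identical nodes. It is a textbook-style simplification and not Fluxion's graph-based implementation. The sketch assumes that nodes are interchangeable and that each job's ``walltime`` is its exact run time. ``Job`` and ``easy_backfill`` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Job:
    jobid: str
    nodes: int     # nodes requested
    walltime: int  # requested run time (hypothetical units)


def easy_backfill(total_nodes, running, pending, now=0):
    """One scheduling cycle of EASY backfilling on identical nodes.

    running: list of (nodes, end_time) for jobs already running.
    pending: jobs sorted highest priority first.
    Returns the ids of jobs started in this cycle.
    """
    free = total_nodes - sum(nodes for nodes, _ in running)
    running = list(running)
    started = []
    queue = list(pending)
    # Run jobs strictly in priority order while they fit (fcfs behavior).
    while queue and queue[0].nodes <= free:
        job = queue.pop(0)
        free -= job.nodes
        running.append((job.nodes, now + job.walltime))
        started.append(job.jobid)
    if not queue:
        return started
    head = queue.pop(0)
    # Shadow time: the earliest time enough nodes are free for the blocked head job.
    avail, shadow = free, None
    for nodes, end in sorted(running, key=lambda r: r[1]):
        avail += nodes
        if avail >= head.nodes:
            shadow = end
            break
    if shadow is None:
        return started  # the head job can never fit on this pool
    extra = avail - head.nodes  # nodes still idle once the head job starts
    # Backfill lower-priority jobs that cannot delay the head job.
    for job in queue:
        if job.nodes > free:
            continue
        finishes_before_shadow = now + job.walltime <= shadow
        if finishes_before_shadow or job.nodes <= extra:
            free -= job.nodes
            if not finishes_before_shadow:
                extra -= job.nodes  # this job still holds nodes at shadow time
            started.append(job.jobid)
    return started


pending = [Job("A", 8, 50), Job("B", 2, 50), Job("C", 2, 200), Job("D", 3, 10)]
print(easy_backfill(10, [(6, 100)], pending))  # ['B', 'C']
```

In this run, A is blocked until time 100. B finishes before then, and C runs on the two nodes that A will not need.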

conservative
CONSERVATIVE-backfilling policy: as with ``easy``,
pending jobs can run out of order when the highest-priority
job cannot run because its requested resources
are currently unavailable. However, this policy
is more conservative. A lower-priority job can be
backfilled only if doing so will not delay the start time
of any pending job whose priority is higher than
that of the backfilling job.

hybrid
HYBRID-backfilling policy: an optimization
of ``conservative`` in which a lower-priority job can be
backfilled only if doing so will not delay the start time
of N pending jobs whose priority is higher than
that of the backfilling job.
N is set by the ``policy-params.reservation-depth`` parameter.
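The difference between ``conservative`` and ``hybrid`` can be illustrated with a time-slotted sketch. The sketch makes these assumptions:

* Nodes are interchangeable.
* Time is divided into whole slots.
* Each job runs for exactly its ``walltime``.
* ``Job`` and ``backfill`` are hypothetical names.

Passing ``reservation_depth=None`` models ``conservative``. An integer plays the role of N (``policy-params.reservation-depth``) for ``hybrid``. This is not Fluxion's implementation.

```python
from dataclasses import dataclass


@dataclass
class Job:
    jobid: str
    nodes: int     # nodes requested
    walltime: int  # run time in whole time slots (hypothetical units)


def backfill(total_nodes, running, pending, horizon, reservation_depth=None):
    """One cycle on a time-slotted node profile starting at slot 0 (now).

    reservation_depth=None models conservative (every blocked job gets a
    reservation). An integer N models hybrid (only N blocked jobs do).
    Returns (ids started now, {id: reserved start slot}).
    """
    free = [total_nodes] * horizon  # free nodes in each future time slot
    for nodes, end in running:
        for t in range(min(end, horizon)):
            free[t] -= nodes

    def fits(start, job):
        return all(free[t] >= job.nodes for t in range(start, start + job.walltime))

    def claim(start, job):
        for t in range(start, start + job.walltime):
            free[t] -= job.nodes

    started, reserved = [], {}
    for job in pending:  # highest priority first
        if job.walltime > horizon:
            continue  # longer than the modeled window; ignored in this sketch
        if fits(0, job):
            # Starting now cannot delay any job that already holds a reservation.
            claim(0, job)
            started.append(job.jobid)
            continue
        if reservation_depth is not None and len(reserved) >= reservation_depth:
            continue  # hybrid: reservation budget exhausted for this cycle
        for start in range(1, horizon - job.walltime + 1):
            if fits(start, job):
                claim(start, job)
                reserved[job.jobid] = start
                break
    return started, reserved


jobs = [Job("A", 2, 2), Job("B", 4, 2), Job("C", 1, 10)]
print(backfill(4, [(3, 2)], jobs, 20))                       # ([], {'A': 2, 'B': 4, 'C': 6})
print(backfill(4, [(3, 2)], jobs, 20, reservation_depth=1))  # (['C'], {'A': 2})
```

In the hybrid run, the reservation budget of one is spent on A. B, the only job that C would delay, therefore holds no reservation, so C starts immediately. The conservative run reserves a slot for B and holds C back.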


EXAMPLE
@@ -31,14 +106,21 @@ EXAMPLE

[sched-fluxion-qmanager]

# queueing policy type
queue-policy = "fcfs"
# queuing policy type
queue-policy = "hybrid"

# general queue parameters
queue-params = "queue-depth=8192,max-queue-depth=1000000"
[sched-fluxion-qmanager.queue-params]

max-queue-depth = 1000000
queue-depth = 8192

# queue policy parameters
policy-params = "reservation-depth=64,max-reservation-depth=100000"
[sched-fluxion-qmanager.policy-params]

max-reservation-depth = 100000
reservation-depth = 64


RESOURCES
=========
135 changes: 135 additions & 0 deletions doc/man5/flux-config-sched-fluxion-resource.rst
@@ -0,0 +1,135 @@
=====================================
flux-config-sched-fluxion-resource(5)
=====================================

DESCRIPTION
===========

The ``sched-fluxion-resource`` configuration table may be used
to tune the resource match policies and parameters
for the Fluxion graph-based scheduler.

This table may contain the following keys:

KEYS
====

match-policy
(optional) String name of match policy to use. The supported
match policies are described in the :ref:`match_policies` section.
The default is "first".

match-format
(optional) String name of match format to use.
"rv1" and "rv1_nosched" are currently supported.
When a job is allocated, its resource set (R) is encoded
according to RFC 20: Resource Set Specification Version 1.
R has an optional ``scheduling`` key, which only "rv1" populates.
Because it omits the ``scheduling`` key, "rv1_nosched"
yields higher scheduling performance. However,
the resulting R does not contain sufficient
information to reconstruct the state
of ``sched-fluxion-resource`` on module reload (as
required for system instance failure recovery).
The default is "rv1_nosched".

load-allowlist
(optional) Comma-separated list of resource types to load
with the ``hwloc`` reader.
When Flux is launched in single-user mode
under a foreign workload manager (e.g., IBM LSF or SLURM),
``sched-fluxion-resource`` can discover the target resources
using ``hwloc``. This list restricts ``sched-fluxion-resource``
to loading only resources of the specified types
from the ``hwloc`` reader, as needed for scheduling.
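The effect of the allowlist can be pictured with a flat Python sketch. This is a simplification. The real ``hwloc`` reader builds a resource graph, and how it handles containment of filtered-out types (such as sockets between nodes and cores) is not modeled here. ``parse_allowlist``, ``filter_by_type`` and the inventory records are hypothetical.

```python
def parse_allowlist(text):
    """Split a load-allowlist string such as "node,core,gpu" into a set of types."""
    return {item.strip() for item in text.split(",") if item.strip()}


def filter_by_type(resources, allowlist):
    """Keep only resources whose type is in the allowlist (flat view)."""
    return [r for r in resources if r["type"] in allowlist]


# Hypothetical flat inventory, as an hwloc-based discovery step might report it.
discovered = [
    {"type": "node", "id": 0},
    {"type": "socket", "id": 0},
    {"type": "core", "id": 0},
    {"type": "core", "id": 1},
    {"type": "gpu", "id": 0},
    {"type": "pu", "id": 0},
]
kept = filter_by_type(discovered, parse_allowlist("node,core,gpu"))
print([r["type"] for r in kept])  # ['node', 'core', 'core', 'gpu']
```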

reserve-vtx-vec
(optional) Integer value that reserves memory to store
the specified number of graph vertices in order
to optimize resource-graph loading performance.
Recommended for handling large-scale systems.
The value must be a non-zero integer up to 2000000.

prune-filters
(optional) Comma-separated list of graph-search filters
to accelerate match operations. Each filter lets a high-level (HL) resource
vertex track the aggregate state of the low-level (LL) resources
residing in its subtree.
For example, suppose a jobspec requests 1 compute node with 4 cores,
and the visited compute-node vertex has only 2 available cores
in aggregate in its subtree. A filter tracking ``core`` then allows
the traverser to skip further descent into that subtree,
which accelerates the search.
The format must conform to
``<HL-resource1:LL-resource1[,HL-resource2:LL-resource2...]>``.
Use the ``ALL`` keyword for HL-resource to track LL-resource
at all of its ancestor HL-resource vertices.
The default is "ALL:core".
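The parsing rule and the pruning check described above can be sketched as follows. This is a minimal model that assumes each vertex already knows the aggregate count of available LL resources in its subtree. ``parse_prune_filters``, ``tracks`` and ``can_prune`` are hypothetical names and not Fluxion functions.

```python
def parse_prune_filters(text):
    """Return (HL-resource, LL-resource) pairs, e.g. [("ALL", "core")]."""
    pairs = []
    for item in text.split(","):
        hl, sep, ll = item.strip().partition(":")
        if not sep or not hl or not ll:
            raise ValueError(f"expected HL-resource:LL-resource, got {item!r}")
        pairs.append((hl, ll))
    return pairs


def tracks(filters, vertex_type, resource_type):
    """True if a vertex of vertex_type keeps an aggregate for resource_type."""
    return any(ll == resource_type and hl in ("ALL", vertex_type)
               for hl, ll in filters)


def can_prune(filters, vertex_type, available, requested):
    """Decide whether the traverser may skip this vertex's subtree.

    available/requested map LL resource types to counts within the subtree.
    Only tracked types can justify pruning; untracked ones need a full descent.
    """
    return any(tracks(filters, vertex_type, rtype) and available.get(rtype, 0) < count
               for rtype, count in requested.items())


filters = parse_prune_filters("ALL:core")
print(can_prune(filters, "node", {"core": 2}, {"core": 4}))  # True: 2 < 4 cores
print(can_prune(filters, "node", {"gpu": 0}, {"gpu": 1}))    # False: gpu not tracked
```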


.. _match_policies:

RESOURCE MATCH POLICIES
=======================

low
Select resources with low ID first (e.g., core0 is selected
first before core1 is selected).

high
Select resources with high ID first (e.g., core15 is selected
first before core14).

lonode
Select resources with the lowest compute-node ID first;
for node-local resource types, apply the ``low`` policy.

hinode
Select resources with the highest compute-node ID first;
for node-local resource types, apply the ``high`` policy.

lonodex
Node-exclusive policy whose behavior is
identical to ``lonode``, except that each compute node
is allocated exclusively.

hinodex
Node-exclusive policy whose behavior is
identical to ``hinode``, except that each compute node
is allocated exclusively.

first
Select the first matching resources and stop the search.
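The candidate orderings of ``lonode``, ``hinode`` and ``first`` can be pictured with free cores labeled ``(node_id, core_id)``. In this simplified sketch, ``low`` and ``high`` correspond to the same orderings restricted to a single node. The node-exclusive variants are not modeled. ``order_candidates``, ``select`` and the core list are hypothetical.

```python
def order_candidates(cores, policy):
    """Order free (node_id, core_id) pairs in the sequence a policy would try them."""
    if policy == "lonode":
        return sorted(cores)                # lowest node ID, then lowest core ID
    if policy == "hinode":
        return sorted(cores, reverse=True)  # highest node ID, then highest core ID
    if policy == "first":
        return list(cores)                  # traversal order; stop at first match
    raise ValueError(f"policy not modeled in this sketch: {policy}")


def select(cores, count, policy):
    """Pick `count` cores, or return None if not enough are free."""
    ordered = order_candidates(cores, policy)
    if len(ordered) < count:
        return None
    return ordered[:count]


# Hypothetical free cores on two compute nodes, listed in traversal order.
free_cores = [(1, 2), (0, 3), (1, 0), (0, 1)]
print(select(free_cores, 2, "lonode"))  # [(0, 1), (0, 3)]
print(select(free_cores, 2, "hinode"))  # [(1, 2), (1, 0)]
print(select(free_cores, 2, "first"))   # [(1, 2), (0, 3)]
```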


EXAMPLE
=======

::

[sched-fluxion-resource]

# system instance will use node-exclusive
# scheduling (with nodes of low node IDs
# selected first).
match-policy = "lonodex"

# The system instance will use the full rv1 writer
# so that R contains the scheduling key needed
# for failure recovery.
match-format = "rv1"
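The following sketch shows how the documented defaults and value constraints might be applied to a parsed ``[sched-fluxion-resource]`` table. It assumes the table has already been parsed into a Python dict. It accepts only the policies and formats listed on this page, and it reads "a non-zero integer up to 2000000" as the range 1 to 2000000. ``resolve_resource_config`` is a hypothetical helper, not part of Fluxion.

```python
MATCH_POLICIES = {"low", "high", "lonode", "hinode", "lonodex", "hinodex", "first"}
MATCH_FORMATS = {"rv1", "rv1_nosched"}
DEFAULTS = {
    "match-policy": "first",
    "match-format": "rv1_nosched",
    "prune-filters": "ALL:core",
}


def resolve_resource_config(table):
    """Merge a parsed [sched-fluxion-resource] table over the documented defaults."""
    cfg = {**DEFAULTS, **table}
    if cfg["match-policy"] not in MATCH_POLICIES:
        raise ValueError(f"unknown match-policy: {cfg['match-policy']!r}")
    if cfg["match-format"] not in MATCH_FORMATS:
        raise ValueError(f"unknown match-format: {cfg['match-format']!r}")
    vtx = cfg.get("reserve-vtx-vec")
    if vtx is not None:
        # Assumed reading of the documented limit: 1 <= value <= 2000000.
        if isinstance(vtx, bool) or not isinstance(vtx, int) or not 0 < vtx <= 2000000:
            raise ValueError("reserve-vtx-vec must be an integer in 1..2000000")
    return cfg


# The table from the example above, as a TOML parser would return it.
cfg = resolve_resource_config({"match-policy": "lonodex", "match-format": "rv1"})
print(cfg)  # {'match-policy': 'lonodex', 'match-format': 'rv1', 'prune-filters': 'ALL:core'}
```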


RESOURCES
=========

Flux: http://flux-framework.org

RFC 20: Resource Set Specification Version 1: https://flux-framework.rtfd.io/projects/flux-rfc/en/latest/spec_20.html

SEE ALSO
========

:core:man5:`flux-config`

1 change: 1 addition & 0 deletions doc/man5/index.rst
@@ -6,3 +6,4 @@ man5
:maxdepth: 1

flux-config-sched-fluxion-qmanager
flux-config-sched-fluxion-resource
1 change: 1 addition & 0 deletions doc/manpages.py
@@ -18,4 +18,5 @@
# - Manual section
man_pages = [
('man5/flux-config-sched-fluxion-qmanager', 'flux-config-sched-fluxion-qmanager', 'Fluxion qmanager configuration file', [author], 5),
('man5/flux-config-sched-fluxion-resource', 'flux-config-sched-fluxion-resource', 'Fluxion resource configuration file', [author], 5),
]
4 changes: 3 additions & 1 deletion etc/Makefile.am
@@ -4,5 +4,7 @@ dist_fluxrc1_SCRIPTS = \
dist_fluxrc3_SCRIPTS = \
02-sched-fluxion-resource-stop \
01-sched-fluxion-qmanager-stop
EXTRA_DIST = sched-fluxion-qmanager.toml
EXTRA_DIST = \
sched-fluxion-qmanager.toml \
sched-fluxion-resource.toml

12 changes: 9 additions & 3 deletions etc/sched-fluxion-qmanager.toml
@@ -5,15 +5,21 @@
[sched-fluxion-qmanager]

# queueing policy type
queue-policy = "fcfs"
queue-policy = "easy"

# general queue parameters
# queue-depth (applied to all policies)
# max queue depth (applied to all policies)
queue-params = "queue-depth=8192,max-queue-depth=1000000"
[sched-fluxion-qmanager.queue-params]

queue-depth = 8192
max-queue-depth = 1000000

# queue policy parameters
# max depth for "conservative" and "hybrid"
# reservation depth for HYBRID
policy-params = "reservation-depth=64,max-reservation-depth=100000"
[sched-fluxion-qmanager.policy-params]

reservation-depth = 64
max-reservation-depth = 100000
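This change replaces the comma-separated ``queue-params`` and ``policy-params`` strings with TOML sub-tables. The Python sketch below shows one way an existing string value could be converted. ``parse_legacy_params`` and ``to_toml_table`` are hypothetical helpers that are not shipped with Fluxion. The sketch assumes every value is an integer, which holds for the four keys documented here.

```python
def parse_legacy_params(text):
    """Parse a legacy "key=value,key=value" string into a dict of integers."""
    params = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        params[key.strip()] = int(value)  # all four documented keys are integers
    return params


def to_toml_table(table_name, params):
    """Render the parameters as a TOML sub-table like the ones in this change."""
    lines = [f"[{table_name}]", ""]
    lines += [f"{key} = {value}" for key, value in params.items()]
    return "\n".join(lines)


legacy = "queue-depth=8192,max-queue-depth=1000000"
print(to_toml_table("sched-fluxion-qmanager.queue-params", parse_legacy_params(legacy)))
```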

17 changes: 17 additions & 0 deletions etc/sched-fluxion-resource.toml
@@ -0,0 +1,17 @@
[sched-fluxion-resource]

# support for node, core and gpu type if hwloc is used
load-allowlist="node,core,gpu"

# system instance will use node-exclusive
# scheduling (with nodes of low node IDs
# selected first).
match-policy = "lonodex"

# The system instance will use the full rv1 writer
# so that R contains the scheduling key needed
# for failure recovery.
match-format = "rv1"

prune-filters="ALL:core"
