forked from jbarber/maui-admin-guide
-
Notifications
You must be signed in to change notification settings - Fork 0
/
3.2environment.html
223 lines (136 loc) · 15.4 KB
/
3.2environment.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta name="generator" content="HTML Tidy for Linux (vers 25 March 2009), see www.w3.org">
<title></title>
</head>
<body>
<div class="sright">
<div class="sub-content-head">
Maui Scheduler
</div>
<div id="sub-content-rpt" class="sub-content-rpt">
<div class="tab-container docs" id="tab-container">
<div class="topNav">
<div class="docsSearch"></div>
<div class="navIcons topIcons">
<a href="index.html"><img src="home.png" title="Home" alt="Home" border="0"></a> <a href="3.0basics.html"><img src="upArrow.png" title="Up" alt="Up" border="0"></a> <a href="3.1layout.html"><img src="prevArrow.png" title="Previous" alt="Previous" border="0"></a> <a href="3.3jobflow.html"><img src="nextArrow.png" title="Next" alt="Next" border="0"></a>
</div>
<h1>3.2 Scheduling Environment</h1>
<ul>
<li>
<a href="3.2environment.html#SchedulingObject">3.2.1 Scheduling Objects</a>
<ul>
<li>
<a href="3.2environment.html#jobdefinition">3.2.1.1 Jobs</a>
<ul>
<li><a href="3.2environment.html#reqdefinition">3.2.1.1.1 Requirement (or Req)</a></li>
</ul>
</li>
<li><a href="3.2environment.html#nodedefinition">3.2.1.2 Nodes</a></li>
<li><a href="3.2environment.html#reservationdefinition">3.2.1.3 Advance Reservations</a></li>
<li><a href="3.2environment.html#policydefinition">3.2.1.4 Policies</a></li>
<li><a href="3.2environment.html#resourcedefinition">3.2.1.5 Resources</a></li>
<li><a href="3.2environment.html#taskdefinition">3.2.1.6 Task</a></li>
<li><a href="3.2environment.html#PEoverview">3.2.1.7 PE</a></li>
<li><a href="3.2environment.html#classdefinition">3.2.1.8 Class (or Queue)</a></li>
</ul>
</li>
</ul>
<h2><a name="SchedulingObject" id="SchedulingObject"></a>3.2.1 Scheduling Objects</h2>
<p>Maui functions by manipulating five primary, elementary objects. These are jobs, nodes, reservations, QOS structures, and policies. In addition to these, multiple minor elementary objects and composite objects are also utilized. These objects are also defined in the <a href="3.2.1dictionary.html">scheduling dictionary</a>.</p>
<h3><a name="jobdefinition" id="jobdefinition"></a>3.2.1.1 Jobs</h3>
<p>Job information is provided to the Maui scheduler from a resource manager such as Loadleveler, PBS, Wiki, or LSF. Job attributes include ownership of the job, job state, amount and type of resources required by the job, and a wallclock limit, indicating how long the resources are required. A job consists of one or more <a href="3.2environment.html#reqdefinition">requirements</a> each of which requests a number of resources of a given type. For example, a job may consist of two requirements, the first asking for '1 IBM SP node with at least 512 MB of RAM' and the second asking for '24 IBM SP nodes with at least 128 MB of RAM'. Each requirements consists of one or more <a href="3.2environment.html#taskdefinition">tasks</a> where a task is defined as the minimal independent unit of resources. By default, each task is equivalent to one processor. In SMP environments, however, users may wish to tie one or more processors together with a certain amount of memory and/or other resources.</p>
<h4><a name="reqdefinition" id="reqdefinition"></a>3.2.1.1.1 Requirement (or Req)</h4>
<p>A job <i>requirement</i> (or req) consists of a request for a single type of resources. Each requirement consists of the following components:</p>
<p>- <b>Task Definition</b></p>
<p>A specification of the elementary resources which compose an individual task.</p>
<p>- <b>Resource Constraints</b></p>
<p>A specification of conditions which must be met in order for resource <i>matching</i> to occur. Only resources from nodes which meet <b>all</b> resource constraints may be allocated to the job req.</p>
<p>- <b>Task Count</b></p>
<p>The number of task instances required by the req.</p>
<p>- <b>Task List</b></p>
<p>The list of nodes on which the task instances have been located.</p>
<p>- <b>Req Statistics</b></p>
<p>Statistics tracking resource utilization<br></p>
<h3><a name="nodedefinition" id="nodedefinition"></a>3.2.1.2 Nodes</h3>
<p>As far as Maui is concerned, a node is a collection of resources with a particular set of associated attributes. In most cases, it fits nicely with the canonical world view of a node such as a PC cluster node or an SP node. In these cases, a node is defined as one or more CPU's, memory, and possibly other compute resources such as local disk, swap, network adapters, software licenses, etc. Additionally, this node will described by various attributes such as an architecture type or operating system. Nodes range in size from small uniprocessor PC's to large SMP systems where a single node may consist of hundreds of CPU's and massive amounts of memory.</p>
<p>Information about nodes is provided to the scheduler chiefly by the resource manager. Attributes include node state, configured and available resources (i.e., processors, memory, swap, etc.), run classes supported, etc.</p>
<h3><a name="reservationdefinition" id="reservationdefinition"></a>3.2.1.3 Advance Reservations</h3>
<p>An advance reservation is an object which dedicates a block of specific resources for a particular use. Each reservation consists of a list of resources, an access control list, and a time range for which this access control list will be enforced. The reservation prevents the listed resources from being used in a way not described by the access control list during the time range specified. For example, a reservation could reserve 20 processors and 10 GB of memory for users Bob and John from Friday 6:00 AM to Saturday 10:00 PM'. Maui uses advance reservations extensively to manage backfill, guarantee resource availability for active jobs, allow service guarantees, support deadlines, and enable metascheduling. Maui also supports both regularly recurring reservations and the creation of dynamic one time reservations for special needs. Advance reservations are described in detail in the advance reservation overview.</p>
<h3><a name="policydefinition" id="policydefinition"></a>3.2.1.4 Policies</h3>
<p>Policies are generally specified via a config file and serve to control how and when jobs start. Policies include job prioritization, fairness policies, fairshare configuration policies, and scheduling policies.</p>
<h3><a name="resourcedefinition" id="resourcedefinition"></a>3.2.1.5 Resources</h3>
<p>Jobs, nodes, and reservations all deal with the abstract concept of a resource. A resource in the Maui world is one of the following:</p>
<p><b>processors</b></p>
<p>Processors are specified with a simple count value.</p>
<p><b>memory</b></p>
<p>Real memory or 'RAM' is specified in megabytes (MB).</p>
<p><b>swap</b></p>
<p>Virtual memory or 'swap' is specified in megabytes (MB).</p>
<p><b>disk</b></p>
<p>Local disk is specified in megabytes (MB).</p>
<p>In addition to these elementary resource types, there are two higher level resource concepts used within Maui. These are the <i>task</i> and the <i>processor equivalent</i>, or PE.</p>
<h3><a name="taskdefinition" id="taskdefinition"></a>3.2.1.6 Task</h3>
<p>A task is a <i>collection</i> of elementary resources which must be allocated together within a single <a href="3.2environment.html#nodedefinition">node</a>. For example, a task may consist of one processor, 512MB or memory, and 2 GB of local disk. A key aspect of a task is that the resources associated with the task must be allocated as an <i>atomic</i> unit, without spanning node boundaries. A task requesting 2 processors cannot be satisfied by allocating 2 uniprocessor nodes, nor can a task requesting 1 processor and 1 GB of memory be satisfied by allocating 1 processor on one node and memory on another.</p>
<p>In Maui, when jobs or reservations request resources, they do so in terms of tasks typically using a task <i>count</i> and a task <i>definition</i>. By default, a task maps directly to a single processor within a job and maps to a full node within reservations. In all cases, this default definition can be overridden by specifying a new task definition.</p>
<p>Within both jobs and reservations, depending on task definition, it is possible to have multiple tasks from the same job mapped to the same node. For example, a job requesting 4 tasks using the default task definition of 1 processor per task, can be satisfied by two dual processor nodes.</p>
<h3><a name="PEoverview" id="PEoverview"></a>3.2.1.7 PE</h3>
<p>The concept of the processor equivalent, or PE, arose out of the need to translate multi-resource consumption requests into a scalar value. It is not an elementary resource, but rather, a derived resource metric. It is a measure of the actual <i>impact</i> of a set of requested resources by a job on the total resources available system wide. It is calculated as:</p>
<p><tt>PE = MAX(ProcsRequestedByJob / TotalConfiguredProcs,</tt><br>
<tt>MemoryRequestedByJob / TotalConfiguredMemory,</tt><br>
<tt>DiskRequestedByJob / TotalConfiguredDisk,</tt><br>
<tt>SwapRequestedByJob / TotalConfiguredSwap) * TotalConfiguredProcs</tt></p>
<p>For example, say a job requested 20% of the total processors and 50% of the total memory of a 128 processor MPP system. Only two such jobs could be supported by this system. The job is essentially using 50% of all available resources since the system can only be scheduled to its most constrained resource, in this case memory. The processor equivalents for this job should be 50% of the processors, or PE = 64.</p>
<p>Let's make the calculation concrete with one further example. Assume a homogeneous 100 node system with 4 processors and 1 GB of memory per node. A job is submitted requesting 2 processors and 768 MB of memory. The PE for this job would be calculated as:</p>
<p><tt>PE = MAX(2/(100*4), 768/(100*1024)) * (100*4) = 3.</tt></p>
<p>This result makes sense since the job would be consuming 3/4 of the memory on a 4 processor node.</p>
<p>The calculation works equally well on homogeneous or heterogeneous systems, uniprocessor or large way SMP systems.</p>
<h3><a name="classdefinition" id="classdefinition"></a>3.2.1.8 Class (or Queue)</h3>
<p>A class (or queue) is a logical container object which can be used to implicitly or explicitly apply policies to jobs. In most cases, a class is defined and configured within the resource manager and associated with one or more of the following attributes or constraints:<br></p>
<table border width="100%" nosave="">
<tr>
<td><b>Attribute</b></td>
<td><b>Description</b></td>
</tr>
<tr>
<td>Default Job Attributes</td>
<td>A queue may be associated with a default job duration, default size, or default resource requirements</td>
</tr>
<tr>
<td>Host Constraints</td>
<td>A queue may constrain job execution to a particular set of hosts</td>
</tr>
<tr nosave="">
<td nosave="">Job Constraints</td>
<td>A queue may constrain the attributes of jobs which may submitted including setting limits such as max wallclock time, minimum number of processors, etc.</td>
</tr>
<tr>
<td>Access List</td>
<td>A queue may constrain who may submit jobs into it based on user lists, group lists, etc.</td>
</tr>
<tr>
<td>Special Access</td>
<td>A queue may associate special privileges with jobs including adjusted job priority.</td>
</tr>
</table>
<p>As stated previously, most resource managers allow full class configuration within the resource manager. Where additional class configuration is required, the <a href="a.fparameters.html#classcfg">CLASSCFG</a> parameter may be used.</p>
<p>Maui tracks class usage as a consumable resource allowing sites to limit the number of jobs using a particular class. This is done by monitoring <i>class initiators</i> which may be considered to be a ticket to run in a particular class. Any compute node may simultaneously support several types of classes and any number of initiators of each type. By default, nodes will have a one-to-one mapping between class initiators and configured processors. For every job <i>task</i> run on the node, one class initiator of the appropriate type is consumed. For example, a 3 processor job submitted to the class batch will consume three batch class initiators on the nodes where it is run.</p>
<p>Using queues as consumable resources allows sites to specify various policies by adjusting the class initiator to node mapping. For example, a site running serial jobs may want to allow a particular 8 processor node to run any combination of batch and special jobs subject to the following constraints:</p>
<p>- only 8 jobs of any type allowed simultaneously<br>
- no more than 4 special jobs allowed simultaneously</p>
<p>To enable this policy, the site may set the node's <b>MAXJOB</b> policy to 8 and configure the node with 4 special class initiators and 8 batch class initiators.</p>
<p>Note that in virtually all cases jobs have a one-to-one correspondence between processors requested and class initiators required. However, this is not a requirement and, with special configuration sites may choose to associate job tasks with arbitrary combinations of class initiator requirements.</p>
<p>In displaying class initiator status, Maui signifies the type and number of class initiators available using the format [<CLASSNAME>:<CLASSCOUNT>]. This is most commonly seen in the output of node status commands indicating the number of configured and available class initiators, or in job status commands when displaying class initiator requirements.</p>
<p><b>Arbitrary Resource</b></p>
<p>Node can also be configured to support various 'arbitrary resources'. Information about such resources can be specified using the <a href="a.fparameters.html#nodecfg">NODECFG</a> parameter. For example, a node may be configured to have '256 MB RAM, 4 processors, 1 GB Swap, and 2 tape drives'.</p>
<div class="navIcons bottomIcons">
<a href="index.html"><img src="home.png" title="Home" alt="Home" border="0"></a> <a href="3.0basics.html"><img src="upArrow.png" title="Up" alt="Up" border="0"></a> <a href="3.1layout.html"><img src="prevArrow.png" title="Previous" alt="Previous" border="0"></a> <a href="3.3jobflow.html"><img src="nextArrow.png" title="Next" alt="Next" border="0"></a>
</div>
</div>
</div>
</div>
<div class="sub-content-btm"></div>
</div>
</body>
</html>