#!/bin/bash
:<<++++
Author: Tim Kaiser
This script shows how to use slurm dependencies to build
complex workflows.
(1) Starts with 5 jobs that need to run in sequence.
(2) Submits 4 jobs that will wait for the previous 5. However,
these 4 jobs can run at the same time.
(3) Finally a single job is run that depends on the previous 4.
The dependency graph is:
              job1
              job2
              job3
              job4
              job5
          /   /  \   \
      job6 job7  job8 job9
          \   \  /   /
              job10
The way it works is that it grabs the JOBIDs returned by sbatch
and uses them as dependencies. For jobs[2-5] this is easy since
there is a single dependency. For jobs[6-9] we collect dependencies
in the string myset1. We note here that we could have made jobs[6-9]
depend only on job5, since if job5 finishes we know the others have
completed.
For job10 we collect the dependency list in the string myset2.
The slurm script we are running is old_new.sh. It can use the
variables OLD_DIR and NEW_DIR to specify directories from which
to get data from a previous run and where to do the current run.
If OLD_DIR is defined old_new.sh will copy all files from that directory
to NEW_DIR. There is also a variable, OLD_FILES, that can be used to
copy only specific files. It is set only for the final job.
For the first 5 jobs data goes into directories ser[1-5]. The next 4
use par[1-4] and the final job uses the directory final.
Usage:
./FAN.sh account
++++
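:<<'++++'
The header above describes collecting job IDs into the dependency
strings myset1 and myset2. A minimal sketch of that step as a reusable
function follows. It is an illustration only and is not called by this
script: join_deps is a hypothetical name, and it assumes the job IDs
are passed as separate arguments. It emits the compact form
afterok:ID:ID:..., which the sbatch man page lists as an accepted
spelling of a multi-job afterok dependency.

```shell
# join_deps (hypothetical helper): build one --dependency value
# from any number of job IDs, in the form afterok:ID:ID:...
join_deps() {
    local deps="afterok"
    local id
    for id in "$@"; do
        deps="$deps:$id"
    done
    # With no IDs there is nothing to depend on, so print nothing.
    if [ "$#" -gt 0 ]; then
        echo "$deps"
    fi
}

join_deps 5400357 5400358 5400359 5400360
```

Called as shown, the sketch prints
afterok:5400357:5400358:5400359:5400360, which could be passed to
sbatch as --dependency=VALUE without the sed step that strips the
leading comma.
++++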
# The account is required; print usage and exit with an error if it is missing.
if [ -z "${1+x}" ]; then
    echo USAGE:
    echo "$0 account"
    echo Your account needs to be set on the command line
    exit 1
fi
export ACC=$1
# Here is the script we will run
export SCRIPT=old_new.sh
rm -rf ser* par* final*
echo "Starts with 5 jobs that need to run in sequence."
unset OLD_DIR
unset OLD_FILES
export NEW_DIR=ser1
jid=`sbatch -J slurm_test --partition=short -A $ACC $SCRIPT | awk '{print $NF }'`
echo $jid
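:<<'++++'
The sbatch call above takes the last field of the "Submitted batch job
N" message as the job ID. If the submission fails, that field is not a
number and every later --dependency would be built from garbage. A
slightly more defensive variant is sketched here. job_id_from is a
hypothetical helper, not part of this script, and it assumes the
message arrives on standard input. sbatch also has a --parsable
option that prints the job ID (plus the cluster name, if any) and so
reduces the parsing needed.

```shell
# job_id_from (hypothetical helper): read sbatch's message on stdin
# and print the trailing job ID, or fail if it is not a number.
job_id_from() {
    local id
    id=$(awk 'NR == 1 { print $NF }')
    case "$id" in
        ''|*[!0-9]*)
            echo "could not parse a job ID" >&2
            return 1
            ;;
        *)
            echo "$id"
            ;;
    esac
}

# Stand-in for real sbatch output so the sketch runs without Slurm.
echo "Submitted batch job 5400356" | job_id_from
```

In the workflow this would be used as a filter on the sbatch output in
place of the awk command, with the script stopping if it returns a
non-zero status.
++++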
myset1=""
for job in ser2 ser3 ser4 ser5 ; do
    export OLD_DIR=$NEW_DIR
    export NEW_DIR=$job
    echo --dependency=afterok:$jid
    jid=`sbatch -J slurm_test --partition=short -A $ACC --dependency=afterok:$jid $SCRIPT | awk '{print $NF }'`
    echo $jid
    myset1=$myset1,afterok:$jid
done
myset1=`echo $myset1 | sed "s/,//"`
echo $myset1
echo "Now 4 jobs that will wait for the previous 5,"
echo "however, these are independent of each other."
myset2=""
export OLD_DIR=$NEW_DIR
for job in par1 par2 par3 par4 ; do
    export NEW_DIR=$job
    echo --dependency=$myset1
    jid=`sbatch -J slurm_test --partition=short -A $ACC --dependency=$myset1 $SCRIPT | awk '{print $NF }'`
    echo $jid
    myset2=$myset2,afterok:$jid
done
myset2=`echo $myset2 | sed "s/,//"`
echo $myset2
echo "Finally a single job that waits for the previous 4"
getfiles=`echo $myset2 | sed "s/,/ /g"`
getfiles=`echo $getfiles | sed "s/afterok://g"`
echo $getfiles
unset OLD_DIR
export NEW_DIR=final
export OLD_FILES=$getfiles
jid=`sbatch -J slurm_test --partition=short -A $ACC --dependency=$myset2 $SCRIPT | awk '{print $NF }'`
echo $jid
echo "+-+-+- REPORT OF PENDING JOB DEPENDENCIES +-+-+-"
# get a list of jobs and show Dependencies
squeue -u $LOGNAME
for jid in `squeue -h -u $LOGNAME | awk '{print $1}'` ; do
    echo job $jid
    scontrol show jobid -dd $jid | grep Dependency
done
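:<<'++++'
The report loop above prints the whole scontrol line that contains
"Dependency", including JobState and Reason. If only the dependency
list is wanted, the field can be cut out with sed. This is a sketch,
not part of the workflow: dependency_of is a hypothetical helper, and
it assumes the Dependency= value contains no spaces, as in the example
output below.

```shell
# dependency_of (hypothetical helper): read "scontrol show job" text
# on stdin and print only the value of its Dependency= field.
dependency_of() {
    sed -n 's/.*Dependency=\([^ ]*\).*/\1/p'
}

# Sample line copied from the example output, used in place of a live
# scontrol call so the sketch runs without Slurm.
echo "JobState=PENDING Reason=Dependency Dependency=afterok:5400356(unfulfilled)" | dependency_of
```

For the sample line this prints afterok:5400356(unfulfilled). In the
loop it would take the place of the grep, reading from
scontrol show jobid -dd.
++++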
:<<++++
Example output
el2:collect> ./FAN hpcapps
Start with 5 jobs that need to run in sequence.
5400356
--dependency=afterok:5400356
5400357
--dependency=afterok:5400357
5400358
--dependency=afterok:5400358
5400359
--dependency=afterok:5400359
5400360
afterok:5400357,afterok:5400358,afterok:5400359,afterok:5400360
Now 4 jobs that will wait for the previous 5,
however, these are independent of each other.
--dependency=afterok:5400357,afterok:5400358,afterok:5400359,afterok:5400360
5400361
--dependency=afterok:5400357,afterok:5400358,afterok:5400359,afterok:5400360
5400362
--dependency=afterok:5400357,afterok:5400358,afterok:5400359,afterok:5400360
5400363
--dependency=afterok:5400357,afterok:5400358,afterok:5400359,afterok:5400360
5400364
afterok:5400361,afterok:5400362,afterok:5400363,afterok:5400364
Finally a single job that waits for the previous 4
5400361 5400362 5400363 5400364
5400365
+-+-+- REPORT OF PENDING JOB DEPENDECIES +-+-+-
  JOBID PARTITION   NAME     USER ST  TIME NODES NODELIST(REASON)
5400356     short  atest tkaiser2 PD  0:00     1 (None)
5400357     short  atest tkaiser2 PD  0:00     1 (Dependency)
5400358     short  atest tkaiser2 PD  0:00     1 (Dependency)
5400359     short  atest tkaiser2 PD  0:00     1 (Dependency)
5400360     short  atest tkaiser2 PD  0:00     1 (Dependency)
5400361     short  atest tkaiser2 PD  0:00     1 (Dependency)
5400362     short  atest tkaiser2 PD  0:00     1 (Dependency)
5400363     short  atest tkaiser2 PD  0:00     1 (Dependency)
5400364     short  atest tkaiser2 PD  0:00     1 (Dependency)
5400365     short  atest tkaiser2 PD  0:00     1 (Dependency)
job 5400356
JobState=PENDING Reason=None Dependency=(null)
job 5400357
JobState=PENDING Reason=Dependency Dependency=afterok:5400356(unfulfilled)
job 5400358
JobState=PENDING Reason=Dependency Dependency=afterok:5400357(unfulfilled)
job 5400359
JobState=PENDING Reason=Dependency Dependency=afterok:5400358(unfulfilled)
job 5400360
JobState=PENDING Reason=Dependency Dependency=afterok:5400359(unfulfilled)
job 5400361
JobState=PENDING Reason=Dependency Dependency=afterok:5400357(unfulfilled),afterok:5400358(unfulfilled),afterok:5400359(unfulfilled),afterok:5400360(unfulfilled)
job 5400362
JobState=PENDING Reason=Dependency Dependency=afterok:5400357(unfulfilled),afterok:5400358(unfulfilled),afterok:5400359(unfulfilled),afterok:5400360(unfulfilled)
job 5400363
JobState=PENDING Reason=Dependency Dependency=afterok:5400357(unfulfilled),afterok:5400358(unfulfilled),afterok:5400359(unfulfilled),afterok:5400360(unfulfilled)
job 5400364
JobState=PENDING Reason=Dependency Dependency=afterok:5400357(unfulfilled),afterok:5400358(unfulfilled),afterok:5400359(unfulfilled),afterok:5400360(unfulfilled)
job 5400365
JobState=PENDING Reason=Dependency Dependency=afterok:5400361(unfulfilled),afterok:5400362(unfulfilled),afterok:5400363(unfulfilled),afterok:5400364(unfulfilled)
el2:collect>
After all jobs run we have:
el2:collect> ls -d ser*
ser1 ser2 ser3 ser4 ser5
el2:collect> ls ser*
ser1:
5400356.out
ser2:
5400356.out 5400357.out
ser3:
5400356.out 5400357.out 5400358.out
ser4:
5400356.out 5400357.out 5400358.out 5400359.out
ser5:
5400356.out 5400357.out 5400358.out 5400359.out 5400360.out
el2:collect>
el2:collect>
el2:collect> ls -d par*
par1 par2 par3 par4
el2:collect> ls par*
par1:
5400356.out 5400357.out 5400358.out 5400359.out 5400360.out 5400361.out
par2:
5400356.out 5400357.out 5400358.out 5400359.out 5400360.out 5400362.out
par3:
5400356.out 5400357.out 5400358.out 5400359.out 5400360.out 5400363.out
par4:
5400356.out 5400357.out 5400358.out 5400359.out 5400360.out 5400364.out
el2:collect>
el2:collect>
el2:collect> ls -d final
final
el2:collect> ls final
5400361.out 5400362.out 5400363.out 5400364.out 5400365.out
el2:collect>
++++
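:<<'++++'
The header describes old_new.sh as copying files from OLD_DIR into
NEW_DIR, optionally limited to the names in OLD_FILES. old_new.sh
itself is not shown here, so the following is only a guess at how
such staging could look. stage_inputs is a hypothetical function, and
the rule that OLD_FILES names are relative to OLD_DIR is an
assumption. It does not model the final job in this script, which
passes job IDs in OLD_FILES with OLD_DIR unset.

```shell
# stage_inputs (hypothetical, not the real old_new.sh): create NEW_DIR
# and copy inputs into it from OLD_DIR. Copies only the names listed
# in OLD_FILES when that is set, otherwise every regular file.
stage_inputs() {
    mkdir -p "$NEW_DIR" || return 1
    # No previous run to copy from: an empty NEW_DIR is all we need.
    if [ -z "${OLD_DIR:-}" ]; then
        return 0
    fi
    local f
    if [ -n "${OLD_FILES:-}" ]; then
        for f in $OLD_FILES; do   # unquoted on purpose: split on spaces
            cp "$OLD_DIR/$f" "$NEW_DIR/" || return 1
        done
    else
        for f in "$OLD_DIR"/*; do
            if [ -f "$f" ]; then
                cp "$f" "$NEW_DIR/" || return 1
            fi
        done
    fi
}

# Small demonstration in a scratch directory.
work=$(mktemp -d)
mkdir "$work/ser1"
touch "$work/ser1/100.out"
OLD_DIR="$work/ser1"
NEW_DIR="$work/ser2"
unset OLD_FILES
stage_inputs
ls "$work/ser2"
rm -rf "$work"
```

The demonstration lists 100.out, mirroring how each ser directory in
the example output holds the output files of the runs before it.
++++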