-
Just to clarify: what we want to test here is whether, on a large, fully loaded system with interactive users, all typical interactive commands (such as `flux resource list`, `flux jobs`, and `flux top`) remain responsive.

An idea is to run a real or simulated "large" instance (O(16K nodes)), start a job workload with some target throughput (e.g., it might be interesting to see the difference between a system running 1 job/s vs 10 jobs/s vs 50 jobs/s), and then have a script or set of scripts (perhaps launched on different nodes, as suggested by @wihobbs) that issue the interactive commands mentioned in the original post. We'd then need some way to capture the "interactivity" performance of the workload, e.g. the timing for each type of command could be captured, and the max, min, mean, and stddev reported. I think this would give us good insight into how a large, busy system would respond to lots of users.
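To make the measurement concrete, here is a minimal sketch of such a timing harness, assuming a hypothetical command list and sample count (nothing below exists in a test suite yet): each interactive command is run repeatedly and the min, max, mean, and stddev are reported per command.

```bash
#!/bin/bash
# Hypothetical interactivity probe: time each user-facing command NSAMPLES
# times and report min/max/mean/stddev per command. The command list is
# just an example set of read-only, non-interactive commands.
NSAMPLES=${NSAMPLES:-20}
COMMANDS=("flux resource list" "flux jobs -a" "flux queue status")

for cmd in "${COMMANDS[@]}"; do
    samples=()
    for i in $(seq $NSAMPLES); do
        t0=$(date +%s.%N)
        $cmd >/dev/null 2>&1
        t1=$(date +%s.%N)
        samples+=($(echo "$t1 - $t0" | bc -l))
    done
    # Summarize the collected samples with awk
    printf '%s\n' "${samples[@]}" | awk -v cmd="$cmd" '
        { sum += $1; sumsq += $1 * $1
          if (NR == 1 || $1 < min) min = $1
          if (NR == 1 || $1 > max) max = $1 }
        END { mean = sum / NR
              stddev = sqrt(sumsq / NR - mean * mean)
              printf "%-20s min=%.3f max=%.3f mean=%.3f stddev=%.3f\n",
                     cmd, min, max, mean, stddev }'
done
```

Several copies of something like this, launched on different nodes while the job workload is running, could then be compared across the different target throughputs.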
-
As a very simple example of testing with a simulated instance, the following script was used in a previous round of scale testing:

```bash
#!/bin/bash
PROG=$(basename $0)
VERBOSE=0
RPC=${1-resource.sched-status}

log() {
    test $VERBOSE -eq 0 && return
    local fmt=$1
    shift
    printf >&2 "$PROG: $fmt\n" $@
}

rpc() {
    flux python -c "import flux, json; print(json.dumps(flux.Flux().rpc(\"$1\").get()))"
}

runtest() {
    SCHEDULER=$1
    NNODES=$2

    log "Starting test of ${SCHEDULER} with ${NNODES} nodes"

    log "Removing modules..."
    flux module remove -f sched-fluxion-qmanager
    flux module remove -f sched-fluxion-resource
    flux module remove -f sched-simple
    flux module remove resource

    log "Loading fake resources via config..."
    flux config load <<EOF
[resource]
noverify = true
norestrict = true

[[resource.config]]
hosts = "test[1-${NNODES}]"
cores = "0-63"
gpus = "0-8"
EOF

    log "Reloading resource module..."
    flux module load resource noverify monitor-force-up

    log "Loading ${SCHEDULER} modules..."
    if test "$SCHEDULER" = "sched-simple"; then
        flux module load sched-simple
    else
        flux module load sched-fluxion-resource
        flux module load sched-fluxion-qmanager
    fi

    log "Starting some active jobs..."
    flux submit --quiet -xN1 --cc=1-${NNODES} \
        --setattr=exec.test.run_duration=\"600\" --wait-event=start \
        hostname

    if test "$SCHEDULER" = "fluxion"; then
        # allow fluxion to initialize graph?
        rpc $RPC >/dev/null
        rpc $RPC >/dev/null
    fi

    log "Timing $RPC"
    t0=$(date +%s.%N)
    rpc $RPC >/dev/null
    t1=$(date +%s.%N)
    dt1=$(echo "$t1 - $t0" | bc -l)

    log "Timing flux resource list"
    t0=$(date +%s.%N)
    flux resource list >/dev/null
    t1=$(date +%s.%N)
    dt2=$(echo "$t1 - $t0" | bc -l)

    printf "%-13s %8s %24.3f %22.3f\n" $SCHEDULER $NNODES $dt1 $dt2

    flux cancel --all --quiet 2>/dev/null
    flux queue idle --quiet
    flux module unload -f sched-fluxion-qmanager
    flux module unload -f sched-fluxion-resource
}

printf "%-13s %8s %18s %22s\n" \
    SCHEDULER NNODES "T($RPC)" "T(flux resource list)"

for scheduler in sched-simple fluxion; do
    for nnodes in 128 256 512 1024 2048 4096 8192 16384; do
        runtest $scheduler $nnodes
    done
done

# vi: ts=4 sw=4 expandtab
```
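For reference, and as an assumption on my part since the invocation isn't shown above, a script like this can be run under a small test instance; the fake `[resource.config]` table means the real broker size doesn't matter:

```bash
# Hypothetical invocation (sched-rpc-bench.sh is a made-up name for the
# script above); the optional argument selects the RPC topic to time.
flux start -s 1 ./sched-rpc-bench.sh resource.sched-status
```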
-
This weekend I took a node of … and ran this test.

Some (potential) confounding variables here: …

One thought I had was to make this test part of flux-test-collective, maybe on a less frequent (weekly? monthly?) basis. We could compare future numbers to these baselines, and fail the test if the mean time for an interactive command increases by a set percentage.
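As a sketch of what that pass/fail criterion might look like (the file names and threshold below are placeholders, not an existing flux-test-collective interface), the check could compare the measured mean against a stored baseline and fail when it grows by more than a set percentage:

```bash
#!/bin/bash
# Hypothetical regression gate: fail if the current mean interactive
# command time exceeds the stored baseline by more than PCT percent.
PCT=${PCT:-25}
baseline=$(cat baseline-mean.txt)   # e.g. 1.250 (seconds), recorded earlier
current=$(cat current-mean.txt)     # mean measured in this run

exceeds=$(echo "$current > $baseline * (1 + $PCT / 100)" | bc -l)
if test "$exceeds" -eq 1; then
    echo "FAIL: mean ${current}s is more than ${PCT}% over baseline ${baseline}s" >&2
    exit 1
fi
echo "OK: mean ${current}s is within ${PCT}% of baseline ${baseline}s"
```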
-
This is a visualization of the heap, generated with massif-visualizer. At the peak snapshot (no. 42), the heap was 4.5GiB. I periodically checked the memory utilization with …

I want to spend some time cleaning up the way this test is run: continuing the work @grondo suggested above, splitting RPC and command timing, and doing more appropriate logging instead of checking binary versions after the fact. But since it took some work just to get to this first-pass chart, I thought I'd put it up for public inspection.
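For anyone who wants to reproduce this kind of heap profile, the recipe is roughly the following; the exact target is an assumption on my part (the run above may have profiled a different process or used different options):

```bash
# Profile the broker (and its children) under massif while running the
# hypothetical benchmark script, then inspect the results graphically.
valgrind --tool=massif --trace-children=yes \
    --massif-out-file=massif.out.%p \
    flux start -s 1 ./sched-rpc-bench.sh
massif-visualizer massif.out.*
```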
-
We recently had the opportunity to do scale testing on the Dane cluster, and in a similar vein, it might be a good idea for us to do some scale testing with synthetic workloads to check throughput. This was brought up on the coffee call today.

Also, some recent issues, such as the >30s hang on `flux resource list` in #5819, show that some user-level commands might take a while on a system instance with thousands of nodes and jobs running all at once, so it'd be good to check other user commands too, such as `flux jobs` and `flux top`.

This discussion is open so we can come up with a design plan for the two above test cases.
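For the throughput side, a very rough sketch of a synthetic workload check might look like the following (the job count and no-op command are placeholders; a realistic workload would vary job sizes and durations):

```bash
# Submit N trivial jobs, wait for them to complete, and report
# end-to-end throughput in jobs/s.
N=${N:-1000}
t0=$(date +%s.%N)
flux submit --quiet --cc=1-$N --wait-event=clean true
t1=$(date +%s.%N)
echo "throughput: $(echo "$N / ($t1 - $t0)" | bc -l) jobs/s"
```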