[corbett8@cluster1081:~]$ flux run -t15m -N1 -n1 -c24 -g 1 -o mpibind=on hostname
0.022s: job.exception type=alloc severity=0 Unsupported resource type 'gpu'
[corbett8@cluster1081:~]$ flux resource list
STATE      PROPERTIES        NNODES NCORES NGPUS NODELIST
free       plarge,pdev,pall       2    192     8 cluster[1081,1084]
allocated                         0      0     0
down                              0      0     0
[corbett8@cluster1081:~]$ flux kvs get resource.R | jq .execution
{
  "R_lite": [
    {
      "rank": "0-1",
      "children": {
        "gpu": "0-3",
        "core": "0-95"
      }
    }
  ],
  "starttime": 1729014140.0,
  "expiration": 1729028540.0,
  "nodelist": [
    "cluster[1081,1084]"
  ],
  "properties": {
    "plarge": "0-1",
    "pdev": "0-1",
    "pall": "0-1"
  }
}
[corbett8@cluster1081:~]$ flux module list
Module           Idle  S Sendq Recvq Service
content-sqlite   idle  R     0     0 content-backing
job-manager      idle  R     0     0
cron             idle  R     0     0
sched-simple     idle  R     0     0 feasibility,sched
resource         idle  R     0     0
job-info         idle  R     0     0
job-exec         idle  R     0     0
heartbeat           0  R     0     0
barrier          idle  R     0     0
job-ingest       idle  R     0     0
job-list         idle  R     0     0
connector-local     0  R     0     0
kvs                22  R     0     0
content            22  R     0     0
kvs-watch        idle  R     0     0
The instance is actually configured to use Fluxion with JGF, but due to flux-framework/flux-sched#1310, sched-simple is loaded instead. FWIW, the JGF does contain GPU vertices.
sched-simple doesn't currently support scheduling GPUs, so it raises an exception on jobs that ask for them.
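To confirm that the resource set itself advertises GPUs (so the rejection comes from the scheduler, not from R), here is a rough Python sketch that totals a resource type from the `execution` object printed by `flux kvs get resource.R | jq .execution`. The helper names `idset_size` and `count_resource` are made up for illustration, and the idset parsing only covers the simple comma/range form seen in this issue.

```python
import json

def idset_size(idset):
    """Count ids in a simple idset string such as "0-3" or "0,2-5".

    Assumption: only plain comma/range syntax, as in the R shown in this issue.
    """
    total = 0
    for part in idset.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            total += int(hi) - int(lo) + 1
        else:
            total += 1
    return total

def count_resource(execution, kind):
    """Sum resources of one kind (e.g. "gpu") over all ranks in R_lite."""
    total = 0
    for entry in execution["R_lite"]:
        ids = entry["children"].get(kind)
        if ids:
            # Each rank in the entry has the same set of children.
            total += idset_size(entry["rank"]) * idset_size(ids)
    return total

# The R_lite portion of the execution object from this issue.
EXECUTION = json.loads("""
{
  "R_lite": [
    {"rank": "0-1", "children": {"gpu": "0-3", "core": "0-95"}}
  ]
}
""")

print("gpus:", count_resource(EXECUTION, "gpu"))
print("cores:", count_resource(EXECUTION, "core"))
```

For the R above this gives 8 GPUs and 192 cores, matching the NGPUS and NCORES columns of `flux resource list`.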
Overall, flux-core doesn't support JGF; that's a Fluxion-only thing.
So the root issue here is that sched-simple was loaded instead of Fluxion. This occurs when there is an error loading the Fluxion modules, because unfortunately the way rc1 currently works is to load sched-simple if no other scheduler is loaded after all rc1.d/* files have run. (There's already an issue and a plan for improving this.)
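A quick way to spot this silent fallback is to check which module registers the `sched` service. The following is only a sketch: `find_scheduler` is a hypothetical helper, and it assumes the six-column `flux module list` layout shown in this issue, which may differ between flux-core versions.

```python
def find_scheduler(module_list_text):
    """Return the module that provides the 'sched' service, or None.

    Assumption: columns are Module, Idle, S, Sendq, Recvq, Service,
    with Service left empty for modules that register no extra service.
    """
    for line in module_list_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and rows without a Service column.
        if len(fields) < 6 or fields[0] == "Module":
            continue
        if "sched" in fields[5].split(","):
            return fields[0]
    return None

# Excerpt of the `flux module list` output from this issue.
SAMPLE = """\
Module Idle S Sendq Recvq Service
content-sqlite idle R 0 0 content-backing
job-manager idle R 0 0
sched-simple idle R 0 0 feasibility,sched
resource idle R 0 0
kvs 22 R 0 0
"""

scheduler = find_scheduler(SAMPLE)
print(scheduler)
if scheduler == "sched-simple":
    print("warning: sched-simple is loaded; GPU requests will be rejected")
```

On a live instance the text could come from running `flux module list` through `subprocess.run(..., capture_output=True, text=True)` instead of the pasted sample.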
So I think this is a flux-sched issue, not a core issue.