Gpu config api #684
base: main
…nto gpu-config-api
…nto gpu-config-api
…nto gpu-config-api # Conflicts: # deployment/helm/skaha/Chart.yaml # deployment/helm/skaha/templates/_helpers.tpl
…nto gpu-config-api # Conflicts: # deployment/helm/skaha/Chart.yaml # deployment/helm/skaha/templates/_helpers.tpl
try {
    final int majorNVIDIACUDAVersion = CommandExecutioner.getMajorNvidiaCudaGPUVersion();
    jobLaunchString = setConfigValue(jobLaunchString, SOFTWARE_GPU_NVIDIA_CUDA_MAJOR_VERSION, Integer.toString(majorNVIDIACUDAVersion));
Users can already get the GPU version through their software, so I'm not sure this is needed, though it may be useful.
The general idea is to allow users to land on the right GPU (brand, version, gpu-core count). I think the ideas from ExecutionBroker are useful here and will help us align with that potential integration.
We currently expose the content of k8s-resources at /context, so this is just a static config reflecting the underlying capabilities of the cluster. Ideally, those values should come from the cluster itself. However, that is probably beyond the scope of this story, as is adding the 'brokering' part of client interaction.
So I think, for now at least, the story is simply to let users specify those three GPU conditions through API params. I haven't gone through this whole PR yet, but I'm guessing that a lot of that is already there. Let's chat about it tomorrow.
Thanks. According to CADC-13476, we wanted the major CUDA version supplied so that scripts can look it up.
Also, there should be two (2) parameters specified: the gpu-type parameter and the gpus (count) parameter.
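To make the two-parameter request concrete, here is a minimal sketch of how such parameters might be validated. Only the parameter names gpu-type and gpus come from the comment above. The class name GpuRequest, the parse helper, the accepted vendor list, and the assumption that both parameters are required are hypothetical and are not taken from this PR.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of skaha: validates the two GPU request parameters.
final class GpuRequest {
    // Assumed vendor list, for illustration only.
    private static final Set<String> KNOWN_TYPES = Set.of("nvidia", "amd", "intel");

    public final String gpuType;
    public final int gpuCount;

    private GpuRequest(String gpuType, int gpuCount) {
        this.gpuType = gpuType;
        this.gpuCount = gpuCount;
    }

    // Reads "gpu-type" and "gpus" from the request parameters and rejects bad input.
    public static GpuRequest parse(Map<String, String> params) {
        String type = params.get("gpu-type");
        String count = params.get("gpus");
        if (type == null || count == null) {
            // Assumption: both parameters must be supplied together.
            throw new IllegalArgumentException("Both gpu-type and gpus are required");
        }

        String normalizedType = type.trim().toLowerCase(Locale.ROOT);
        if (!KNOWN_TYPES.contains(normalizedType)) {
            throw new IllegalArgumentException("Unknown gpu-type: " + type);
        }

        int parsedCount;
        try {
            parsedCount = Integer.parseInt(count.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("gpus must be an integer: " + count, e);
        }
        if (parsedCount < 1) {
            throw new IllegalArgumentException("gpus must be at least 1");
        }
        return new GpuRequest(normalizedType, parsedCount);
    }
}
```

Rejecting bad values up front, rather than passing them into the job launch string, would keep malformed requests from reaching the cluster, though the real PR may validate elsewhere.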
…nto gpu-config-api
…latform into gpu-config-api
…nto gpu-config-api
…nto gpu-config-api # Conflicts: # deployment/helm/skaha/Chart.yaml # deployment/helm/skaha/skaha-config/launch-desktop.yaml # deployment/helm/skaha/values.yaml # skaha/VERSION # skaha/src/intTest/java/org/opencadc/skaha/DesktopAppLifecycleTest.java # skaha/src/intTest/java/org/opencadc/skaha/ExpiryTimeRenewalTest.java # skaha/src/intTest/java/org/opencadc/skaha/ImagesTest.java # skaha/src/intTest/java/org/opencadc/skaha/SessionLifecycleTest.java # skaha/src/intTest/java/org/opencadc/skaha/SessionUtil.java # skaha/src/main/java/org/opencadc/skaha/session/PostAction.java # skaha/src/main/java/org/opencadc/skaha/session/SessionAction.java
gpu-count:<gpu-vendor>
NVIDIA_CUDA_MAJOR_VERSION environment variable in User Sessions from querying Kubernetes
SKAHA_SERVICE_ID environment variable locally for integration tests to run
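As a rough illustration of how code running inside a User Session might look up the CUDA major version, here is a minimal sketch. Only the variable name NVIDIA_CUDA_MAJOR_VERSION comes from this PR. The class CudaVersionLookup and its method are hypothetical, and the sketch assumes the value is a plain integer string.

```java
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper, not part of skaha: reads the CUDA major version from an environment map.
final class CudaVersionLookup {
    static final String ENV_NAME = "NVIDIA_CUDA_MAJOR_VERSION";

    // Returns the major version, or empty when the variable is unset, blank, or not an integer.
    public static OptionalInt majorVersion(Map<String, String> env) {
        String raw = env.get(ENV_NAME);
        if (raw == null || raw.isBlank()) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            // Assumption: a malformed value is treated the same as a missing one.
            return OptionalInt.empty();
        }
    }
}
```

In a session this could be called as CudaVersionLookup.majorVersion(System.getenv()). Taking the environment as a map keeps the lookup easy to exercise in tests without touching the real process environment.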