This issue is cross-referenced from kubernetes-sigs/kueue#3352. The intention is to clarify integration related questions from the notebook controller side.
Brief Overview:
Notebooks, especially GPU-enabled ones, can demand substantial resources, similar to other ML batch workloads. Managing them through Kueue lets users schedule Notebooks against the quota available in the cluster. With this feature, Kueue would handle quota-based scheduling of Notebook resources, while the lifecycle of the Notebook resource itself would remain the responsibility of the notebook controller. Ideally, the current features and responsibilities of the notebook controller should not change.
Additional References:
Because the notebook v1beta APIs currently use StatefulSets underneath to manage pods, the easiest route is to run Kueue with the StatefulSet+Pod integration enabled so that it can manage Notebook workloads.
Documentation on Kueue integrations: https://kueue.sigs.k8s.io/docs/tasks/run/statefulset/
Enable scaling of pods belonging to a StatefulSet: kubernetes-sigs/kueue#3487
Note: The intention is to make Kueue compatible with both v1 and v2 APIs.
In testing Kueue with the above integrations enabled, Notebook pods were queued in a LocalQueue based on resource quota.
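For illustration, Kueue's integrations select a queue through the `kueue.x-k8s.io/queue-name` label on the managed object. The sketch below shows how that label could be attached to a StatefulSet manifest like the one backing a Notebook. The object names (`example-notebook`, `example-ns`, `example-local-queue`) and the helper `with_queue_label` are hypothetical, and whether the notebook controller propagates labels from the Notebook resource to its StatefulSet is an assumption that would need to be verified.

```python
import copy

# Label that Kueue integrations read to pick the LocalQueue for a workload.
QUEUE_NAME_LABEL = "kueue.x-k8s.io/queue-name"


def with_queue_label(manifest: dict, local_queue: str) -> dict:
    """Return a copy of a manifest with the Kueue queue-name label set.

    Hypothetical helper for illustration; it does not talk to a cluster.
    """
    labeled = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    labels = labeled.setdefault("metadata", {}).setdefault("labels", {})
    labels[QUEUE_NAME_LABEL] = local_queue
    return labeled


# Minimal stand-in for the StatefulSet that backs a Notebook (names are made up).
statefulset = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "example-notebook", "namespace": "example-ns"},
    "spec": {"replicas": 1},
}

labeled = with_queue_label(statefulset, "example-local-queue")
```

The helper returns a copy rather than mutating its input, so the same base manifest can be reused for different queues.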
Open Question:
If a Notebook is preempted by Kueue, should the notebook-controller be modified to add a finalizer that performs backups? Is it the notebook controller's responsibility to handle backups at all? Preemption need not come from Kueue; the underlying pod could also be preempted by the Kubernetes scheduler. Or is it reasonable to assume that Notebooks always use persistent volumes to store data?