Allow mutating schedulingGates when the Jobset is suspended #623
// Pod Scheduling Gates can be updated for batch/v1 Job: https://github.com/kubernetes/kubernetes/blob/ceb58a4dbc671b9d0a2de6d73a1616bc0c299863/pkg/apis/batch/validation/validation.go#L662
mungedSpec.ReplicatedJobs[index].Template.Spec.Template.Spec.SchedulingGates = oldJS.Spec.ReplicatedJobs[index].Template.Spec.Template.Spec.SchedulingGates
In batch/v1 Job, we can mutate the schedulingGates only when the Job hasn't started: https://github.com/kubernetes/kubernetes/blob/ceb58a4dbc671b9d0a2de6d73a1616bc0c299863/pkg/registry/batch/job/strategy.go#L194
So, shouldn't we introduce the same criterion for the mutable scheduling directives here?
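For reference, the Job criterion described in this comment (suspended, and not yet started according to status.startTime) can be sketched roughly as follows. This is a minimal illustration, not the upstream code: jobView, newJobView and schedulingDirectivesMutable are hypothetical names, and only the two relevant fields are modelled.

```go
package main

import (
	"fmt"
	"time"
)

// jobView is a hypothetical, trimmed-down view of a batch/v1 Job that keeps
// only the two fields the criterion looks at.
type jobView struct {
	Suspend   *bool      // stands in for spec.suspend
	StartTime *time.Time // stands in for status.startTime
}

// newJobView is a small helper for building examples.
func newJobView(suspended, started bool) jobView {
	j := jobView{Suspend: &suspended}
	if started {
		now := time.Now()
		j.StartTime = &now
	}
	return j
}

// schedulingDirectivesMutable returns true only when the Job is suspended
// and has never started, which is the criterion described in the comment.
func schedulingDirectivesMutable(j jobView) bool {
	suspended := j.Suspend != nil && *j.Suspend
	notStarted := j.StartTime == nil
	return suspended && notStarted
}

func main() {
	fmt.Println(schedulingDirectivesMutable(newJobView(true, false)))
	fmt.Println(schedulingDirectivesMutable(newJobView(true, true)))
	fmt.Println(schedulingDirectivesMutable(newJobView(false, false)))
}
```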
For JobSet we just check if the JobSet is suspended. This is done by the enclosing condition above: ptr.Deref(oldJS.Spec.Suspend, false).
The other part of the check is based on the status.startTime, but this field is not in the JobSet API:
jobset/api/jobset/v1alpha2/jobset_types.go
Lines 125 to 146 in 56c77da
type JobSetStatus struct {
	// +optional
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty"`
	// Restarts tracks the number of times the JobSet has restarted (i.e. recreated in case of RecreateAll policy).
	Restarts int32 `json:"restarts,omitempty"`
	// RestartsCountTowardsMax tracks the number of times the JobSet has restarted that counts towards the maximum allowed number of restarts.
	RestartsCountTowardsMax int32 `json:"restartsCountTowardsMax,omitempty"`
	// TerminalState the state of the JobSet when it finishes execution.
	// It can be either Complete or Failed. Otherwise, it is empty by default.
	TerminalState string `json:"terminalState,omitempty"`
	// ReplicatedJobsStatus track the number of JobsReady for each replicatedJob.
	// +optional
	// +listType=map
	// +listMapKey=name
	ReplicatedJobsStatus []ReplicatedJobStatus `json:"replicatedJobsStatus,omitempty"`
}
The difference between Job and JobSet applies to all of the mutable fields (Annotations, Labels, NodeSelector, and Tolerations too).
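The pattern under discussion (copy the mutable fields from the old object into a "munged" copy of the new spec, then require everything else to be equal) can be sketched with simplified stand-in types. This is an illustration under assumptions, not the JobSet webhook itself: the types, the Image field, and validateUpdate are hypothetical, and only NodeSelector and SchedulingGates are treated as mutable here.

```go
package main

import (
	"fmt"
	"reflect"
)

// Simplified stand-ins for the real API types (hypothetical, trimmed down).
type PodSchedulingGate struct{ Name string }

type PodSpec struct {
	NodeSelector    map[string]string
	SchedulingGates []PodSchedulingGate
	Image           string // stands in for any field that stays immutable
}

type ReplicatedJob struct {
	Name string
	Pod  PodSpec
}

type JobSetSpec struct {
	Suspend        bool
	ReplicatedJobs []ReplicatedJob
}

// validateUpdate returns an error if an immutable part of ReplicatedJobs changed.
func validateUpdate(oldSpec, newSpec JobSetSpec) error {
	// Copy the slice so the caller's new spec is left untouched.
	munged := append([]ReplicatedJob(nil), newSpec.ReplicatedJobs...)

	// Only when the old JobSet is suspended, overwrite the mutable fields with
	// their old values so they drop out of the comparison below.
	if oldSpec.Suspend && len(munged) == len(oldSpec.ReplicatedJobs) {
		for i := range munged {
			munged[i].Pod.NodeSelector = oldSpec.ReplicatedJobs[i].Pod.NodeSelector
			munged[i].Pod.SchedulingGates = oldSpec.ReplicatedJobs[i].Pod.SchedulingGates
		}
	}
	if !reflect.DeepEqual(munged, oldSpec.ReplicatedJobs) {
		return fmt.Errorf("spec.replicatedJobs: field is immutable")
	}
	return nil
}

func main() {
	oldSpec := JobSetSpec{Suspend: true, ReplicatedJobs: []ReplicatedJob{
		{Name: "workers", Pod: PodSpec{Image: "busybox"}},
	}}
	newSpec := JobSetSpec{Suspend: true, ReplicatedJobs: []ReplicatedJob{
		{Name: "workers", Pod: PodSpec{
			Image:           "busybox",
			SchedulingGates: []PodSchedulingGate{{Name: "example.com/hypothetical-gate"}},
		}},
	}}
	fmt.Println("gates changed while suspended:", validateUpdate(oldSpec, newSpec))

	newSpec.ReplicatedJobs[0].Pod.Image = "alpine"
	fmt.Println("image changed while suspended:", validateUpdate(oldSpec, newSpec))
}
```

Because the gate is keyed only on the old object being suspended, the same gates change is rejected when the old spec is unsuspended.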
Actually, the check above only verifies that oldJS is suspended. Which means that, IIUC, there is a bug that we cannot modify the template on suspending (oldJS unsuspended, but js suspended). I think this could render failures in Kueue for the suspend, I will double-check and open a separate PR / Issue for it.
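To make the gap concrete, here is a small sketch comparing a check that looks only at the old object with one that also accepts the suspending transition. Both function names are hypothetical, and the second variant is just one possible shape of a fix, not necessarily what the follow-up PR implements. The local deref helper mirrors ptr.Deref from k8s.io/utils/ptr so the example stays self-contained.

```go
package main

import "fmt"

// deref returns *p, or def when p is nil.
func deref(p *bool, def bool) bool {
	if p == nil {
		return def
	}
	return *p
}

// mutableOldOnly models the check discussed above: only the old object's
// suspend flag is considered.
func mutableOldOnly(oldSuspend, newSuspend *bool) bool {
	return deref(oldSuspend, false)
}

// mutableOldOrNew is a hypothetical alternative that also allows template
// changes in the same update that suspends the JobSet.
func mutableOldOrNew(oldSuspend, newSuspend *bool) bool {
	return deref(oldSuspend, false) || deref(newSuspend, false)
}

func main() {
	yes, no := true, false
	// The suspending transition: old is running, new is suspended.
	fmt.Println("old-only check allows it:", mutableOldOnly(&no, &yes))
	fmt.Println("old-or-new check allows it:", mutableOldOrNew(&no, &yes))
}
```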
The other part of the check is based on the status.startTime, but this field is not in the JobSet API:
Oh, I didn't know that! Thank you for the explanation!
I think this could render failures in Kueue for the suspend, I will double-check and open a separate PR / Issue for it.
I also feel that the behavior is a bug/regression. But, I agree with treating the regression as a separate issue.
Actually, I have confirmed that this currently makes the suspend fail for JobSet in Kueue: kubernetes-sigs/kueue#2691. I have opened an issue for JobSet #624, and the PR: #625. PTAL.
I think this change is rather straightforward (and follows the patterns of other fields) so I wasn't planning an e2e test here.
I think once this PR lands, we could have an extended e2e test scenario added for the other PR, where we:
- unsuspend a JobSet adding scheduling gates, and other PodTemplate fields
- suspend the JobSet reverting the scheduling gates and other PodTemplate fields (as Kueue would do)
- unsuspend the JobSet again with different values of the PodTemplate fields and check that the pods execute (JobSet completes)
WDYT?
Unit + integration tests should be sufficient here.
@mimowo that sounds good. Maybe we create an issue documenting the goal of the e2e test so we can reference it.
Would it work to just reference the proposal from the issue here: #624? I have added a comment there.
Unit + integration tests should be sufficient here.
OK, I have added a unit test dedicated to scheduling gates. For the integration test, I extended the existing one to cover schedulingGates, because integration tests are heavier than unit tests. The test case verifies that the mutation can happen, so any field that is immutable would fail the test (I confirmed it was failing before the change).
Thank you!
I think that this would be a useful feature!
/lgtm Will approve once the tests finish running and I confirm they are all passing.
/approve Thanks!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danielvegamyhre, mimowo The full list of commands accepted by this bot can be found here. The pull request process is described here
SchedulingGates can be mutated for suspended batch Jobs, so from the Kueue perspective we would like to have similar capabilities for JobSet: https://github.com/kubernetes/kubernetes/blob/ceb58a4dbc671b9d0a2de6d73a1616bc0c299863/pkg/apis/batch/validation/validation.go#L662
We are considering using schedulingGates in Kueue for fine-grained scheduling, where a subset of pods may get an additional set of node labels, so we don't update the entire PodTemplate, only specific pods. In that case Kueue would have a controller which ungates the pods once they are adjusted.
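As a rough illustration of the ungating step mentioned above, a controller could drop its own gate from a pod's gate list once the pod has been adjusted. The sketch below uses a simplified stand-in type; removeGate and the gate name are hypothetical, and nothing here reflects an actual Kueue implementation.

```go
package main

import "fmt"

// PodSchedulingGate is a simplified stand-in for the core/v1 type of the same name.
type PodSchedulingGate struct{ Name string }

// removeGate returns a new slice without the gate called name.
// The input slice is left unchanged.
func removeGate(gates []PodSchedulingGate, name string) []PodSchedulingGate {
	kept := make([]PodSchedulingGate, 0, len(gates))
	for _, g := range gates {
		if g.Name != name {
			kept = append(kept, g)
		}
	}
	return kept
}

func main() {
	// The gate names are made up for the example.
	gates := []PodSchedulingGate{
		{Name: "example.com/hypothetical-placement"},
		{Name: "example.com/other-gate"},
	}
	fmt.Println(removeGate(gates, "example.com/hypothetical-placement"))
}
```

Returning a fresh slice keeps the original gate list intact, which makes it easy to compare before and after when building a patch.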
This is an extension to #580.