Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename JobSet status to Complete #723

Open
2 of 3 tasks
andreyvelich opened this issue Dec 6, 2024 · 4 comments
Open
2 of 3 tasks

Rename JobSet status to Complete #723

andreyvelich opened this issue Dec 6, 2024 · 4 comments

Comments

@andreyvelich
Copy link
Contributor

andreyvelich commented Dec 6, 2024

What would you like to be added:

@tenzen-y and I noticed that JobSet uses the Completed status rather than Complete.

However, Kubernetes Job uses the Complete status.
We also use the Complete status in the TrainJob to be consistent with the Batch/Job.

Additionally, I would propose that we rename Succeeded ReplicatedJobStatus to Complete, since there is no such status as Succeeded on the Job level. E.g.

replicatedJobStatus:
 - name: Job-A
  complete: 1
  failed: 0
    ...

Any thoughts @tenzen-y @kannon92 @danielvegamyhre @ahg-g ?

Why is this needed:

Being consistent across various Kubernetes resources.

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@googs1025
Copy link
Member

googs1025 commented Dec 7, 2024

it makes sense to me, I am willing to make this change. 🤔

@kannon92
Copy link
Contributor

kannon92 commented Dec 7, 2024

I tend to prefer "Succeeded" as it matches terminal states like 'Suspended' and 'Failed'. I do think it unfortunate that Job uses complete.

I don't really have a strong feeling about the renaming but I think this should be considered a breaking change so we should probably introduce "Complete" and leave "Succeeded" as deprecated. And then maybe in a release or two we can drop "Succeeded".

But I don't really feel that this renaming is worth the effort.

@andreyvelich
Copy link
Contributor Author

I tend to prefer "Succeeded"

We discussed before with @tenzen-y, that it is tricky to define what does Succeeded mean for Job or JobSet given all of the Success and Failure policies that we have: kubeflow/training-operator#2298.

But I don't really feel that this renaming is worth the effort.

I feel like if we want to be consistent across all of our Batch Jobs CRDs, we should not introduce such terminology for the Job status.

Especially, if we want to match our Jobs counters with the actual status of the Job: https://github.com/kubernetes-sigs/jobset/blob/main/api/jobset/v1alpha2/jobset_types.go#L180.

@ahg-g
Copy link
Contributor

ahg-g commented Dec 12, 2024

This is practically part of the api now, so we need to bump the api version for this change, we could do this only when we graduate to v1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants