-
Notifications
You must be signed in to change notification settings - Fork 705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade Kubernetes to v1.31.3 #2330
Conversation
386154d
to
9e7671a
Compare
@tenzen-y did mention to me that we should do an upgrade from 1.29 to 1.30 first. And then we can do 1.30 to 1.31. |
Yes, that's right. @astefanutti Could you cooperate with @kannon92 in #2299 before we move this forward? |
9e7671a
to
89260c6
Compare
97bb453
to
037cba6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for doing this!
@kubeflow/wg-training-leads @Electronic-Waste Please take a look.
.github/workflows/unittests.yaml
Outdated
@@ -18,7 +18,7 @@ jobs: | |||
fail-fast: false | |||
matrix: | |||
# Detail: `setup-envtest list` | |||
kubernetes-version: ["1.28.3", "1.29.3", "1.30.0"] | |||
kubernetes-version: ["1.28.3", "1.29.3", "1.30.0", "1.31.3"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kubeflow/wg-training-leads @astefanutti Are we ready to support 4 K8s version in our CI/CD ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1.28 is EOL so it's probably OK to remove it: https://kubernetes.io/releases/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should support 1.28 - 1.32 in the next training-operator release in this Dec.
Because the 1.28 is still actively supported in most popular cloud providers and the next release is final one for the v1 API.
But, after we release the 1.9.0 in Dec, I think that we should immediately remove the 1.28 related codes in the master brach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tenzen-y Don't we hit any limits in GitHub actions if we run our tests on 4 k8s version ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich AFAIK, there are no limits for open repos.
@@ -29,7 +30,7 @@ import ( | |||
"github.com/kubeflow/training-operator/pkg/runtime.v2/framework/plugins/torch" | |||
) | |||
|
|||
type Registry map[string]func(ctx context.Context, client client.Client, indexer client.FieldIndexer) (framework.Plugin, error) | |||
type Registry map[string]func(ctx context.Context, client client.Client, cache cache.Cache, indexer client.FieldIndexer) (framework.Plugin, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to pass cache everywhere ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache is needed here: https://github.com/kubeflow/training-operator/pull/2330/files#diff-2678f0825d50893b1cc0a6e340510d5530435e83b94253fa220c585cee92f0c8R284-R296 so it's possible to make PodGroupLimitRangeHandler
and PodGroupRuntimeClassHandler
strictly typed.
Unfortunately passing the manager cache from the main down to the ReconcilerBuilder
instances forces to have it passed along all the initialization chain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@astefanutti @tenzen-y Is there a way for us to initialize these watchers without bypassing the cache everywhere ?
E.g. If we add the cache parameter to the runtime registry, we force all of the plugins to implement this argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich @tenzen-y I've looked at it again and find a more elegant approach by passing the manager cache via the ReconcilerBuilder
API rather than the Registry
API. I think that looks much better now :)
037cba6
to
b673406
Compare
@tenzen-y @astefanutti Just a quick question, why did we remove the helper function from our update-codegen script: https://github.com/kubeflow/training-operator/pull/2332/files#diff-149dfe7bb29d1191dceae3a52915e750e64b7f87257a5fb309c29d3056e2a95dL32-L43 ? If we don't have it, developers have to understand that they need to run |
@andreyvelich you're right. From what I've seen, this is usually rather done in the Makefile. I've created #2339 based on that, but let me know if you'd rather stick to keeping the |
Signed-off-by: Antonin Stefanutti <[email protected]>
Signed-off-by: Antonin Stefanutti <[email protected]>
b673406
to
b32a059
Compare
Signed-off-by: Antonin Stefanutti <[email protected]>
Pull Request Test Coverage Report for Build 12137029594Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
Thank you @astefanutti, I think we can merge it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late response.
Thank you for this effort!
/lgtm
/approve
/hold cancel
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
Upgrade Kubernetes to v1.31.3.
Fixes #2291
Checklist: