Upgrade Kubernetes to v1.31.3 #2330

astefanutti · 2024-11-21T18:54:15Z

What this PR does / why we need it:

Upgrade Kubernetes to v1.31.3.

Fixes #2291

Checklist:

Docs included if any changes are user facing

kannon92 · 2024-11-21T21:38:21Z

@tenzen-y did mention to me that we should do an upgrade from 1.29 to 1.30 first. And then we can do 1.30 to 1.31.

tenzen-y · 2024-11-22T06:12:20Z

> @tenzen-y did mention to me that we should do an upgrade from 1.29 to 1.30 first. And then we can do 1.30 to 1.31.

Yes, that's right. @astefanutti Could you cooperate with @kannon92 in #2299 before we move this forward?

astefanutti · 2024-11-22T07:56:56Z

@kannon92 @tenzen-y sounds good, let me break this down then.

astefanutti · 2024-11-22T12:51:43Z

I've created #2332 that covers the upgrade to 1.30. I'll rebase this once #2332 is merged.

andreyvelich

Thank you for doing this!
@kubeflow/wg-training-leads @Electronic-Waste Please take a look.

andreyvelich · 2024-11-27T18:51:46Z

.github/workflows/unittests.yaml

@@ -18,7 +18,7 @@ jobs:
      fail-fast: false
      matrix:
        # Detail: `setup-envtest list`
-        kubernetes-version: ["1.28.3", "1.29.3", "1.30.0"]
+        kubernetes-version: ["1.28.3", "1.29.3", "1.30.0", "1.31.3"]


@kubeflow/wg-training-leads @astefanutti Are we ready to support 4 K8s versions in our CI/CD?

1.28 is EOL so it's probably OK to remove it: https://kubernetes.io/releases/.

I'm wondering if we should support 1.28 - 1.32 in the next training-operator release this December, because 1.28 is still actively supported by the most popular cloud providers and the next release is the final one for the v1 API.

But after we release 1.9.0 in December, I think we should immediately remove the 1.28-related code from the master branch.

@tenzen-y Won't we hit any limits in GitHub Actions if we run our tests on 4 K8s versions?

@andreyvelich AFAIK, there are no limits for open repos.
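
For illustration only, the matrix trimming discussed in this thread can be sketched in Go. The helper names `minor` and `dropOlderThan` are hypothetical and not part of this repository, and the sketch assumes every entry is a `1.<minor>.<patch>` string like the ones in the workflow matrix above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minor extracts the minor component from a "major.minor.patch" string.
// Hypothetical helper: it assumes the major version is always 1.
func minor(v string) (int, error) {
	parts := strings.Split(v, ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("malformed version %q", v)
	}
	return strconv.Atoi(parts[1])
}

// dropOlderThan keeps only the versions whose minor is >= minMinor,
// preserving the original order of the matrix.
func dropOlderThan(versions []string, minMinor int) ([]string, error) {
	kept := make([]string, 0, len(versions))
	for _, v := range versions {
		m, err := minor(v)
		if err != nil {
			return nil, err
		}
		if m >= minMinor {
			kept = append(kept, v)
		}
	}
	return kept, nil
}

func main() {
	// The four-entry matrix proposed in this PR.
	matrix := []string{"1.28.3", "1.29.3", "1.30.0", "1.31.3"}

	// Dropping 1.28 once it is no longer supported leaves three entries.
	kept, err := dropOlderThan(matrix, 29)
	if err != nil {
		panic(err)
	}
	fmt.Println(kept)
}
```

Keeping 1.28 for the December release and removing it afterwards, as suggested above, would amount to changing only the minimum minor passed to such a filter.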

andreyvelich · 2024-11-27T19:01:37Z

pkg/runtime.v2/framework/plugins/registry.go

@@ -29,7 +30,7 @@ import (
 	"github.com/kubeflow/training-operator/pkg/runtime.v2/framework/plugins/torch"
 )

-type Registry map[string]func(ctx context.Context, client client.Client, indexer client.FieldIndexer) (framework.Plugin, error)
+type Registry map[string]func(ctx context.Context, client client.Client, cache cache.Cache, indexer client.FieldIndexer) (framework.Plugin, error)


Why do we need to pass the cache everywhere?

The cache is needed here: https://github.com/kubeflow/training-operator/pull/2330/files#diff-2678f0825d50893b1cc0a6e340510d5530435e83b94253fa220c585cee92f0c8R284-R296 so it's possible to make PodGroupLimitRangeHandler and PodGroupRuntimeClassHandler strictly typed.

Unfortunately, passing the manager cache from main down to the ReconcilerBuilder instances forces it to be passed along the whole initialization chain.

@astefanutti @tenzen-y Is there a way for us to initialize these watchers without passing the cache everywhere?
E.g., if we add the cache parameter to the runtime registry, we force all of the plugins to accept this argument.

@andreyvelich @tenzen-y I've looked at it again and found a more elegant approach: passing the manager cache via the ReconcilerBuilder API rather than the Registry API. I think that looks much better now :)

andreyvelich · 2024-11-28T22:50:17Z

@tenzen-y @astefanutti Just a quick question, why did we remove the helper function from our update-codegen script: https://github.com/kubeflow/training-operator/pull/2332/files#diff-149dfe7bb29d1191dceae3a52915e750e64b7f87257a5fb309c29d3056e2a95dL32-L43 ?

If we don't have it, developers have to understand that they need to run go mod download before running the code-gen.

astefanutti · 2024-11-29T09:20:19Z

> @tenzen-y @astefanutti Just a quick question, why did we remove the helper function from our update-codegen script: https://github.com/kubeflow/training-operator/pull/2332/files#diff-149dfe7bb29d1191dceae3a52915e750e64b7f87257a5fb309c29d3056e2a95dL32-L43 ?
>
> If we don't have it, developers have to understand that they need to run go mod download before running the code-gen.

@andreyvelich you're right. From what I've seen, this is usually done in the Makefile instead. I've created #2339 based on that, but let me know if you'd rather keep the GET_PKG_LOCATION function.

Signed-off-by: Antonin Stefanutti <[email protected]>

coveralls · 2024-12-03T09:52:32Z

Pull Request Test Coverage Report for Build 12137029594

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

9 of 9 (100.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 100.0%

Totals
Change from base Build 12071681323:	0.0%
Covered Lines:	85
Relevant Lines:	85

💛 - Coveralls

andreyvelich · 2024-12-04T18:56:35Z

Thank you @astefanutti, I think we can merge it.
/lgtm
/assign @tenzen-y

tenzen-y

Sorry for the late response.
Thank you for this effort!
/lgtm
/approve
/hold cancel

google-oss-prow · 2024-12-09T18:16:10Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

google-oss-prow bot added the do-not-merge/work-in-progress label Nov 21, 2024

google-oss-prow bot requested review from jinchihe and kuizhiqing November 21, 2024 18:54

google-oss-prow bot added the size/XXL label Nov 21, 2024

astefanutti force-pushed the pr-k8s-1.31 branch from 386154d to 9e7671a Compare November 21, 2024 18:55

astefanutti mentioned this pull request Nov 22, 2024

Upgrade Kubernetes to v1.30.7 #2332

Merged

1 task

astefanutti force-pushed the pr-k8s-1.31 branch from 9e7671a to 89260c6 Compare November 27, 2024 17:03

astefanutti marked this pull request as ready for review November 27, 2024 17:03

google-oss-prow bot removed the do-not-merge/work-in-progress label Nov 27, 2024

astefanutti force-pushed the pr-k8s-1.31 branch 2 times, most recently from 97bb453 to 037cba6 Compare November 27, 2024 17:38

andreyvelich reviewed Nov 27, 2024

astefanutti force-pushed the pr-k8s-1.31 branch from 037cba6 to b673406 Compare November 28, 2024 12:10

astefanutti added 2 commits November 29, 2024 16:19

Upgrade Kubernetes to v1.31.3

7cc5d04

Signed-off-by: Antonin Stefanutti <[email protected]>

Skip controller names uniqueness check for tests

b32a059

Signed-off-by: Antonin Stefanutti <[email protected]>

astefanutti force-pushed the pr-k8s-1.31 branch from b673406 to b32a059 Compare November 29, 2024 15:21

andreyvelich mentioned this pull request Dec 2, 2024

Support Kubernetes v1.28 - v1.31 kubeflow/katib#2457

Open

Move cache argument from Registry to ReconcilerBuilder API

69cc00b

Signed-off-by: Antonin Stefanutti <[email protected]>

google-oss-prow bot assigned tenzen-y and andreyvelich Dec 4, 2024

google-oss-prow bot added the lgtm label Dec 4, 2024

tenzen-y reviewed Dec 9, 2024

google-oss-prow bot added the approved label Dec 9, 2024

google-oss-prow bot merged commit 374e88a into kubeflow:master Dec 9, 2024
43 checks passed

astefanutti deleted the pr-k8s-1.31 branch December 9, 2024 18:20
