Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade Kubernetes to v1.31.3 #2330

Merged
merged 3 commits into from
Dec 9, 2024

Conversation

astefanutti
Copy link
Contributor

What this PR does / why we need it:

Upgrade Kubernetes to v1.31.3.

Fixes #2291

Checklist:

  • Docs included if any changes are user facing

@kannon92
Copy link
Contributor

@tenzen-y did mention to me that we should do an upgrade from 1.29 to 1.30 first. And then we can do 1.30 to 1.31.

@tenzen-y
Copy link
Member

@tenzen-y did mention to me that we should do an upgrade from 1.29 to 1.30 first. And then we can do 1.30 to 1.31.

Yes, that's right. @astefanutti Could you cooperate with @kannon92 in #2299 before we move this forward?

@astefanutti
Copy link
Contributor Author

@kannon92 @tenzen-y sounds good, let me break this down then.

@astefanutti astefanutti mentioned this pull request Nov 22, 2024
1 task
@astefanutti
Copy link
Contributor Author

I've created #2332 that covers the upgrade to 1.30. I'll rebase this once #2332 is merged.

@astefanutti astefanutti marked this pull request as ready for review November 27, 2024 17:03
@astefanutti astefanutti force-pushed the pr-k8s-1.31 branch 2 times, most recently from 97bb453 to 037cba6 Compare November 27, 2024 17:38
Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for doing this!
@kubeflow/wg-training-leads @Electronic-Waste Please take a look.

@@ -18,7 +18,7 @@ jobs:
fail-fast: false
matrix:
# Detail: `setup-envtest list`
kubernetes-version: ["1.28.3", "1.29.3", "1.30.0"]
kubernetes-version: ["1.28.3", "1.29.3", "1.30.0", "1.31.3"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kubeflow/wg-training-leads @astefanutti Are we ready to support 4 K8s version in our CI/CD ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.28 is EOL so it's probably OK to remove it: https://kubernetes.io/releases/.

Copy link
Member

@tenzen-y tenzen-y Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should support 1.28 - 1.32 in the next training-operator release in this Dec.
Because the 1.28 is still actively supported in most popular cloud providers and the next release is final one for the v1 API.

But, after we release the 1.9.0 in Dec, I think that we should immediately remove the 1.28 related codes in the master brach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tenzen-y Don't we hit any limits in GitHub actions if we run our tests on 4 k8s version ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreyvelich AFAIK, there are no limits for open repos.

@@ -29,7 +30,7 @@ import (
"github.com/kubeflow/training-operator/pkg/runtime.v2/framework/plugins/torch"
)

type Registry map[string]func(ctx context.Context, client client.Client, indexer client.FieldIndexer) (framework.Plugin, error)
type Registry map[string]func(ctx context.Context, client client.Client, cache cache.Cache, indexer client.FieldIndexer) (framework.Plugin, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to pass cache everywhere ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache is needed here: https://github.com/kubeflow/training-operator/pull/2330/files#diff-2678f0825d50893b1cc0a6e340510d5530435e83b94253fa220c585cee92f0c8R284-R296 so it's possible to make PodGroupLimitRangeHandler and PodGroupRuntimeClassHandler strictly typed.

Unfortunately passing the manager cache from the main down to the ReconcilerBuilder instances forces to have it passed along all the initialization chain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@astefanutti @tenzen-y Is there a way for us to initialize these watchers without bypassing the cache everywhere ?
E.g. If we add the cache parameter to the runtime registry, we force all of the plugins to implement this argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreyvelich @tenzen-y I've looked at it again and find a more elegant approach by passing the manager cache via the ReconcilerBuilder API rather than the Registry API. I think that looks much better now :)

@andreyvelich
Copy link
Member

@tenzen-y @astefanutti Just a quick question, why did we remove the helper function from our update-codegen script: https://github.com/kubeflow/training-operator/pull/2332/files#diff-149dfe7bb29d1191dceae3a52915e750e64b7f87257a5fb309c29d3056e2a95dL32-L43 ?

If we don't have it, developers have to understand that they need to run go mod download before running the code-gen.

@astefanutti
Copy link
Contributor Author

@tenzen-y @astefanutti Just a quick question, why did we remove the helper function from our update-codegen script: https://github.com/kubeflow/training-operator/pull/2332/files#diff-149dfe7bb29d1191dceae3a52915e750e64b7f87257a5fb309c29d3056e2a95dL32-L43 ?

If we don't have it, developers have to understand that they need to run go mod download before running the code-gen.

@andreyvelich you're right. From what I've seen, this is usually rather done in the Makefile. I've created #2339 based on that, but let me know if you'd rather stick to keeping the GET_PKG_LOCATION function.

@coveralls
Copy link

Pull Request Test Coverage Report for Build 12137029594

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 9 of 9 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 12071681323: 0.0%
Covered Lines: 85
Relevant Lines: 85

💛 - Coveralls

@andreyvelich
Copy link
Member

Thank you @astefanutti, I think we can merge it.
/lgtm
/assign @tenzen-y

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late response.
Thank you for this effort!
/lgtm
/approve
/hold cancel

Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 374e88a into kubeflow:master Dec 9, 2024
43 checks passed
@astefanutti astefanutti deleted the pr-k8s-1.31 branch December 9, 2024 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Kubernetes v1.29 - v1.31 or v1.28 - v1.31
6 participants