[Release] Training Operator 1.8 Roadmap #1994

andreyvelich · 2024-01-24T21:29:08Z

andreyvelich · 2024-01-24T21:30:03Z

@johnugeorge @deepanker13 Do we need to create a tracking issue with the remaining items for the Train/Fine-tune API for LLMs?

terrytangyuan · 2024-01-25T00:44:49Z

I'd like to get #1953 merged as well. I think the risk is pretty low.

StefanoFioravanzo · 2024-01-25T08:09:57Z

@andreyvelich thanks for putting this together. On the "Misc: Improve docs for the training operator" item: if you can start a separate issue highlighting known issues, doc areas to be improved, or particular topics you want to address, we can start coordinating with the release team doc leads as well to get some help.

I would suggest having a separate issue for autogen APIs, in case you want to address that as well.

andreyvelich · 2024-01-25T16:16:58Z

@terrytangyuan Sure, can we discuss the MXJob deprecation plan at the next AutoML and Training WG meeting?
I think it would be better to remove support for MXJob over 2 releases. For example, in the 1.8 release we inform users that MXJob will be removed in the next version, and when we release 1.9 we remove MXJob.
That should give users sufficient time to migrate, even though MXNet has already been archived.
WDYT @kubeflow/wg-training-leads @tenzen-y?

if you can start a separate issue highlighting known issues

@StefanoFioravanzo Sure, I will create an issue based on the tasks that we discussed on the last call.
Also, I will create an issue for SDK doc autogen.

tenzen-y · 2024-01-25T16:50:48Z

First of all, as I mentioned here: kubeflow/katib#2255 (comment), I would suggest supporting Kubernetes v1.27-v1.29.

Also, moving #1906 forward would be better. It probably isn't possible to complete all the tasks, but I think we will be able to get some results.

tenzen-y · 2024-01-25T16:54:05Z

I think it would be better to remove support for MXJob over 2 releases. For example, in the 1.8 release we inform users that MXJob will be removed in the next version, and when we release 1.9 we remove MXJob.
That should give users sufficient time to migrate, even though MXNet has already been archived.
WDYT @kubeflow/wg-training-leads @tenzen-y?

SGTM. We can say that we won't do any maintenance for MXJob during one release, which means it is deprecated.
Creating a dedicated issue would be better.

terrytangyuan · 2024-01-25T17:10:54Z

@andreyvelich Sounds good

andreyvelich · 2024-01-25T20:20:17Z

First of all, as I mentioned here: kubeflow/katib#2255 (comment), I would suggest supporting Kubernetes v1.27-v1.29.

That's a good point about the Kubernetes version, @tenzen-y!
I agree that 1.27 - 1.29 should be our target.
@kubeflow/release-team What do you think about targeting Kubernetes 1.27 - 1.29 support for the Kubeflow 1.9 release?

tenzen-y · 2024-01-29T20:52:55Z

Ah, I found the features that we dropped from the previous release due to the release deadline.

Can we add the following to improve UX:

andreyvelich · 2024-01-29T21:23:49Z

I just had a discussion with @kubeflow/release-managers about Kubernetes versions.
We are going to target Kubernetes 1.27 - 1.29 for the next release of Training Operator.

tenzen-y · 2024-01-29T21:45:29Z

I just had a discussion with @kubeflow/release-managers about Kubernetes versions. We are going to target Kubernetes 1.27 - 1.29 for the next release of Training Operator.

That's a nice update! Thank you!

deepanker13 · 2024-02-05T08:11:00Z

@johnugeorge @deepanker13 Do we need to create a tracking issue with the remaining items for the Train/Fine-tune API for LLMs?

Okay, I will create one.

StefanoFioravanzo · 2024-02-28T14:30:08Z

Hello @kubeflow/wg-training-leads, this is a kind reminder that Monday, March 4th will be our Kubeflow 1.9 release development checkpoint. We will be halfway through our dev cycle, and we expect most of the work to be well underway (reminder: code freeze is scheduled for Apr 15th).

Can you please acknowledge your status with respect to your roadmap, comment on the progress made so far, and provide an assessment of the work that remains?

Understandably, not everything may be completed in time. Please proactively let the release team know if there are delays, blockers, or uncertain situations, so that we can align expectations and try to help you out, if possible.

satishpasumarthi · 2024-04-22T17:52:51Z

Hi! When is v1.8 planned for release? Some Kubernetes versions on managed services, e.g. EKS, reach end of support very soon (July 24, 2024): https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#kubernetes-release-calendar
So this release is very important for those who plan to migrate Kubernetes versions. Is there a tentative timeline? Please advise.
@StefanoFioravanzo @andreyvelich

andreyvelich · 2024-04-22T22:25:33Z

Hi @satishpasumarthi, we are planning to cut the first release candidate, RC.0, for Training Operator v1.8 this week.
We will support Kubernetes v1.27-1.29 in that release.

satishpasumarthi · 2024-04-23T03:46:25Z

Hi @satishpasumarthi, we are planning to cut the first release candidate, RC.0, for Training Operator v1.8 this week. We will support Kubernetes v1.27-1.29 in that release.

Thanks for the reply, @andreyvelich. I see only PRs for supporting v1.28 and v1.29: #2039 and #2038. My understanding was that v1.27 is already supported in v1.7. Please correct me if I am mistaken.

tenzen-y · 2024-04-23T04:33:11Z

Hi @satishpasumarthi, we are planning to cut the first release candidate, RC.0, for Training Operator v1.8 this week. We will support Kubernetes v1.27-1.29 in that release.

Thanks for the reply, @andreyvelich. I see only PRs for supporting v1.28 and v1.29: #2039 and #2038. My understanding was that v1.27 is already supported in v1.7. Please correct me if I am mistaken.

@satishpasumarthi You're correct.
In v1.7, the training-operator supports v1.25-v1.27. In v1.8, the training-operator will support v1.27-v1.29.

rimolive · 2024-04-26T15:57:43Z

Is there anything missing to cut the release? We want to start the manifests sync for training-operator for Kubeflow 1.9.0-rc0.

tenzen-y · 2024-04-26T17:22:15Z

Is there anything missing to cut the release? We want to start the manifests sync for training-operator for Kubeflow 1.9.0-rc0

Not yet. Johnu will prepare the release today.

philkuz · 2024-06-14T23:25:48Z

Any updates on when we might see a new release?

tenzen-y · 2024-06-15T09:29:43Z

Any updates on when we might see a new release?

You can find the new release here: https://github.com/kubeflow/training-operator/releases/tag/v1.8.0-rc.0

andreyvelich · 2024-07-25T17:03:17Z

Training Operator 1.8 has been released 🎉
https://github.com/kubeflow/training-operator/releases/tag/v1.8.0

Thanks everyone for your contributions!

andreyvelich added the release/1.8 label Jan 24, 2024

andreyvelich added this to the v0.8.0 Release milestone Jan 24, 2024

andreyvelich mentioned this issue Jan 24, 2024

Training WG roadmap for KF 1.9 kubeflow/manifests#2597

Closed

andreyvelich pinned this issue Jan 29, 2024

StefanoFioravanzo mentioned this issue Mar 6, 2024

Remaining items for Train/Fine-tune sdk #2003

Closed

4 tasks

andreyvelich mentioned this issue Apr 22, 2024

[Question] Training Operator v1.8 Release Date #2078

Closed

andreyvelich closed this as completed Jul 25, 2024

andreyvelich unpinned this issue Jul 25, 2024

Roadmap sections in the opening comment (andreyvelich, Jan 24, 2024): SDK, Backend, Misc
