[Release] Training Operator 1.8 Roadmap #1994
Comments
@johnugeorge @deepanker13 Do we need to create a tracking issue with the remaining items for the Train/Fine-tune API for LLMs? |
I'd like to get #1953 merged as well. I think the risk is pretty low. |
@andreyvelich thanks for putting this together. On the "Misc: Improve docs for the training operator" item: if you can start a separate issue highlighting known issues, doc areas to be improved, or particular topics you want to address, we can start coordinating with the release team doc leads to get some help as well. I would suggest having a separate issue for autogen APIs, in case you want to address that as well. |
@terrytangyuan Sure, can we discuss the MXJob deprecation plan at the next AutoML and Training WG meeting?
@StefanoFioravanzo Sure, I will create an issue based on the tasks that we discussed on the last call. |
First of all, as I mentioned here: kubeflow/katib#2255 (comment), I would suggest supporting Kubernetes v1.27-v1.29. Also, it would be good to move #1906 forward. It probably isn't possible to complete all the tasks, but I think we will be able to get some results. |
SGTM. We can say that we won't do any maintenance for MXJob during one release, which means it is deprecated. |
@andreyvelich Sounds good |
That's a good point about the Kubernetes version, @tenzen-y! |
Ah, I found the features that we dropped from the previous release due to the release deadline. Can we add the following to improve UX: |
I just had a discussion with @kubeflow/release-managers about Kubernetes versions. |
That's a nice notification! Thank you! |
Okay, I will create one. |
Hello @kubeflow/wg-training-leads, this is a kind reminder that Monday, March 4th will be our Kubeflow 1.9 release development checkpoint. We will be halfway through our dev cycle, and we expect most of the work to be well underway (reminder: code freeze is scheduled for Apr 15th). Can you please acknowledge your status with respect to your roadmap, comment on the progress made so far, and provide an assessment of the work that remains? Understandably, not everything may be completed in time. Please proactively let the release team know if there are delays, blockers, or uncertain situations, so that we can align expectations and try to help you out, if possible. |
Hi! When is v1.8 planned for release? Some managed k8s versions, e.g. on EKS, reach end of support very soon (July 24, 2024): https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#kubernetes-release-calendar |
Hi @satishpasumarthi, we are planning to make the first RC.0 for Training Operator v1.8 this week. |
Thanks for the reply, @andreyvelich. I only see PRs for supporting v1.28 and v1.29 (#2039 and #2038). My understanding is that v1.27 is already supported in v1.7. Please correct me if I am mistaken. |
@satishpasumarthi You're correct. |
Is there anything missing to cut the release? We want to start the manifests sync for training-operator for Kubeflow 1.9.0-rc0 |
Not yet. Johnu will prepare the release today. |
Any updates on when we might see a new release? |
You can find the new release here: https://github.com/kubeflow/training-operator/releases/tag/v1.8.0-rc.0 |
Training Operator 1.8 has been released 🎉 Thanks everyone for your contributions! |
This is the tracking issue for Training Operator 1.8 release.
The feature freeze date for the next Kubeflow 1.9 release is April 15th.
We are targeting the following features for Training Operator 1.8:
SDK
Backend
Misc
torchrun
and PyTorchJob: torchrun example with cpu version pytorch #1965
@deepanker13 @droctothorpe @tenzen-y @kubeflow/wg-training-leads @kuizhiqing @terrytangyuan @lowang-bh Please let me know which items we want to add for Training Operator 1.8.
cc @kubeflow/release-team