[LFX'24] Add Sedna Federated Learning v2 Proposal. #455
Conversation
Signed-off-by: Electronic-Waste <[email protected]>
The proposal should then further consider the design of data-centric task scheduling.

At the previous routine meeting, we saw that there are challenges in integrating the training-operator with Sedna from the start, namely what to schedule in Sedna federated learning. Since federated learning is also a training task, it is in fact data-driven: scheduling a training task without also scheduling its training data can lead to significant training bias.

At the meeting, we identified two main ways to build a practical system.
- Assume that there are subnets where data can be scheduled within the same subnet. As suggested by @tangming1996, KubeEdge's own node-group management can be used to fulfill the subnet assumption.
- Develop a method to transfer non-raw data, e.g., embeddings, from which the raw data cannot be recovered (a minimal sketch of this approach follows the list).
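Below is a minimal sketch of the second approach, assuming PyTorch/torchvision are available on the edge worker; the model choice and function names are illustrative, not part of Sedna's API. It shows an edge worker mapping raw samples to fixed embeddings so that only non-invertible features leave the node.

```python
# Hypothetical sketch: an edge worker shares embeddings instead of raw samples.
# Assumes PyTorch and torchvision; nothing here is an existing Sedna API.
import torch
import torchvision.models as models

# Frozen feature extractor: with the classification head dropped and gradients
# disabled, raw images cannot be reconstructed from the 512-d output.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # keep the embedding, drop the head
backbone.eval()

@torch.no_grad()
def to_embeddings(raw_batch: torch.Tensor) -> torch.Tensor:
    """Map raw samples of shape (N, 3, H, W) to embeddings of shape (N, 512)."""
    return backbone(raw_batch)

# Only the output of to_embeddings() is transferred off the node; the raw
# dataset never leaves its subnet.
```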
Besides, Kubeflow assumes that all training workers share the same parameters and dataset, which is not practical for edge tasks where workers have different parameters and datasets. That means we need an edge version of the training-operator (see the sketch below).
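To make the gap concrete, here is a hypothetical sketch (`WorkerSpec` and `edge_job` are illustrative names, not an existing API) of what an edge-oriented job spec would need to express: a heterogeneous list of workers, each with its own node placement, dataset, and hyperparameters, rather than one replicated template as in Kubeflow's TFJob/PyTorchJob.

```python
# Hypothetical sketch: per-worker configuration for an edge training job.
# WorkerSpec and edge_job are illustrative names, not an existing API.
from dataclasses import dataclass

@dataclass
class WorkerSpec:
    node: str         # edge node (or node group) the worker is pinned to
    dataset_uri: str  # local dataset; differs per worker at the edge
    lr: float         # per-worker hyperparameters may also differ

# A shared-spec operator replicates one template, so every worker gets the
# same dataset and parameters; an edge version would accept a heterogeneous
# list instead:
edge_job = [
    WorkerSpec(node="edge-node-1", dataset_uri="/data/site-a", lr=0.01),
    WorkerSpec(node="edge-node-2", dataset_uri="/data/site-b", lr=0.001),
]
```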
Signed-off-by: Electronic-Waste <[email protected]>
I've updated the proposal according to the reviews at today's routine meeting. PTAL👀 @tangming1996 @MooreZheng @hsj576 @Shelley-BaoYue @fisherxu
/lgtm

/lgtm
This proposal aims to solve a complicated and critical issue in Sedna by considering distributed training across all the different schemes. The current version is fine as the very first version after rounds of discussion.

Note that since distributed training for all schemes is undoubtedly challenging, it would introduce a large number of new features to Sedna; we believe these can still be enriched in the future. As mentioned at the routine meeting, the "DataLoader DS" and edge-wise data transfer have the potential to be implemented on top of EdgeMesh @Poorunga, which should be considered in the implementation and in future versions of the proposal (a hypothetical sketch follows).
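As a thought experiment only: since EdgeMesh provides cross-edge service discovery and traffic proxying, a "DataLoader DS" on one edge node could serve batches to peers through an ordinary Kubernetes Service name. The service name, port, and URL path below are all assumptions made for illustration.

```python
# Hypothetical sketch: fetching a batch from a peer edge node's DataLoader.
# Assumes EdgeMesh is deployed so the Service DNS name below resolves and
# routes across edge nodes; the service name, port, and path are illustrative.
import requests

PEER_DATA_SVC = "http://dataloader.sedna.svc.cluster.local:8080"

def fetch_batch(index: int) -> bytes:
    # EdgeMesh proxies the request to whichever edge node backs the Service,
    # so edge-wise transfer needs no direct node-to-node addressing.
    resp = requests.get(f"{PEER_DATA_SVC}/batches/{index}", timeout=10)
    resp.raise_for_status()
    return resp.content
```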
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MooreZheng

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
What type of PR is this?
/kind design
What this PR does / why we need it:
This PR contains the proposal for Sedna Federated Learning V2 (updated version after the last community meeting).
Related to LFX'24 Fall Project: kubeedge/kubeedge#5762
cc👀 @Shelley-BaoYue @fisherxu @tangming1996 @MooreZheng @hsj576
Which issue(s) this PR fixes:
Fixes #