[LFX'24] Add Sedna Federated Learning v2 Proposal. #455

Electronic-Waste · 2024-11-04T16:26:01Z

What type of PR is this?

/kind design

What this PR does / why we need it:

This PR contains the proposal for Sedna Federated Learning V2 (updated version after the last community meeting).

Related to LFX'24 Fall Project: kubeedge/kubeedge#5762

cc👀 @Shelley-BaoYue @fisherxu @tangming1996 @MooreZheng @hsj576

Which issue(s) this PR fixes:

Fixes #

Signed-off-by: Electronic-Waste <[email protected]>

MooreZheng

The proposal should further consider the design of data-centric task scheduling.

At the previous routine meeting, we saw that integrating training-operator with Sedna raises an early challenge: what to schedule in Sedna federated learning. Federated learning is a training task and is therefore data-driven. Scheduling a training task without also scheduling its training data can lead to significant training bias.

At the meeting, we identified two main ways to build a practical system:

1. Assume that there are subnets, where data can be scheduled within the same subnet. As suggested by @tangming1996, KubeEdge's own node-group management can be used to fulfill the subnet assumption.
2. Develop a method to transfer non-raw data, e.g., embeddings, from which the raw data cannot be recovered.
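To make the first option concrete, here is a minimal sketch of data-centric placement under the subnet assumption: a training task may only land on a free node in the same node group as its dataset, and it prefers the node that already holds the data. Everything here (`Dataset`, `place_tasks`, the node and group names) is hypothetical and for illustration only; it is not the Sedna, KubeEdge, or training-operator API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    node: str  # hypothetical: the edge node that currently holds the data


def build_group_index(node_groups):
    """Map each node name to the node group (subnet) it belongs to."""
    return {node: group for group, nodes in node_groups.items() for node in nodes}


def place_tasks(tasks, datasets, node_groups, free_nodes):
    """Place each task on a free node inside its dataset's node group.

    Returns (placements, unschedulable). A task is unschedulable when no
    free node exists in the group that holds its data.
    """
    group_of = build_group_index(node_groups)
    available = set(free_nodes)
    placements, unschedulable = {}, []
    for task, ds_name in tasks.items():
        ds = datasets[ds_name]
        ds_group = group_of.get(ds.node)
        if ds_group is None:
            unschedulable.append(task)
            continue
        # Prefer the data-holding node, then any other free node in the group.
        same_group = sorted(n for n in available if group_of.get(n) == ds_group)
        candidates = [n for n in same_group if n == ds.node] + [
            n for n in same_group if n != ds.node
        ]
        if not candidates:
            unschedulable.append(task)
            continue
        placements[task] = candidates[0]
        available.discard(candidates[0])
    return placements, unschedulable


node_groups = {"subnet-a": ["edge-1", "edge-2"], "subnet-b": ["edge-3"]}
datasets = {"ds-a": Dataset("ds-a", "edge-1"), "ds-b": Dataset("ds-b", "edge-3")}
tasks = {"train-0": "ds-a", "train-1": "ds-a", "train-2": "ds-b", "train-3": "ds-b"}

placements, unschedulable = place_tasks(
    tasks, datasets, node_groups, free_nodes=["edge-1", "edge-2", "edge-3"]
)
print(placements)
print(unschedulable)
```

In this sketch a task is never moved across groups, which mirrors the assumption that data can only be scheduled within its own subnet; a real scheduler would also need to account for resources, data movement cost, and privacy constraints.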

Besides, Kubeflow assumes that all training workers share the same parameters and dataset, which is not practical for edge tasks, where workers have different parameters and datasets. That means we need an edge version of the training operator.
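As a rough illustration of what "different parameters and datasets per worker" could mean, the sketch below expands shared defaults into one spec per edge worker, where each worker must name its own dataset and may override individual parameters. The `WorkerSpec` type, the field names, and the paths are assumptions made up for this example; they do not come from the proposal, Sedna, or Kubeflow.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkerSpec:
    node: str                    # hypothetical: edge node the worker runs on
    dataset: str                 # hypothetical: worker-local dataset path
    params: Dict[str, float] = field(default_factory=dict)


def build_worker_specs(default_params: Dict[str, float],
                       workers: List[dict]) -> List[WorkerSpec]:
    """Merge shared defaults with per-worker overrides.

    Each worker entry must provide its own "node" and "dataset";
    "params" is optional and overrides the defaults key by key.
    """
    specs = []
    for w in workers:
        merged = {**default_params, **w.get("params", {})}
        specs.append(WorkerSpec(node=w["node"], dataset=w["dataset"], params=merged))
    return specs


defaults = {"learning_rate": 0.01, "batch_size": 32}
workers = [
    {"node": "edge-1", "dataset": "/data/edge-1/train.txt", "params": {"batch_size": 16}},
    {"node": "edge-2", "dataset": "/data/edge-2/train.txt"},
]

for spec in build_worker_specs(defaults, workers):
    print(spec)
```

A single shared template corresponds to the case where every worker receives identical `params` and `dataset`; the edge case described above is the one where those fields differ per worker.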

Signed-off-by: Electronic-Waste <[email protected]>

Electronic-Waste · 2024-11-21T15:32:34Z

I've updated the proposal according to the reviews from today's routine meeting.

PTAL👀 @tangming1996 @MooreZheng @hsj576 @Shelley-BaoYue @fisherxu

tangming1996 · 2024-11-22T01:33:54Z

/lgtm

MooreZheng · 2024-11-22T01:59:08Z

/lgtm

MooreZheng

This proposal aims to solve a complicated and critical issue in Sedna by considering distributed training across all the different schemes. The current version is fine as a first version after several rounds of discussion.

Since distributed training for all schemes is undoubtedly challenging, it will introduce many new features to Sedna, and we believe the proposal can still be enriched in the future. As mentioned at the routine meeting, the "DataLoader DS" and the edge-wise data transfer could potentially be implemented on top of EdgeMesh (@Poorunga), which should be considered in the implementation and in future versions of the proposal.

kubeedge-bot · 2024-11-22T02:07:26Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MooreZheng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~docs/OWNERS~~ [MooreZheng]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

feat: add sedna federated learning v2 proposal.

4ba8fb7

Signed-off-by: Electronic-Waste <[email protected]>

kubeedge-bot added the kind/design Categorizes issue or PR as related to design. label Nov 4, 2024

kubeedge-bot requested review from JimmyYang20 and TymonXie November 4, 2024 16:26

kubeedge-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 4, 2024

MooreZheng suggested changes Nov 7, 2024

kubeedge-bot assigned MooreZheng Nov 7, 2024

fix: data-centric scheduling.

8a6c5e7

Signed-off-by: Electronic-Waste <[email protected]>

Electronic-Waste requested a review from MooreZheng November 21, 2024 15:34

kubeedge-bot assigned tangming1996 Nov 22, 2024

kubeedge-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 22, 2024

MooreZheng approved these changes Nov 22, 2024

kubeedge-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 22, 2024

kubeedge-bot merged commit 01351c5 into kubeedge:main Nov 22, 2024
2 checks passed
