Replicating FEDMCCS client selection in Federated Learning
The paper is linked here: FEDMCCS
First run `python3 generate_clients.py`, then run `runner.sh` with bash or sh.
If you get status code 137, the process was most likely killed by the OOM killer. See `/var/log/kern.log`.
To create non-IID data, we use the same method as here: we simply split the dataset into random chunks and give each client one chunk.
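The random-chunk split can be sketched as follows. This is a minimal illustration, not the project's code: the function name `split_into_random_chunks`, the use of random chunk sizes, and the seed are assumptions, and the sketch works on sample indices only.

```python
import random


def split_into_random_chunks(num_samples, num_clients, seed=0):
    """Shuffle sample indices and cut them into num_clients non-empty chunks.

    Hypothetical helper: chunk sizes are random, which is one possible
    reading of "random chunks".
    """
    if not 0 < num_clients <= num_samples:
        raise ValueError("need 0 < num_clients <= num_samples")
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    # num_clients - 1 distinct cut points guarantee that no chunk is empty.
    cuts = sorted(rng.sample(range(1, num_samples), num_clients - 1))
    bounds = [0] + cuts + [num_samples]
    return [indices[a:b] for a, b in zip(bounds, bounds[1:])]


# 60000 MNIST training samples shared among 11 clients.
chunks = split_into_random_chunks(num_samples=60000, num_clients=11)
```

Each client would then load only the samples whose indices fall in its chunk.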
To replicate the effect of an ever-expanding dataset, we start with half of the dataset and increase its size by 5% in each round. This continues until we reach the full dataset size, at which point the expansion stops.
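A minimal sketch of this growth schedule is shown below. It assumes that "5%" means 5% of the full dataset per round (not compounding growth) and that rounds are counted from 0; the function name `visible_samples` is hypothetical.

```python
def visible_samples(total_samples, round_index, start_percent=50, growth_percent=5):
    """Number of samples a client may use in a given round.

    Assumption: the visible share grows linearly by growth_percent of the
    full dataset per round and is capped at 100%.
    """
    percent = min(100, start_percent + growth_percent * round_index)
    return total_samples * percent // 100


# With 1000 local samples: 500 in round 0, 550 in round 1, full size from round 10.
sizes = [visible_samples(1000, r) for r in range(15)]
```

Integer percentages are used so that the cap at the full dataset size is hit exactly, without floating-point rounding.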
Metrics are stored on the client as well as on the server. When the server requests a training round, the client spawns a watcher thread that records CPU frequency and memory usage while training is in progress. This is done with the `psutil` library. These metrics are averaged and stored on the client. When the server requests the data from a client, the data of the last round is returned.
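The watcher thread could look roughly like the sketch below. The class name `Watcher`, the sampling interval, and the choice of resident memory as the memory metric are assumptions made for illustration; only `psutil.cpu_freq()` and `psutil.Process().memory_info()` are real `psutil` calls.

```python
import threading


def psutil_sample():
    """Return (CPU frequency in MHz, resident memory in bytes) using psutil."""
    import psutil  # third-party; imported lazily so Watcher works without it

    freq = psutil.cpu_freq()  # can be None where no frequency info is exposed
    mhz = freq.current if freq is not None else 0.0
    rss = psutil.Process().memory_info().rss
    return mhz, rss


class Watcher:
    """Samples resource usage in a background thread until stop() is called."""

    def __init__(self, sample_fn=psutil_sample, interval=0.5):
        self._sample_fn = sample_fn
        self._interval = interval
        self._samples = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            self._samples.append(self._sample_fn())
            # wait() returns True as soon as stop() sets the event.
            if self._stop.wait(self._interval):
                break

    def start(self):
        self._thread.start()

    def stop(self):
        """Stop sampling and return the averaged (frequency, memory) pair."""
        self._stop.set()
        self._thread.join()
        n = len(self._samples)
        avg_freq = sum(f for f, _ in self._samples) / n
        avg_mem = sum(m for _, m in self._samples) / n
        return avg_freq, avg_mem
```

A client would call `start()` right before training and `stop()` right after it, then keep the returned averages as the metrics of that round.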
Training is done on the MNIST dataset. Validation is done on the whole MNIST test dataset (instead of a portion of it).
The server can use `client.get_properties` to get the properties needed for client selection.
Another piece of software is the proxy-meter. This tool simulates bandwidth and ping and reports bandwidth usage. Its options are:
--listen ADDRESS: The interface and port the proxy listens on.
--forward ADDRESS: The address the data is forwarded to.
--ping TIME: The simulated ping. Defaults to zero.
--speed NUMBER: The speed of the data transfer in bytes per second. Defaults to zero, which means unlimited.
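To illustrate how these two settings interact, the sketch below computes how long a proxy might hold back a chunk of data. It is not taken from the proxy-meter source: the function name `transfer_delay` is hypothetical, and the ping is assumed to be given in seconds.

```python
def transfer_delay(num_bytes, speed=0, ping=0.0):
    """Seconds to delay a chunk: simulated latency plus transmission time.

    Assumptions: speed is in bytes per second (0 means unlimited, as in the
    option list above) and ping is in seconds.
    """
    if num_bytes < 0 or speed < 0 or ping < 0:
        raise ValueError("arguments must be non-negative")
    transmission = num_bytes / speed if speed > 0 else 0.0
    return ping + transmission


# 1024 bytes at 512 bytes/s with 0.1 s of ping: about 2.1 s in total.
delay = transfer_delay(1024, speed=512, ping=0.1)
```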
A separate proxy meter is spawned for each client. You can turn this feature off by using the `generate_compose_file_direct` function in the `generate_clients.py` file.
The server runs on 0.0.0.0:8080 on the Linux server.
Once the first client connects to the server, the server starts the procedure. The overall server code is similar to the Flower base server, with changes to clientManager.
The `myClientManager` class inherits from `fl.server.SimpleClientManager`. We override the `sample` method, which is defined in the parent class and is used for client selection at the beginning of each evaluation and training round.
The method first checks whether any clients are available. If there are enough clients, it proceeds to select clients according to a criterion.
In the first few rounds, all of the clients take part in every round so that the server has enough data for the linear regression. `RANDOM_ROUNDS` is a variable that determines for how many rounds all the clients are trained, so that we have a fair amount of data.
After these rounds, clients are selected according to the linear regression method. It gets a list of all available clients and first checks their historical data, which contains 4 parameters. `predict utilization` predicts the next round's cores, frequencies, memories, and training times according to the dataset size that is going to be used.
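The prediction step can be sketched with an ordinary least-squares fit of each metric against the dataset size. This is an illustration under assumptions: the names `fit_line`, `predict_utilization`, and `METRICS`, as well as the layout of the history records, are hypothetical and not taken from the project.

```python
METRICS = ("cores", "frequency", "memory", "training_time")


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All dataset sizes are equal: fall back to the mean.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def predict_utilization(history, next_dataset_size):
    """Predict each metric for the next round from past rounds.

    history is assumed to be a list of dicts, one per past round, with a
    "dataset_size" key plus one key per entry in METRICS.
    """
    sizes = [record["dataset_size"] for record in history]
    prediction = {}
    for metric in METRICS:
        slope, intercept = fit_line(sizes, [record[metric] for record in history])
        prediction[metric] = slope * next_dataset_size + intercept
    return prediction
```

For example, a client whose training time grew from 10 s to 12 s as its dataset grew from 500 to 600 samples would be predicted to need about 13 s for 650 samples.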
If the wanted number of clients is selected according to `CLIENT FRACTION`, there will be no training.
After the clients are selected, we might have a problem: fewer clients are selected than wanted because of the budget limit. To mitigate this, if the linear regression selects too few clients, the remaining slots are filled with randomly selected clients. This ensures convergence.
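The overall selection logic (warm-up rounds, budget check, random fill) can be sketched as below. This is a simplified, framework-free illustration and not the actual `sample` override: the function name `select_clients`, its parameters, and the rule of taking budget-compliant clients in list order are assumptions.

```python
import random


def select_clients(available, predictions, budgets, num_wanted,
                   server_round, random_rounds=5, rng=None):
    """Pick clients for one round.

    available:   list of client ids
    predictions: client id -> {metric name: predicted value}
    budgets:     metric name -> maximum allowed value
    """
    rng = rng or random.Random()
    # Warm-up rounds: every client trains so that history can be collected.
    if server_round <= random_rounds:
        return list(available)
    # Keep the clients whose predicted usage fits every budget.
    within_budget = [
        c for c in available
        if all(predictions[c][m] <= limit for m, limit in budgets.items())
    ]
    selected = within_budget[:num_wanted]
    # Too few clients passed the budget check: fill up at random.
    if len(selected) < num_wanted:
        remaining = [c for c in available if c not in selected]
        rng.shuffle(remaining)
        selected += remaining[:num_wanted - len(selected)]
    return selected
```

With a time threshold of 20 and predicted times of 10, 30, 5, and 50 for four clients, a request for three clients would return the two clients within budget plus one of the other two chosen at random.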
Client | CPU Cores | CPU Utilization | Memory Limit (MB) | Ping Latency (ms) | Bandwidth (Mbps) |
---|---|---|---|---|---|
Client1 | 1 | 0.01 | 250 | 100 | 512 |
Client2 | 1 | 0.01 | 150 | 100 | 512 |
Client3 | 1 | 1 | 200 | no latency | 1024 |
Client4 | 1 | 0.01 | 150 | 100 | 512 |
Client5 | 1 | 1 | 100 | 100 | 512 |
Client6 | 1 | 1 | 50 | 50 | 512 |
Client7 | 1 | 0.01 | 100 | 100 | 512 |
Client8 | 1 | 0.8 | No Limit | no latency | 1024 |
Client9 | 1 | 0.01 | 100 | 100 | 512 |
Client10 | 1 | 1 | 100 | 100 | 512 |
Client11 | 1 | 0.01 | 100 | 100 | 512 |
Resource | Budget |
---|---|
Memory Budget | 0.8 |
CPU Budget | 1 |
Energy Budget | 1 |
Time Threshold(s) | 20 |
Parameter | Value |
---|---|
Client Fraction | 0.5 |
Number of Rounds | 15 |
Random Rounds (Before Regression) | 5 |
Dataset | MNIST |
Model | CNN |
Round | FEDMCCS Time | Random Select Time |
---|---|---|
round1 | 19.2 | 13.5 |
round2 | 49.4 | 43 |
round3 | 79.6 | 72.2 |
round4 | 109.7 | 101.5 |
round5 | 139.7 | 130.8 |
round6 | 169.6 | 160 |
round7 | 191.5 | 189.3 |
round8 | 203.3 | 214.3 |
round9 | 219.3 | 229.9 |
round10 | 241.2 | 259.1 |
round11 | 356.4 | 373.6 |
round12 | 1329.4 | 1519.6 |
round13 | 2268.4 | 2662.6 |
round14 | 3183.4 | 3889.6 |
round15 | 4194.4 | 5120.6 |
Time (FEDMCCS) | FEDMCCS | Time (Random) | Random |
---|---|---|---|
19.2 | 0.33 | 13.5 | 0.25 |
49.4 | 0.48 | 43 | 0.44 |
79.6 | 0.56 | 72.2 | 0.58 |
109.7 | 0.63 | 101.5 | 0.65 |
139.7 | 0.7 | 130.8 | 0.72 |
169.6 | 0.75 | 160 | 0.76 |
191.5 | 0.79 | 189.3 | 0.8 |
203.3 | 0.81 | 214.3 | 0.82 |
219.3 | 0.83 | 229.9 | 0.84 |
241.2 | 0.84 | 259.1 | 0.85 |
356.4 | 0.85 | 373.6 | 0.85 |
1329.4 | 0.87 | 1519.6 | 0.86 |
2268.4 | 0.868 | 2662.6 | 0.87 |
3183.4 | 0.873 | 3889.6 | 0.87 |
4194.4 | 0.877 | 5120.6 | 0.87 |
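This project implements FEDMCCS, a federated learning algorithm that selects clients by using linear regression to predict their resource usage and training time for the next round and checking the predictions against the resource budgets.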