Update TensorFlow Task Runner and related workspaces #985
base: develop
Conversation
Most of my comments are nitpicks around naming/formatting and/or structure. Please disposition them as you find relevant.
@MasterSkepticista I agree the TF v1 / Legacy task runners can be removed at this point.
Thanks @MasterSkepticista @psfoley!
Great, I am also in agreement with removing the old TF runner. Actually, one reason I did not propose it directly in this PR is that the tf2dunet workspace uses the old TF runner. I can update it, but since it uses the BraTS dataset, I will have to get approval before I can test it.
Looks great, @kta-intel!
Please consider my review as a learning exercise of OpenFL and TF/Keras, so do take my comments and questions from that perspective 😊
    for param in metrics:
        if param not in model_metrics_names:
            raise ValueError(
                f'KerasTaskRunner does not support specifying new metrics. '
nit - I suppose you mean TensorFlowTaskRunner here?
    model_metrics_names = self.model.metrics_names
    if type(vals) is not list:
        vals = [vals]
    ret_dict = dict(zip(model_metrics_names, vals))
I suppose TF guarantees that the metric names and values are in the same order, so we can safely zip together the two lists?
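For what it's worth, the documented Keras behavior is that evaluate() returns values in the order of model.metrics_names. A quick self-contained check (plain tf.keras with a toy model and data, nothing OpenFL-specific):

    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(2, activation='softmax')])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    x = np.random.rand(8, 4).astype('float32')
    y = np.random.randint(0, 2, size=(8,))
    model.fit(x, y, epochs=1, verbose=0)

    # evaluate() returns one value per entry in model.metrics_names,
    # in the same order, so zipping the two lists is safe.
    vals = model.evaluate(x, y, verbose=0)
    print(dict(zip(model.metrics_names, vals)))  # {'loss': ..., 'accuracy': ...}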
    for param in param_metrics:
        if param not in model_metrics_names:
            raise ValueError(
                f'KerasTaskRunner does not support specifying new metrics. '
Same question as above - shouldn't this rather say TensorFlowTaskRunner?
    if with_opt_vars:
        weight_names = [weight.name for weight in obj.variables]

    Args:
        with_opt_vars (bool): Specify if we also want to get the variables
            of the optimizer

    weight_names = [weight.name for weight in obj.weights]
For improved readability, I'd suggest adding an explicit else: block, as in:

    if with_opt_vars:
        ...
    else:
        ...
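Filling in the two assignments quoted from the diff, the suggestion would read:

    if with_opt_vars:
        # names taken from obj.variables (includes optimizer state)
        weight_names = [weight.name for weight in obj.variables]
    else:
        # names taken from obj.weights (model weights only)
        weight_names = [weight.name for weight in obj.weights]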
    Parameters
Could you also document the suffix parameter? (Its meaning isn't very obvious to an inexperienced OpenFL reader.)
    opt_weights_dict = {
        name: tensor_dict[name] for name in opt_weight_names
    }
    self._set_weights_dict(self.model, model_weights_dict)
I guess this statement can also be moved outside of the if statement, as it's executed in both cases:

    self._set_weights_dict(self.model, model_weights_dict)
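In context, the rearrangement would look roughly like this (a fragment sketch around the quoted lines; the body of the with_opt_vars branch is elided):

    if with_opt_vars:
        opt_weights_dict = {
            name: tensor_dict[name] for name in opt_weight_names
        }
        # ... apply the optimizer weights ...
    # executed in both cases, so it can live after the if:
    self._set_weights_dict(self.model, model_weights_dict)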
    y_train = np.eye(num_classes)[y_train]
    y_valid = np.eye(num_classes)[y_valid]
Would this be equivalent to the following syntax (a bit more explicit IMHO)?

    from keras.utils import to_categorical
    y_train = to_categorical(y_train, num_classes)
    y_valid = to_categorical(y_valid, num_classes)
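A quick sanity check of the equivalence (assuming integer class labels in the range [0, num_classes)):

    import numpy as np
    from keras.utils import to_categorical

    y = np.array([0, 2, 1])
    num_classes = 3
    assert np.array_equal(np.eye(num_classes)[y],
                          to_categorical(y, num_classes))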
    for metric in metrics:
        value = np.mean([history.history[metric]])
        results.append(Metric(name=metric, value=np.array(value)))
Out of curiosity, why do we take the mean value, as opposed to the latest one?
If we consider accuracy for instance, wouldn't we be interested in the accuracy at the end of the training (which would certainly be higher than the mean value)?
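For comparison, reporting the final-epoch value would only change one line (history.history[metric] is the per-epoch list Keras records during fit()):

    for metric in metrics:
        value = history.history[metric][-1]  # last epoch rather than the mean
        results.append(Metric(name=metric, value=np.array(value)))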
    y_train = np.eye(num_classes)[y_train]
    y_valid = np.eye(num_classes)[y_valid]
Same here - would it make sense to use to_categorical from keras.utils instead?
    self.opt_vars = self.optimizer.variables()
    print(f'optimizer vars: {self.opt_vars}')

    def train_(self, batch_generator, metrics: list = None, **kwargs):
In this particular case, could we consider not even overriding train_()? It looks like in this instance it does practically the same as the base method (seemingly differing only in terms of validity checks).
Related Issue: #973
Summary:
This PR updates the TensorFlow Task Runner to use Keras as the high-level API, in line with current best practices, and updates the existing TF workspaces. This enables the use of non-legacy optimizers (the legacy optimizers will be deprecated in future versions of TF/Keras).
Specifically, this PR:
- Creates a new TensorFlowTaskRunner class in openfl.federated.task.runner_tf, which borrows heavily from the KerasTaskRunner task. The major difference is in the handling of the optimizer weights, necessitated by the removal of the .get_weights() and .weights attributes from the optimizer; the new TensorFlowTaskRunner extracts the weights from the .variables() attribute instead (see the sketch after this list)
- Renames train_validation and task_validation to be consistent with the torch task runner
- Archives the previous TensorFlowTaskRunner as TensorFlowTaskRunner_v1 within openfl.federated.task.runner_tf and updates the __init__ files to make it callable. The rationale is to avoid any breaking changes for tutorials or upstream applications that still rely on the low-level TF task runner; it can be removed entirely in a future release as needed
- Creates a new tf_cnn_mnist workspace that runs on the new TensorFlowTaskRunner, following the torch_cnn_histology workspace's src/dataloader.py and src/taskrunner.py convention, on TensorFlow v2.15.1 (the latest TensorFlow that does not use Keras v3.x by default)
- Updates tf_3dunet_brats to use the new TensorFlowTaskRunner (no changes were made to the src files because I did not have the BraTS 3D dataset to verify a large update)
- Updates tf_2dunet to run on the archived TensorFlowTaskRunner_v1
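As a rough illustration of the optimizer-weight change described above (a sketch against plain tf.keras on TF 2.15.x, not the PR's exact code; opt_vars and weight_names are illustrative names):

    import tensorflow as tf

    opt = tf.keras.optimizers.Adam()
    # The non-legacy Keras optimizers no longer expose .get_weights() /
    # .weights, so the new TensorFlowTaskRunner reads the optimizer state
    # from the .variables() attribute instead:
    opt_vars = opt.variables()
    weight_names = [v.name for v in opt_vars]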
Future work
- Update tf_2d_unet from TensorFlowTaskRunner_v1 to the new TensorFlowTaskRunner
- Update KerasTaskRunner to the new TensorFlowTaskRunner and remove/archive KerasTaskRunner
- Update TensorFlowTaskRunner to run on TF v2.16+ with Keras 3.x (this may require large changes to weight handling that will likely not be backwards compatible)