[WIP] Workflow-API based notebook tutorial for tensorflow.keras MNIST based model with FedProx #1145
base: develop
Conversation
… mnist based model Signed-off-by: Buchnik, Yehonatan <[email protected]>
@kta-intel do we want to have the modified FedProxOptimizer code as part of the tutorial?
"id": "14821d97", | ||
"metadata": {}, | ||
"source": [ | ||
"# Workflow Interface Tutorial: Keras MINST" |
Rename: Workflow API Tutorial: Keras MNIST
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install tensorflow\n", |
Newer TensorFlow does not support Keras 2 APIs without a legacy flag. Suggest taking a look at TensorFlow + Keras 2 backwards compatibility and modifying as necessary.
My recommendation is to fully transition this tutorial to Keras 3 APIs instead of using the deprecated Keras 2.
@MasterSkepticista, I agree with your suggestion in general. However, in practice the TaskRunner API currently supports up to Keras 2. Note that there are already preparations for a step-by-step upgrade to Keras 3. In the meantime, if we want the tutorial notebooks to be compatible with the upcoming FederatedRuntime (which uses the TaskRunner API under the hood), we should stick with Keras 2 for now.
If we decide to temporarily stick with Keras 2, we will also need to pin tensorflow to a compatible version or install tf-keras to use the legacy code.
However, since this tutorial is set to use LocalRuntime by default, I'm more inclined to say we should upgrade this and FedProx to the Keras 3 API and then let the Task Runner/FederatedRuntime catch up, which it should in a reasonable time.
Upgrading FedProx to Keras 3 is not something that would naturally happen by updating the Task Runner, so it would be worth the effort to include it in this PR if @yontyon has the bandwidth and/or is still interested, or otherwise in an adjacent PR.
Open to thoughts and suggestions
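For reference, the two temporary Keras 2 options mentioned above could be sketched roughly as follows (version bound is an assumption — TensorFlow 2.16 is the first release that bundles Keras 3 by default):

```shell
# Option A: pin TensorFlow to a release line that still ships Keras 2 by default
pip install "tensorflow<2.16"

# Option B: stay on current TensorFlow, install tf-keras, and route tf.keras
# to the legacy Keras 2 implementation via an environment variable
pip install tf-keras
export TF_USE_LEGACY_KERAS=1
```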
"def test_intel_tensorflow():\n", | ||
" \"\"\"\n", | ||
" Check if Intel version of TensorFlow is installed\n", | ||
" \"\"\"\n", | ||
" import tensorflow as tf\n", | ||
"\n", | ||
" print(\"We are using Tensorflow version {}\".format(tf.__version__))\n", | ||
"\n", | ||
" major_version = int(tf.__version__.split(\".\")[0])\n", | ||
" if major_version >= 2:\n", | ||
" from tensorflow.python.util import _pywrap_util_port\n", | ||
" print(\"Intel-optimizations (DNNL) enabled:\",\n", | ||
" _pywrap_util_port.IsMklEnabled())\n", | ||
" else:\n", | ||
" print(\"Intel-optimizations (DNNL) enabled:\")\n", | ||
"\n", | ||
"test_intel_tensorflow()" |
Not required. Unnecessary cognitive overhead + all tensorflow binaries are Intel-optimized by default.
"id": "a61a876d", | ||
"metadata": {}, | ||
"source": [ | ||
"Now that the flow is complete, let's dig into some of the information captured along the way" |
Not sure of the purpose of all cells after this comment. Suggest adding outputs, if relevant, or removing those cells altogether.
LGTM, with a couple of comments and suggestions!
"id": "bd059520", | ||
"metadata": {}, | ||
"source": [ | ||
"This tutorial demonstrates how to train a model using TensorFlow Keras on the MNIST dataset, leveraging the new FedAI Workflow Interface. The Workflow Interface provides a novel way to compose federated learning experiments with OpenFL, enabling researchers to handle non-IID data and perform federated averaging with optimizations like FedProx. Through this tutorial, you will learn how to set up the federated learning environment, define the flow, and execute the training process across multiple collaborators.\n" |
FedAI -> OpenFL
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install tensorflow\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MasterSkepticista, I agree with your suggestion in general. However, in practice TaskRunner API currently supports up to Keras 2. Note that there are already preparations for a step-by-step upgrade to Keras 3. In the meantime, if we want the tutorial notebooks to be compatible with the upcoming FederatedRuntime
(which uses TaskRunner API under the hood), we should stick with Keras 2 for now.
"id": "7237eac4", | ||
"metadata": {}, | ||
"source": [ | ||
"The FedAvgWeights method is used in our federated learning architecture to aggregate model weights from multiple collaborators. In this approach, each collaborator trains a local model on its own data and then sends the model weights to a central aggregator. The aggregator runs the FedAvgWeights method to calculate the aggregated model weights by averaging the weights from all collaborators. " |
a couple of nits:
- "its own data" -> "their local data"
- FedAvgWeights -> FedAvg
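As a rough illustration of the averaging step the quoted cell describes — a minimal sketch, not the OpenFL implementation; the function name and sample-count weighting are assumptions:

```python
import numpy as np

def fedavg_weights(collaborator_weights, sample_counts):
    """Average per-layer model weights across collaborators, weighted by sample count."""
    total = sum(sample_counts)
    num_layers = len(collaborator_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(collaborator_weights, sample_counts))
        for layer in range(num_layers)
    ]

# Two collaborators, one layer each, equal amounts of local data.
weights_a = [np.array([1.0, 3.0])]
weights_b = [np.array([3.0, 5.0])]
averaged = fedavg_weights([weights_a, weights_b], [100, 100])
print(averaged[0])  # [2. 4.]
```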
"for idx, collaborator_name in enumerate(collaborator_names):\n", | ||
" collaborators.append(\n", | ||
" Collaborator(\n", | ||
" name=collaborator_name, num_cpus=0, num_gpus=0.3,\n", |
num_cpus=1, num_gpus=0?
"\n", | ||
"\n", | ||
"@keras.utils.register_keras_serializable()\n", | ||
"class FedProxOptimizer(keras.optimizers.legacy.Optimizer):\n", |
Instead of implementing it inline, I suggest updating and using the FedProxOptimizer class from the utilities.optimizers.keras package.
An alternative option here would be to implement FedProx via a custom loss function instead of at the optimizer level.
However, as far as I know, the Keras FedProx is not used in any "live" tutorials or workflows, so it would be safe to upgrade the package directly too. The only caveat is that we will eventually need to update it for Keras 3, sooner or later.
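To make the loss-function alternative concrete: the FedProx proximal term is just a penalty (mu/2)·||w − w_global||² added to the task loss. A minimal NumPy sketch (function name and values are illustrative, not the package API):

```python
import numpy as np

def fedprox_penalty(local_weights, global_weights, mu):
    """FedProx proximal term: (mu / 2) * ||w_local - w_global||^2, summed over tensors."""
    return 0.5 * mu * sum(
        float(np.sum((w - g) ** 2)) for w, g in zip(local_weights, global_weights)
    )

w_local = [np.array([1.0, 2.0])]
w_global = [np.array([0.0, 0.0])]
penalty = fedprox_penalty(w_local, w_global, mu=2.0)
print(penalty)  # 0.5 * 2 * (1 + 4) = 5.0
```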
"/home/yoni/github/openfl/.venv/lib/python3.8/site-packages/tensorflow/python/data/ops/structured_function.py:265: UserWarning: Even though the `tf.config.experimental_run_functions_eagerly` option is set, this option does not apply to tf.data functions. To force eager execution of tf.data functions, please use `tf.data.experimental.enable_debug_mode()`.\n", | ||
" warnings.warn(\n" |
Is it possible to silence or hide this warning? (Especially if it mentions your name, as in /home/yoni/github/openfl 😅)
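For what it's worth, that specific UserWarning can be filtered in the notebook before the flow runs — a standard-library sketch, where the message regex is an assumption based on the quoted warning text:

```python
import warnings

def silence_tfdata_warning():
    """Ignore the tf.data eager-execution UserWarning by matching its message."""
    warnings.filterwarnings(
        "ignore",
        message=r".*does not apply to tf\.data functions.*",
        category=UserWarning,
    )

# Demonstration that the filter suppresses a matching warning.
with warnings.catch_warnings(record=True) as caught:
    silence_tfdata_warning()
    warnings.warn("this option does not apply to tf.data functions", UserWarning)
print(len(caught))  # 0
```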
"metadata": {}, | ||
"source": [ | ||
"# Congratulations!\n", | ||
"Now that you've completed your first workflow interface quickstart notebook, see some of the more advanced things you can do in our [other tutorials](broken_link), including:\n", |
The "other tutorials" link is broken.
In my opinion, I recommend not modifying FedProx inline for this tutorial; it would be better to update the framework code itself. I mentioned it in one of the review threads because I missed your message (sorry), but FedProx actually exists outside the TaskRunner, so the efforts to upgrade the TaskRunner to Keras 3 would not capture FedProx. It would be more impactful to get that upgraded at the framework level.
No description provided.