
[WIP] Workflow-api based notebook tutorial for tensorflow.keras mnist based model with FedProx #1145

Open · wants to merge 2 commits into develop from ybuchnik/workflow-adam-keras-mnist-notebook

Conversation

@yontyon commented Nov 13, 2024

No description provided.

@yontyon force-pushed the ybuchnik/workflow-adam-keras-mnist-notebook branch from fa730d3 to 5cfb831 on Nov 13, 2024, 10:30
@yontyon changed the title from "[WIP] introduce a workflow interface based tensorflow.keras mnist tutorial notebook" to "[WIP] introduce a workflow-api based notebook tutorial for tensorflow.keras mnist based model" on Nov 13, 2024
@yontyon force-pushed the ybuchnik/workflow-adam-keras-mnist-notebook branch from 5cfb831 to b557384 on Nov 13, 2024, 10:32
@teoparvanov changed the title from "[WIP] introduce a workflow-api based notebook tutorial for tensorflow.keras mnist based model" to "[WIP] Workflow-api based notebook tutorial for tensorflow.keras mnist based model with FedProx" on Nov 19, 2024
@rahulga1 (Collaborator):
@kta-intel, do we want to have the modified FedProxOptimizer code as part of the tutorial?
Or should we modify it in the framework code itself? I know it will be reworked as part of the Keras 3 support in OpenFL.

"id": "14821d97",
"metadata": {},
"source": [
"# Workflow Interface Tutorial: Keras MINST"
Collaborator:
Rename: Workflow API Tutorial: Keras MNIST

"metadata": {},
"outputs": [],
"source": [
"%pip install tensorflow\n",
Collaborator:
Newer TensorFlow releases do not support Keras 2 APIs without a legacy flag. Suggest taking a look at TensorFlow + Keras 2 backwards compatibility and modifying as necessary.

My recommendation is to fully transition this tutorial to Keras 3 APIs instead of using deprecated Keras 2.

@teoparvanov (Collaborator) commented Nov 20, 2024:
@MasterSkepticista, I agree with your suggestion in general. However, in practice TaskRunner API currently supports up to Keras 2. Note that there are already preparations for a step-by-step upgrade to Keras 3. In the meantime, if we want the tutorial notebooks to be compatible with the upcoming FederatedRuntime (which uses TaskRunner API under the hood), we should stick with Keras 2 for now.

Collaborator:
If we decide to temporarily stick with Keras 2, we will also need to pin tensorflow to a compatible version, or install tf-keras to use the legacy code.

However, since this tutorial is set to use LocalRuntime by default, I'm more inclined to say we should upgrade this tutorial and FedProx to the Keras 3 API, and then let the Task Runner/FederatedRuntime catch up, which it should within a reasonable time.

Upgrading FedProx to Keras 3 is not something that would naturally happen by updating the Task Runner, so it would be worth the effort to include it in this PR if @yontyon has the bandwidth and/or is still interested, or otherwise in an adjacent PR.

Open to thoughts and suggestions
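
For reference, a minimal sketch of the two pinning options mentioned above (the exact version bound and the use of the TF_USE_LEGACY_KERAS flag are illustrative, not something tested against this notebook):

```python
# Option A: pin TensorFlow to a release that still ships Keras 2 by default.
#   %pip install "tensorflow<2.16"

# Option B: keep a newer TensorFlow, but route tf.keras to the legacy Keras 2
# implementation via the tf-keras shim.
#   %pip install tf-keras

import os
os.environ["TF_USE_LEGACY_KERAS"] = "1"  # must be set before importing tensorflow

import tensorflow as tf
print(tf.keras.__version__)  # should report a 2.x Keras version
```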

Comment on lines +91 to +107
"def test_intel_tensorflow():\n",
" \"\"\"\n",
" Check if Intel version of TensorFlow is installed\n",
" \"\"\"\n",
" import tensorflow as tf\n",
"\n",
" print(\"We are using Tensorflow version {}\".format(tf.__version__))\n",
"\n",
" major_version = int(tf.__version__.split(\".\")[0])\n",
" if major_version >= 2:\n",
" from tensorflow.python.util import _pywrap_util_port\n",
" print(\"Intel-optimizations (DNNL) enabled:\",\n",
" _pywrap_util_port.IsMklEnabled())\n",
" else:\n",
" print(\"Intel-optimizations (DNNL) enabled:\")\n",
"\n",
"test_intel_tensorflow()"
Collaborator:
Not required. Unnecessary cognitive overhead + all tensorflow binaries are Intel-optimized by default.

"id": "a61a876d",
"metadata": {},
"source": [
"Now that the flow is complete, let's dig into some of the information captured along the way"
Collaborator:
Not sure of the purpose of the cells after this comment. Suggest adding outputs, if relevant, or removing those cells altogether.

@teoparvanov (Collaborator) left a comment:
LGTM, with a couple of comments and suggestions!

"id": "bd059520",
"metadata": {},
"source": [
"This tutorial demonstrates how to train a model using TensorFlow Keras on the MNIST dataset, leveraging the new FedAI Workflow Interface. The Workflow Interface provides a novel way to compose federated learning experiments with OpenFL, enabling researchers to handle non-IID data and perform federated averaging with optimizations like FedProx. Through this tutorial, you will learn how to set up the federated learning environment, define the flow, and execute the training process across multiple collaborators.\n"
Collaborator:
FedAI -> OpenFL

"metadata": {},
"outputs": [],
"source": [
"%pip install tensorflow\n",
Copy link
Collaborator

@teoparvanov teoparvanov Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MasterSkepticista, I agree with your suggestion in general. However, in practice TaskRunner API currently supports up to Keras 2. Note that there are already preparations for a step-by-step upgrade to Keras 3. In the meantime, if we want the tutorial notebooks to be compatible with the upcoming FederatedRuntime (which uses TaskRunner API under the hood), we should stick with Keras 2 for now.

"id": "7237eac4",
"metadata": {},
"source": [
"The FedAvgWeights method is used in our federated learning architecture to aggregate model weights from multiple collaborators. In this approach, each collaborator trains a local model on its own data and then sends the model weights to a central aggregator. The aggregator runs the FedAvgWeights method to calculate the aggregated model weights by averaging the weights from all collaborators. "
Collaborator:
a couple of nits:

  • "its own data" -> "their local data"
  • FedAvgWeights -> FedAvg
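
To make the aggregation step concrete, here is a minimal sketch of the unweighted averaging that FedAvg performs at the aggregator (plain NumPy; the function name and the use of Model.get_weights() output are assumptions for illustration):

```python
import numpy as np

def fed_avg(collaborator_weights):
    """Average per-layer weights across collaborators.

    collaborator_weights: one entry per collaborator, each a list of
    np.ndarray as returned by keras Model.get_weights().
    """
    return [
        np.mean(layer_stack, axis=0)  # element-wise mean over collaborators
        for layer_stack in zip(*collaborator_weights)
    ]

# Usage: aggregated = fed_avg([model_a.get_weights(), model_b.get_weights()])
```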

"for idx, collaborator_name in enumerate(collaborator_names):\n",
" collaborators.append(\n",
" Collaborator(\n",
" name=collaborator_name, num_cpus=0, num_gpus=0.3,\n",
Collaborator:
num_cpus=1, num_gpus=0 ?
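
Applied to the cell above, the suggested change would look like this (other constructor arguments elided, as in the quoted snippet):

```python
collaborators.append(
    Collaborator(
        name=collaborator_name, num_cpus=1, num_gpus=0,  # CPU-only collaborators
    )
)
```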

"\n",
"\n",
"@keras.utils.register_keras_serializable()\n",
"class FedProxOptimizer(keras.optimizers.legacy.Optimizer):\n",
Collaborator:
Instead of implementing it inline, I suggest updating and using the FedProxOptimizer class from the utilities.optimizers.keras package.
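
As a sketch, reusing the packaged optimizer could look like this (the import path and constructor arguments are assumptions; verify against the actual openfl.utilities.optimizers.keras package):

```python
# Assumed import path and signature -- check the package before relying on this.
from openfl.utilities.optimizers.keras import FedProxOptimizer

# `model` is the keras Model built earlier in the notebook.
model.compile(
    optimizer=FedProxOptimizer(learning_rate=0.01, mu=0.01),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```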

Collaborator:
An alternative option here would be to implement FedProx via a custom loss function instead of at the optimizer level.

However, as far as I know, the Keras FedProx is not used in any "live" tutorials or workflows, so it would be safe to upgrade the package directly, too. The only caveat is that we will eventually need to update it for Keras 3, sooner or later.
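
To make the loss-level alternative concrete, here is a sketch of FedProx as a proximal term added to the task loss (the helper name, the mu default, and the handling of the round's global weights are assumptions for illustration):

```python
import tensorflow as tf

def fedprox_loss(base_loss_fn, model, global_weights, mu=0.01):
    """Wrap a task loss with the FedProx proximal term (mu / 2) * ||w - w_global||^2."""
    # Freeze a copy of the round's starting (global) weights.
    global_weights = [tf.constant(w) for w in global_weights]

    def loss(y_true, y_pred):
        task_loss = base_loss_fn(y_true, y_pred)
        # Penalize drift of the local weights away from the global weights.
        prox = tf.add_n([
            tf.reduce_sum(tf.square(w - wg))
            for w, wg in zip(model.trainable_weights, global_weights)
        ])
        return task_loss + (mu / 2.0) * prox

    return loss
```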

Comment on lines +629 to +630
"/home/yoni/github/openfl/.venv/lib/python3.8/site-packages/tensorflow/python/data/ops/structured_function.py:265: UserWarning: Even though the `tf.config.experimental_run_functions_eagerly` option is set, this option does not apply to tf.data functions. To force eager execution of tf.data functions, please use `tf.data.experimental.enable_debug_mode()`.\n",
" warnings.warn(\n"
Collaborator:
Is it possible to silence or hide this warning? (especially if it mentions your name as in /home/yoni/github/openfl 😅 )
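
One way to hide it before re-running the cell (standard library only; the message pattern is an assumption matched to the warning text above):

```python
import warnings

# Suppress the tf.data eager-execution UserWarning quoted above.
warnings.filterwarnings(
    "ignore",
    message=".*does not apply to tf.data functions.*",
    category=UserWarning,
)
```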

"metadata": {},
"source": [
"# Congratulations!\n",
"Now that you've completed your first workflow interface quickstart notebook, see some of the more advanced things you can do in our [other tutorials](broken_link), including:\n",
Collaborator:
The "other tutorials" link is broken.

@kta-intel (Collaborator):
> @kta-intel, do we want to have the modified FedProxOptimizer code as part of the tutorial? Or should we modify it in the framework code itself? I know it will be reworked as part of the Keras 3 support in OpenFL.

In my opinion, I recommend not modifying FedProx inline for this tutorial; I think it would be better to update the framework code itself. I mentioned it in one of the review threads because I missed your message (sorry), but FedProx actually lives outside the TaskRunner, so the effort to upgrade the TaskRunner to Keras 3 would not capture FedProx. It would be more impactful to get it upgraded at the framework level.
