Editing notebooks to prepare for DevSite import #28949

Merged · 1 commit · Oct 12, 2023
8 changes: 4 additions & 4 deletions examples/notebooks/beam-ml/automatic_model_refresh.ipynb
@@ -248,7 +248,7 @@
" This example uses `TFModelHandlerTensor` as the model handler and the `resnet_101` model trained on [ImageNet](https://www.image-net.org/).\n",
"\n",
"\n",
"For DataflowRunner, the model needs to be stored remote location accessible by the Beam pipeline. So we will download `ResNet101` model and upload it to the GCS location.\n"
"For the Dataflow runner, you need to store the model in a remote location that the Apache Beam pipeline can access. For this example, download the `ResNet101` model, and upload it to the Google Cloud Storage bucket.\n"
],
"metadata": {
"id": "_AUNH_GJk_NE"
@@ -392,7 +392,7 @@
"source": [
"2. To read and preprocess the images, use the `preprocess_image` function. This example uses `Cat-with-beanie.jpg` for all inferences.\n",
"\n",
" **Note**: Image used for prediction is licensed in CC-BY. The creator is listed in the [LICENSE.txt](https://storage.googleapis.com/apache-beam-samples/image_captioning/LICENSE.txt) file."
" **Note**: The image used for prediction is licensed in CC-BY. The creator is listed in the [LICENSE.txt](https://storage.googleapis.com/apache-beam-samples/image_captioning/LICENSE.txt) file."
],
"metadata": {
"id": "8-sal2rFAxP2"
@@ -424,7 +424,7 @@
"cell_type": "markdown",
"source": [
"3. Pass the images to the RunInference `PTransform`. RunInference takes `model_handler` and `model_metadata_pcoll` as input parameters.\n",
" * `model_metadata_pcoll` is a side input `PCollection` to the RunInference `PTransform`. This side input is used to update the `model_uri` in the `model_handler` without needing to stop the Apache Beam pipeline\n",
" * `model_metadata_pcoll` is a side input `PCollection` to the RunInference `PTransform`. This side input updates the `model_uri` in the `model_handler` while the Apache Beam pipeline runs.\n",
" * Use `WatchFilePattern` as side input to watch a `file_pattern` matching `.keras` files. In this case, the `file_pattern` is `'gs://BUCKET_NAME/dataflow/*keras'`.\n",
"\n"
],
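
A hedged sketch of how these two inputs can fit together; pipeline options are omitted, and the notebook's `preprocess_image` helper and the bucket name are assumed:

```python
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorflow_inference import TFModelHandlerTensor
from apache_beam.ml.inference.utils import WatchFilePattern

model_handler = TFModelHandlerTensor(
    model_uri='gs://BUCKET_NAME/dataflow/resnet101.keras')

with beam.Pipeline() as pipeline:
  # Side input: emits model metadata whenever a new matching file appears.
  model_metadata_pcoll = (
      pipeline
      | 'WatchModels' >> WatchFilePattern(
          file_pattern='gs://BUCKET_NAME/dataflow/*keras'))

  _ = (
      pipeline
      | 'CreateImageNames' >> beam.Create(['Cat-with-beanie.jpg'])
      | 'Preprocess' >> beam.Map(preprocess_image)  # notebook helper, assumed
      | 'RunInference' >> RunInference(
          model_handler=model_handler,
          model_metadata_pcoll=model_metadata_pcoll))
```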
@@ -483,7 +483,7 @@
"source": [
"### Watch for the model update\n",
"\n",
"After the pipeline starts processing data and when you see output emitted from the RunInference `PTransform`, upload a `resnet152` model saved in `.keras` format to a Google Cloud Storage bucket location that matches the `file_pattern` you defined earlier.\n"
"After the pipeline starts processing data, when you see output emitted from the RunInference `PTransform`, upload a `resnet152` model saved in the `.keras` format to a Google Cloud Storage bucket location that matches the `file_pattern` you defined earlier.\n"
],
"metadata": {
"id": "wYp-mBHHjOjA"
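The update itself can be as small as saving and copying the new model; a sketch, again with a placeholder bucket:

```python
import tensorflow as tf

new_model = tf.keras.applications.ResNet152(weights='imagenet')
new_model.save('resnet152.keras')
# Copying into the watched location is what triggers the hot swap:
# !gsutil cp resnet152.keras gs://BUCKET_NAME/dataflow/resnet152.keras
```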
6 changes: 2 additions & 4 deletions examples/notebooks/beam-ml/mltransform_basic.ipynb
@@ -65,7 +65,7 @@
"id": "d3b81cf2-8603-42bd-995e-9e14631effd0"
},
"source": [
"This notebook demonstrates how to use `MLTransform` to preprocess your data for machine learning models. `MLTransform` is a `PTransform` that wraps multiple Apache Beam data processing transforms. As a result, `MLTransform` gives you the ability to preprocess different types of data in multiple ways with one transform.\n",
"This notebook demonstrates how to use `MLTransform` to preprocess your data for machine learning models. `MLTransform` is a `PTransform` that wraps multiple Apache Beam data processing transforms. With `MLTransform`, you can preprocess different types of data in multiple ways with one transform.\n",
"\n",
"This notebook uses data processing transforms defined in the [apache_beam/ml/transforms/tft](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.transforms.tft.html) module."
]
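
As a quick orientation, a self-contained sketch of `MLTransform` wrapping one tft transform; the local artifact directory and sample data are assumptions:

```python
import tempfile

import apache_beam as beam
from apache_beam.ml.transforms.base import MLTransform
from apache_beam.ml.transforms.tft import ScaleTo01

artifact_location = tempfile.mkdtemp()  # where MLTransform saves its artifacts

with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | beam.Create([{'x': [1.0, 5.0, 10.0]}, {'x': [2.0, 3.0]}])
      | MLTransform(write_artifact_location=artifact_location).with_transform(
          ScaleTo01(columns=['x']))
      | beam.Map(print))
```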
@@ -423,8 +423,6 @@
"source": [
"### Scale the data by using the z-score\n",
"\n",
"Scale to the data using the z-score\n",
"\n",
"Similar to `ScaleTo01`, use [ScaleToZScore](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.transforms.tft.html#apache_beam.ml.transforms.tft.ScaleToZScore) to scale the values by using the [z-score]([z-score](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/scale_to_z_score#:~:text=Scaling%20to%20z%2Dscore%20subtracts%20out%20the%20mean%20and%20divides%20by%20standard%20deviation.%20Note%20that%20the%20standard%20deviation%20computed%20here%20is%20based%20on%20the%20biased%20variance%20(0%20delta%20degrees%20of%20freedom)%2C%20as%20computed%20by%20analyzers.var.).\n"
],
"metadata": {
@@ -607,7 +605,7 @@
"\n",
"The previous examples show how to preprocess data for model training. This example uses the same preprocessing steps on the inference data. By using the same steps on the inference data, you can maintain consistent results.\n",
"\n",
"Preprocess the data going into the inference by using the same preprocessing steps used on the data prior to training. To do this with `MLTransform`, pass the artifact location from the previous transforms to the parameter `read_artifact_location`. `MLTransform` uses the values and artifacts produced in the previous steps. You don't need to provide the transforms, because they are saved with the artifacts in the artifact location.\n"
"Preprocess the data used by the inference by using the same preprocessing steps that you used on the data prior to training. When using `MLTransform`, pass the artifact location from the previous transforms to the parameter `read_artifact_location`. `MLTransform` uses the values and artifacts produced in the previous steps. You don't need to provide the transforms, because they are saved with the artifacts in the artifact location.\n"
],
"metadata": {
"id": "kcnQSwkA-eSA"
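A sketch of the inference-side counterpart, reusing the `artifact_location` from the training-time sketch above:

```python
import apache_beam as beam
from apache_beam.ml.transforms.base import MLTransform

with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | beam.Create([{'x': [4.0, 6.0]}])
      # No transforms listed here: MLTransform replays the ones saved in
      # the artifact location during training-time preprocessing.
      | MLTransform(read_artifact_location=artifact_location)
      | beam.Map(print))
```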
25 changes: 10 additions & 15 deletions examples/notebooks/beam-ml/per_key_models.ipynb
@@ -70,7 +70,7 @@
"\n",
"In Apache Beam, the recommended way to run inference is to use the `RunInference` transform. By using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without having to manage memory yourself.\n",
"\n",
"This notebook demonstrates how to use a `KeyedModelHandler` to run inference in an Apache Beam pipeline with multiple different models on a per-key basis. This notebook uses pretrained pipelines from Hugging Face. Before continuing with this notebook, it is recommended that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb)."
"This notebook demonstrates how to use a `KeyedModelHandler` to run inference in an Apache Beam pipeline with multiple different models on a per-key basis. This notebook uses pretrained pipelines from Hugging Face. Before continuing with this notebook, it is recommended that you walk through the [Use RunInference in Apache Beam](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) notebook."
],
"metadata": {
"id": "ZAVOrrW2An1n"
@@ -81,7 +81,7 @@
"source": [
"## Install dependencies\n",
"\n",
"First, install both Apache Beam and the dependencies needed by Hugging Face."
"Install both Apache Beam and the dependencies needed by Hugging Face."
],
"metadata": {
"id": "_fNyheQoDgGt"
@@ -107,12 +107,7 @@
}
],
"source": [
"# Note that this notebook currently installs from Beam head since this feature hasn't been released yet.\n",
"# It will be released with version 2.51.0, at which point you can install with the following command:\n",
"# !pip install apache_beam[gcp]>=2.51.0 --quiet\n",
"!git clone https://github.com/apache/beam\n",
"!pip install -r beam/sdks/python/build-requirements.txt\n",
"!pip install -e ./beam/sdks/python[gcp]\n",
"!pip install apache_beam[gcp]>=2.51.0 --quiet\n",
Review comment (Contributor Author): @AnandInguva @damccorm I'm planning to merge this after the 2.51.0 release, so this section needs to be updated, but please check to see if I updated it correctly. Thank you!

Reply (Contributor): Ah, I actually had a pr open to do this already (and it is now merged - #28801)
"!pip install torch --quiet\n",
"!pip install transformers --quiet\n",
"\n",
@@ -149,7 +144,7 @@
"\n",
"A model handler is the Apache Beam method used to define the configuration needed to load and invoke models. Because this example uses two models, we define two model handlers, one for each model. Because both models are incapsulated within Hugging Face pipelines, we use the model handler `HuggingFacePipelineModelHandler`.\n",
"\n",
"In this notebook, we load the models using Hugging Face and run them against an example. The models produce different outputs."
"For this example, load the models using Hugging Face, and then run them against an example. The models produce different outputs."
],
"metadata": {
"id": "uEqljVgCD7hx"
@@ -355,7 +350,7 @@
"source": [
"## Define the examples\n",
"\n",
"Next, define examples to input into the pipeline. The examples include their correct classifications."
"Define examples to input into the pipeline. The examples include the correct classifications."
],
"metadata": {
"id": "yd92MC7YEsTf"
@@ -392,7 +387,7 @@
"class FormatExamples(beam.DoFn):\n",
" \"\"\"\n",
" Map each example to a tuple of ('<model_name>-<actual_sentiment>', 'example').\n",
" We use these keys to map our elements to the correct models.\n",
" Use these keys to map our elements to the correct models.\n",
" \"\"\"\n",
" def process(self, element: Tuple[str, str]) -> Iterable[Tuple[str, str]]:\n",
" yield (f'distilbert-{element[1]}', element[0])\n",
@@ -407,7 +402,7 @@
{
"cell_type": "markdown",
"source": [
"Use the formatted keys to define a `KeyedModelHandler` that maps keys to the `ModelHandler` used for those keys. The `KeyedModelHandler` method lets you define an optional `max_models_per_worker_hint`, which limits the number of models that can be held in a single worker process at one time. If you're worried about your worker running out of memory, use this option. For more information about managing memory, see [Use a keyed ModelHandler](https://beam.apache.org/documentation/sdks/python-machine-learning/index.html#use-a-keyed-modelhandler)."
"Use the formatted keys to define a `KeyedModelHandler` that maps keys to the `ModelHandler` used for those keys. The `KeyedModelHandler` method lets you define an optional `max_models_per_worker_hint`, which limits the number of models that can be held in a single worker process at one time. If your worker might run out of memory, use this option. For more information about managing memory, see [Use a keyed ModelHandler](https://beam.apache.org/documentation/sdks/python-machine-learning/index.html#use-a-keyed-modelhandler)."
],
"metadata": {
"id": "IP65_5nNGIb8"
@@ -433,9 +428,9 @@
"source": [
"## Postprocess the results\n",
"\n",
"The `RunInference` transform returns a Tuple containing:\n",
"The `RunInference` transform returns a tuple that contains the following objects:\n",
"* the original key\n",
"* a `PredictionResult` object containing the original example and the inference.\n",
"* a `PredictionResult` object containing the original example and the inference\n",
"Use those outputs to extract the relevant data. Then, to compare each model's prediction, group this data by the original example."
],
"metadata": {
@@ -510,7 +505,7 @@
"source": [
"## Run the pipeline\n",
"\n",
"Put together all of the pieces to run a single Apache Beam pipeline."
"To run a single Apache Beam pipeline, combine the previous steps."
],
"metadata": {
"id": "-LrpmM2PGAkf"
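Putting it together, a hedged end-to-end sketch: `examples` and `FormatExamples` come from the notebook, and the remaining names are the assumed ones from the earlier sketches.

```python
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference

with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | 'Create' >> beam.Create(examples)
      | 'FormatKeys' >> beam.ParDo(FormatExamples())
      | 'RunInference' >> RunInference(keyed_model_handler)
      | 'Extract' >> beam.ParDo(ExtractResults())
      | 'GroupByExample' >> beam.GroupByKey()  # compare models per example
      | 'Print' >> beam.Map(print))
```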