Editing notebooks to prepare for DevSite import #28949

Merged · 1 commit · Oct 12, 2023
8 changes: 4 additions & 4 deletions examples/notebooks/beam-ml/automatic_model_refresh.ipynb
@@ -248,7 +248,7 @@
" This example uses `TFModelHandlerTensor` as the model handler and the `resnet_101` model trained on [ImageNet](https://www.image-net.org/).\n",
"\n",
"\n",
"For DataflowRunner, the model needs to be stored remote location accessible by the Beam pipeline. So we will download `ResNet101` model and upload it to the GCS location.\n"
"For the Dataflow runner, you need to store the model in a remote location that the Apache Beam pipeline can access. For this example, download the `ResNet101` model, and upload it to the Google Cloud Storage bucket.\n"
],
"metadata": {
"id": "_AUNH_GJk_NE"
@@ -392,7 +392,7 @@
"source": [
"2. To read and preprocess the images, use the `preprocess_image` function. This example uses `Cat-with-beanie.jpg` for all inferences.\n",
"\n",
" **Note**: Image used for prediction is licensed in CC-BY. The creator is listed in the [LICENSE.txt](https://storage.googleapis.com/apache-beam-samples/image_captioning/LICENSE.txt) file."
" **Note**: The image used for prediction is licensed in CC-BY. The creator is listed in the [LICENSE.txt](https://storage.googleapis.com/apache-beam-samples/image_captioning/LICENSE.txt) file."
],
"metadata": {
"id": "8-sal2rFAxP2"
@@ -424,7 +424,7 @@
"cell_type": "markdown",
"source": [
"3. Pass the images to the RunInference `PTransform`. RunInference takes `model_handler` and `model_metadata_pcoll` as input parameters.\n",
" * `model_metadata_pcoll` is a side input `PCollection` to the RunInference `PTransform`. This side input is used to update the `model_uri` in the `model_handler` without needing to stop the Apache Beam pipeline\n",
" * `model_metadata_pcoll` is a side input `PCollection` to the RunInference `PTransform`. This side input updates the `model_uri` in the `model_handler` while the Apache Beam pipeline runs.\n",
" * Use `WatchFilePattern` as side input to watch a `file_pattern` matching `.keras` files. In this case, the `file_pattern` is `'gs://BUCKET_NAME/dataflow/*keras'`.\n",
"\n"
],
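
A hedged sketch of how these two inputs can fit together; pipeline options are omitted, and the notebook's `preprocess_image` helper and the bucket name are assumed:

```python
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorflow_inference import TFModelHandlerTensor
from apache_beam.ml.inference.utils import WatchFilePattern

model_handler = TFModelHandlerTensor(
    model_uri='gs://BUCKET_NAME/dataflow/resnet101.keras')

with beam.Pipeline() as pipeline:
  # Side input: emits model metadata whenever a new matching file appears.
  model_metadata_pcoll = (
      pipeline
      | 'WatchModels' >> WatchFilePattern(
          file_pattern='gs://BUCKET_NAME/dataflow/*keras'))

  _ = (
      pipeline
      | 'CreateImageNames' >> beam.Create(['Cat-with-beanie.jpg'])
      | 'Preprocess' >> beam.Map(preprocess_image)  # notebook helper, assumed
      | 'RunInference' >> RunInference(
          model_handler=model_handler,
          model_metadata_pcoll=model_metadata_pcoll))
```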
@@ -483,7 +483,7 @@
"source": [
"### Watch for the model update\n",
"\n",
"After the pipeline starts processing data and when you see output emitted from the RunInference `PTransform`, upload a `resnet152` model saved in `.keras` format to a Google Cloud Storage bucket location that matches the `file_pattern` you defined earlier.\n"
"After the pipeline starts processing data, when you see output emitted from the RunInference `PTransform`, upload a `resnet152` model saved in the `.keras` format to a Google Cloud Storage bucket location that matches the `file_pattern` you defined earlier.\n"
],
"metadata": {
"id": "wYp-mBHHjOjA"
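The update itself can be as small as saving and copying the new model; a sketch, again with a placeholder bucket:

```python
import tensorflow as tf

new_model = tf.keras.applications.ResNet152(weights='imagenet')
new_model.save('resnet152.keras')
# Copying into the watched location is what triggers the hot swap:
# !gsutil cp resnet152.keras gs://BUCKET_NAME/dataflow/resnet152.keras
```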
6 changes: 2 additions & 4 deletions examples/notebooks/beam-ml/mltransform_basic.ipynb
@@ -65,7 +65,7 @@
"id": "d3b81cf2-8603-42bd-995e-9e14631effd0"
},
"source": [
"This notebook demonstrates how to use `MLTransform` to preprocess your data for machine learning models. `MLTransform` is a `PTransform` that wraps multiple Apache Beam data processing transforms. As a result, `MLTransform` gives you the ability to preprocess different types of data in multiple ways with one transform.\n",
"This notebook demonstrates how to use `MLTransform` to preprocess your data for machine learning models. `MLTransform` is a `PTransform` that wraps multiple Apache Beam data processing transforms. With `MLTransform`, you can preprocess different types of data in multiple ways with one transform.\n",
"\n",
"This notebook uses data processing transforms defined in the [apache_beam/ml/transforms/tft](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.transforms.tft.html) module."
]
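
As a quick orientation, a self-contained sketch of `MLTransform` wrapping one tft transform; the local artifact directory and sample data are assumptions:

```python
import tempfile

import apache_beam as beam
from apache_beam.ml.transforms.base import MLTransform
from apache_beam.ml.transforms.tft import ScaleTo01

artifact_location = tempfile.mkdtemp()  # where MLTransform saves its artifacts

with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | beam.Create([{'x': [1.0, 5.0, 10.0]}, {'x': [2.0, 3.0]}])
      | MLTransform(write_artifact_location=artifact_location).with_transform(
          ScaleTo01(columns=['x']))
      | beam.Map(print))
```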
@@ -423,8 +423,6 @@
"source": [
"### Scale the data by using the z-score\n",
"\n",
"Scale to the data using the z-score\n",
"\n",
"Similar to `ScaleTo01`, use [ScaleToZScore](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.transforms.tft.html#apache_beam.ml.transforms.tft.ScaleToZScore) to scale the values by using the [z-score]([z-score](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/scale_to_z_score#:~:text=Scaling%20to%20z%2Dscore%20subtracts%20out%20the%20mean%20and%20divides%20by%20standard%20deviation.%20Note%20that%20the%20standard%20deviation%20computed%20here%20is%20based%20on%20the%20biased%20variance%20(0%20delta%20degrees%20of%20freedom)%2C%20as%20computed%20by%20analyzers.var.).\n"
],
"metadata": {
@@ -607,7 +605,7 @@
"\n",
"The previous examples show how to preprocess data for model training. This example uses the same preprocessing steps on the inference data. By using the same steps on the inference data, you can maintain consistent results.\n",
"\n",
"Preprocess the data going into the inference by using the same preprocessing steps used on the data prior to training. To do this with `MLTransform`, pass the artifact location from the previous transforms to the parameter `read_artifact_location`. `MLTransform` uses the values and artifacts produced in the previous steps. You don't need to provide the transforms, because they are saved with the artifacts in the artifact location.\n"
"Preprocess the data used by the inference by using the same preprocessing steps that you used on the data prior to training. When using `MLTransform`, pass the artifact location from the previous transforms to the parameter `read_artifact_location`. `MLTransform` uses the values and artifacts produced in the previous steps. You don't need to provide the transforms, because they are saved with the artifacts in the artifact location.\n"
],
"metadata": {
"id": "kcnQSwkA-eSA"
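A sketch of the inference-side counterpart, reusing the `artifact_location` from the training-time sketch above:

```python
import apache_beam as beam
from apache_beam.ml.transforms.base import MLTransform

with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | beam.Create([{'x': [4.0, 6.0]}])
      # No transforms listed here: MLTransform replays the ones saved in
      # the artifact location during training-time preprocessing.
      | MLTransform(read_artifact_location=artifact_location)
      | beam.Map(print))
```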
25 changes: 10 additions & 15 deletions examples/notebooks/beam-ml/per_key_models.ipynb
@@ -70,7 +70,7 @@
"\n",
"In Apache Beam, the recommended way to run inference is to use the `RunInference` transform. By using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without having to manage memory yourself.\n",
"\n",
"This notebook demonstrates how to use a `KeyedModelHandler` to run inference in an Apache Beam pipeline with multiple different models on a per-key basis. This notebook uses pretrained pipelines from Hugging Face. Before continuing with this notebook, it is recommended that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb)."
"This notebook demonstrates how to use a `KeyedModelHandler` to run inference in an Apache Beam pipeline with multiple different models on a per-key basis. This notebook uses pretrained pipelines from Hugging Face. Before continuing with this notebook, it is recommended that you walk through the [Use RunInference in Apache Beam](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) notebook."
],
"metadata": {
"id": "ZAVOrrW2An1n"
@@ -81,7 +81,7 @@
"source": [
"## Install dependencies\n",
"\n",
"First, install both Apache Beam and the dependencies needed by Hugging Face."
"Install both Apache Beam and the dependencies needed by Hugging Face."
],
"metadata": {
"id": "_fNyheQoDgGt"
@@ -107,12 +107,7 @@
}
],
"source": [
"# Note that this notebook currently installs from Beam head since this feature hasn't been released yet.\n",
"# It will be released with version 2.51.0, at which point you can install with the following command:\n",
"# !pip install apache_beam[gcp]>=2.51.0 --quiet\n",
"!git clone https://github.com/apache/beam\n",
"!pip install -r beam/sdks/python/build-requirements.txt\n",
"!pip install -e ./beam/sdks/python[gcp]\n",
"!pip install apache_beam[gcp]>=2.51.0 --quiet\n",
Review comment (Contributor Author): @AnandInguva @damccorm I'm planning to merge this after the 2.51.0 release, so this section needs to be updated, but please check to see if I updated it correctly. Thank you!

Reply (Contributor): Ah, I actually had a pr open to do this already (and it is now merged - #28801)
"!pip install torch --quiet\n",
"!pip install transformers --quiet\n",
"\n",
@@ -149,7 +144,7 @@
"\n",
"A model handler is the Apache Beam method used to define the configuration needed to load and invoke models. Because this example uses two models, we define two model handlers, one for each model. Because both models are incapsulated within Hugging Face pipelines, we use the model handler `HuggingFacePipelineModelHandler`.\n",
"\n",
"In this notebook, we load the models using Hugging Face and run them against an example. The models produce different outputs."
"For this example, load the models using Hugging Face, and then run them against an example. The models produce different outputs."
],
"metadata": {
"id": "uEqljVgCD7hx"
@@ -355,7 +350,7 @@
"source": [
"## Define the examples\n",
"\n",
"Next, define examples to input into the pipeline. The examples include their correct classifications."
"Define examples to input into the pipeline. The examples include the correct classifications."
],
"metadata": {
"id": "yd92MC7YEsTf"
@@ -392,7 +387,7 @@
"class FormatExamples(beam.DoFn):\n",
" \"\"\"\n",
" Map each example to a tuple of ('<model_name>-<actual_sentiment>', 'example').\n",
" We use these keys to map our elements to the correct models.\n",
" Use these keys to map our elements to the correct models.\n",
" \"\"\"\n",
" def process(self, element: Tuple[str, str]) -> Iterable[Tuple[str, str]]:\n",
" yield (f'distilbert-{element[1]}', element[0])\n",
@@ -407,7 +402,7 @@
{
"cell_type": "markdown",
"source": [
"Use the formatted keys to define a `KeyedModelHandler` that maps keys to the `ModelHandler` used for those keys. The `KeyedModelHandler` method lets you define an optional `max_models_per_worker_hint`, which limits the number of models that can be held in a single worker process at one time. If you're worried about your worker running out of memory, use this option. For more information about managing memory, see [Use a keyed ModelHandler](https://beam.apache.org/documentation/sdks/python-machine-learning/index.html#use-a-keyed-modelhandler)."
"Use the formatted keys to define a `KeyedModelHandler` that maps keys to the `ModelHandler` used for those keys. The `KeyedModelHandler` method lets you define an optional `max_models_per_worker_hint`, which limits the number of models that can be held in a single worker process at one time. If your worker might run out of memory, use this option. For more information about managing memory, see [Use a keyed ModelHandler](https://beam.apache.org/documentation/sdks/python-machine-learning/index.html#use-a-keyed-modelhandler)."
],
"metadata": {
"id": "IP65_5nNGIb8"
@@ -433,9 +428,9 @@
"source": [
"## Postprocess the results\n",
"\n",
"The `RunInference` transform returns a Tuple containing:\n",
"The `RunInference` transform returns a tuple that contains the following objects:\n",
"* the original key\n",
"* a `PredictionResult` object containing the original example and the inference.\n",
"* a `PredictionResult` object containing the original example and the inference\n",
"Use those outputs to extract the relevant data. Then, to compare each model's prediction, group this data by the original example."
],
"metadata": {
@@ -510,7 +505,7 @@
"source": [
"## Run the pipeline\n",
"\n",
"Put together all of the pieces to run a single Apache Beam pipeline."
"To run a single Apache Beam pipeline, combine the previous steps."
],
"metadata": {
"id": "-LrpmM2PGAkf"
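Putting it together, a hedged end-to-end sketch: `examples` and `FormatExamples` come from the notebook, and the remaining names are the assumed ones from the earlier sketches.

```python
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference

with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | 'Create' >> beam.Create(examples)
      | 'FormatKeys' >> beam.ParDo(FormatExamples())
      | 'RunInference' >> RunInference(keyed_model_handler)
      | 'Extract' >> beam.ParDo(ExtractResults())
      | 'GroupByExample' >> beam.GroupByKey()  # compare models per example
      | 'Print' >> beam.Map(print))
```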