From 9491b7472c3808769ca59f886ef2635c6f59930e Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Wed, 7 Feb 2024 00:46:13 +0100 Subject: [PATCH 01/25] ml prompts initial commit --- .../33_ml_model_training.md | 20 +++++++++++++++++++ .../34_ai_model_evaluation.md | 0 2 files changed, 20 insertions(+) create mode 100644 learning/prompts/documentation-lookup/33_ml_model_training.md create mode 100644 learning/prompts/documentation-lookup/34_ai_model_evaluation.md diff --git a/learning/prompts/documentation-lookup/33_ml_model_training.md b/learning/prompts/documentation-lookup/33_ml_model_training.md new file mode 100644 index 000000000000..7a91e8acb870 --- /dev/null +++ b/learning/prompts/documentation-lookup/33_ml_model_training.md @@ -0,0 +1,20 @@ +Prompt: +How can I use Apache Beam in AI/ML pipeline? +Response: +You can use Apache Beam for data ingestion, validation, preprocessing, model validation, and model deployment and inference. + +Apache Beam has a rich set of [I/O connectors](https://beam.apache.org/documentation/io/connectors/) and [transforms](https://beam.apache.org/documentation/transforms/python/) which allow to read and write data from/to various data sources and perform data validation. + +Apache Beam also has a [MLTransform](https://beam.apache.org/documentation/ml/preprocess-data/) for preprocessing data. This allows you to ensure data consistency using same steps for training and inference. + +Apache Beam allows using pre-trained models with [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/stable/), and [Tensorflow](https://www.tensorflow.org/). + +Machine learning inference can be performed in Apache Beam using the `RunInference API`. +[RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) transform supports usage of any custom models. See [here](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb) for an example of running inference on a model loaded with [spaCy](https://spacy.io/). + +RunInference works well on arbitrary large models. See [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) for example of deployment and inference on LLM. + +For more information on building AI/ML pipelines see [here](https://beam.apache.org/documentation/ml/overview/). 
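As a minimal illustration of the `RunInference` usage described above, here is a hedged sketch that runs a pickled scikit-learn model over a small `PCollection`; the model path and feature values are placeholders, not part of the linked examples:

```python
import numpy as np
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

# Placeholder model location; any path the handler can load works here.
model_handler = SklearnModelHandlerNumpy(model_uri='gs://my-bucket/model.pkl')

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | 'CreateExamples' >> beam.Create([np.array([1.0, 2.0]), np.array([3.0, 4.0])])
        | 'RunInference' >> RunInference(model_handler)  # yields PredictionResult objects pairing each example with its prediction
        | 'Print' >> beam.Map(print))
```

The model handler takes care of loading the model and sharing it efficiently across bundles, which is the main benefit over loading it manually inside a `DoFn`.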
+ + + diff --git a/learning/prompts/documentation-lookup/34_ai_model_evaluation.md b/learning/prompts/documentation-lookup/34_ai_model_evaluation.md new file mode 100644 index 000000000000..e69de29bb2d1 From 10475fb7730279c0c0e84eb6e7d302fb2de8827a Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Wed, 7 Feb 2024 08:21:48 +0100 Subject: [PATCH 02/25] preprocess data added --- ...odel_training.md => 33_ai_ml_pipelines.md} | 0 .../34_ai_preprocess_data.md | 33 +++++++++++++++++++ ...el_evaluation.md => 35_ai_explore_data.md} | 0 3 files changed, 33 insertions(+) rename learning/prompts/documentation-lookup/{33_ml_model_training.md => 33_ai_ml_pipelines.md} (100%) create mode 100644 learning/prompts/documentation-lookup/34_ai_preprocess_data.md rename learning/prompts/documentation-lookup/{34_ai_model_evaluation.md => 35_ai_explore_data.md} (100%) diff --git a/learning/prompts/documentation-lookup/33_ml_model_training.md b/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md similarity index 100% rename from learning/prompts/documentation-lookup/33_ml_model_training.md rename to learning/prompts/documentation-lookup/33_ai_ml_pipelines.md diff --git a/learning/prompts/documentation-lookup/34_ai_preprocess_data.md b/learning/prompts/documentation-lookup/34_ai_preprocess_data.md new file mode 100644 index 000000000000..c97c11016873 --- /dev/null +++ b/learning/prompts/documentation-lookup/34_ai_preprocess_data.md @@ -0,0 +1,33 @@ +Prompt: +How can I preprocess data for AI/ML pipeline with Apache Beam? +Response: +Apache Beam provides a rich set of data processing [transforms](https://beam.apache.org/documentation/programming-guide/#applying-transforms) that can be used to preprocess data for training and inference. +To simplify your data preprocessing, Apache Beam provides a [MLTransform](https://beam.apache.org/documentation/ml/preprocess-data/) that wraps transforms in one class allowing you to ensure data consistency using the same steps for training and inference. + +MLTransform can do a full pass on the dataset allowing you to perform the followingcommon preprocessing steps: +- normalization of input values using a minimum-maximum scaling +- convert `floats` to `integers` using a bucketization based on the input data distribution +- convert string to integers by generating a vocabulary over an entire dataset +- calculate [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) weights for text data + +For a full list of supported `MLTransforms` and I/O requirements see [here](https://beam.apache.org/documentation/ml/preprocess-data/#data-processing-transforms-that-use-tft) . + +MLTransform can operate in read or write mode generating or consuming artifacts such as vocabulary or statistics. See an example of `write mode` below: +```python + with beam.Pipeline() as p: + transformed_data = ( + p + | beam.Create(data) + | MLTransform(write_artifact_location=artifact_location).with_transform( + ) + | beam.Map(print)) + +``` +For more examples of using `MLTransform` see [here](https://beam.apache.org/documentation/transforms/python/elementwise/mltransform/). + +MLTransform also can be used to generate text embeddings to push data into vector stores or to be used as input for machine learning models. 
Supported text embedding transforms include `SentenceTransformerEmbeddings` for the `Hugging Face` [sentence-transformers](https://huggingface.co/sentence-transformers) and `VertexAITextEmbeddings` for the [Vertex AI text-embeddings API](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings). + +`MLTransform` is available in the Python SDK versions 2.53.0 and later. Only pipelines that use [default windows](https://beam.apache.org/documentation/programming-guide/#single-global-window) are supported. + + + diff --git a/learning/prompts/documentation-lookup/34_ai_model_evaluation.md b/learning/prompts/documentation-lookup/35_ai_explore_data.md similarity index 100% rename from learning/prompts/documentation-lookup/34_ai_model_evaluation.md rename to learning/prompts/documentation-lookup/35_ai_explore_data.md From 6ab51e06d926b04c943b0211ebc6f897f1fa4596 Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Mon, 12 Feb 2024 01:49:18 +0100 Subject: [PATCH 03/25] orchedstration prompts added --- learning/prompts/documentation-lookup/33_ai_ml_pipelines.md | 2 +- .../prompts/documentation-lookup/36_ai_kfp_orchestration.md | 0 .../prompts/documentation-lookup/37_ai_tfx_orchestration.md | 0 3 files changed, 1 insertion(+), 1 deletion(-) create mode 100644 learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md create mode 100644 learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md diff --git a/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md b/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md index 7a91e8acb870..85e45b0234a3 100644 --- a/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md +++ b/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md @@ -12,7 +12,7 @@ Apache Beam allows using pre-trained models with [PyTorch](https://pytorch.org/) Machine learning inference can be performed in Apache Beam using the `RunInference API`. [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) transform supports usage of any custom models. See [here](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb) for an example of running inference on a model loaded with [spaCy](https://spacy.io/). -RunInference works well on arbitrary large models. See [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) for example of deployment and inference on LLM. +`RunInference` works well on arbitrary large models. See [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) for example of deployment and inference on LLM. For more information on building AI/ML pipelines see [here](https://beam.apache.org/documentation/ml/overview/). 
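The `MLTransform` write-mode snippet in `34_ai_preprocess_data.md` above leaves the `with_transform` argument empty; a fuller hedged sketch, assuming a vocabulary computation over a hypothetical `x` column and a placeholder artifact location, could look like this:

```python
import apache_beam as beam
from apache_beam.ml.transforms.base import MLTransform
from apache_beam.ml.transforms.tft import ComputeAndApplyVocabulary

# Hypothetical input rows and artifact location.
data = [{'x': ['Apache', 'Beam']}, {'x': ['ML', 'Transform']}]
artifact_location = 'gs://my-bucket/ml-transform-artifacts'

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | beam.Create(data)
        | MLTransform(write_artifact_location=artifact_location).with_transform(
            ComputeAndApplyVocabulary(columns=['x']))  # maps each token to an integer id
        | beam.Map(print))
```

In read mode, the same location would be passed as `read_artifact_location` so that inference reuses the vocabulary artifacts generated during training.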
diff --git a/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md b/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md new file mode 100644 index 000000000000..e69de29bb2d1 diff --git a/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md b/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md new file mode 100644 index 000000000000..e69de29bb2d1 From 3156103b157899219d88c8c38d0528f26803b86b Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Mon, 12 Feb 2024 18:32:50 +0100 Subject: [PATCH 04/25] orchestration prompts --- .../33_ai_ml_pipelines.md | 2 +- .../36_ai_kfp_orchestration.md | 49 +++++++++++++++++++ .../37_ai_tfx_orchestration.md | 20 ++++++++ 3 files changed, 70 insertions(+), 1 deletion(-) diff --git a/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md b/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md index 85e45b0234a3..2dccf141c071 100644 --- a/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md +++ b/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md @@ -1,7 +1,7 @@ Prompt: How can I use Apache Beam in AI/ML pipeline? Response: -You can use Apache Beam for data ingestion, validation, preprocessing, model validation, and model deployment and inference. +You can use Apache Beam for data ingestion, validation, preprocessing, and inference. Apache Beam has a rich set of [I/O connectors](https://beam.apache.org/documentation/io/connectors/) and [transforms](https://beam.apache.org/documentation/transforms/python/) which allow to read and write data from/to various data sources and perform data validation. diff --git a/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md b/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md index e69de29bb2d1..47439e2c4ed7 100644 --- a/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md +++ b/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md @@ -0,0 +1,49 @@ +Prompt: +How can I orchestrate Apache Beam pipelines with KubeFlow? +Response: +Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/). +To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/). +You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference. +A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/). + + `KubeFlow` is an open source ML platform that is designed to enable MLOps - a set of best practises and guiding principles aimed to streamline development and maintenance of ML systems. 
KubeFlow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of a `DAG` responsible for scheduling and running the pipeline steps and passing execution parameters, metadata, and artifacts between the steps. + +You can make Apache Beam pipeline part of a Kubeflow pipeline. In this case Apache Beam pipeline `DAG` becomes a node in the Kubeflow pipeline `DAG`. + +In order to run an Apache Beam pipeline as part of a Kubeflow pipeline you need to: +1. Create KFP components by specifying interfaces and containerizing the component implementation. +2. Create KFP pipeline by connecting components, and defining how inputs and outputs are passed between the components. +3. Compile and run KFP pipeline by submitting it to a KFP client endpoint. + +Following is an example of a KFP pipeline that orchestrates an Apache Beam preprocessing pipeline: +``` + kfp + ├── pipeline.py + ├── components + │ └── preprocess + │ ├── Dockerfile + │ ├── component.yaml + │ ├── requirements.txt + │ └── src + │ └── preprocess.py + └── requirements.txt +``` + +Apache Beam pipelines can be used as `KFP` components consisting of a yaml specification `component.yaml` and a python source file `preprocess.py`. The yaml file specifies the input and output arguments of the component, and the python file contains the Apache Beam pipeline code. +Because `KFP` provides the input and output arguments as command-line arguments, an `argumentparser` is needed. Component logic and requirements are containerized and the container image is pushed to a container registry. + +As a final step, the KFP pipeline is compiled to a json file and submitted to a KFP client endpoint in `pipeline.py`. + +```python + +Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json") + +client = kfp.Client() +experiment = client.create_experiment("KFP orchestration example") +run_result = client.run_pipeline( + experiment_id=experiment.id, + job_name="KFP orchestration job", + pipeline_package_path="pipeline.json", + params=run_arguments) +``` +See [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/ml-orchestration/kfp) for a complete example of orchestrating Apache Beam preprocessing pipeline with KubeFlow. diff --git a/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md b/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md index e69de29bb2d1..9bc1a868dcf5 100644 --- a/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md +++ b/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md @@ -0,0 +1,20 @@ +Prompt: +How can I orchestrate Apache Beam pipelines with TFX? +Response: +Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/). +To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/). +You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference. +A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. 
Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/). + +TFX is an end-to-end platform for deploying production ML pipelines. TFX pipelines are a sequence of components that implement an ML pipeline. TFX pipeline DAG needs an orchestrator of it's own. Natively supported orchestrators include [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/), [Airflow](https://airflow.apache.org/), and Apache Beam itself. + +In order to run an Apache Beam pipeline as part of a TFX pipeline you need to: +1. Create TFX components by specifying interfaces and containerizing the component implementation. +2. Create TFX pipeline by connecting components, and defining how inputs and outputs are passed between the components. +3. Compile and run TFX pipeline in the target environment. + +TFX has a rich set of [standard components](https://www.tensorflow.org/tfx/guide#tfx_standard_components) for building ML pipelines. These components can be used as building blocks for creating custom components. TFX relies heavily on Apache Beam libraries to implement data-parallel pipelines in these components. You need to run components created with these libraries with one of the supported Apache Beam runners. + +To launch the pipeline, two configurations must be provided - orchestrator for the TFX pipeline and the pipeline options to run Apache Beam pipelines. + +The full TFX example code can again be found [here](The full TFX example code can again be found in the GitHub repository.). \ No newline at end of file From 9d32d2a8e58690e50ca61252d908bd4edd67fef7 Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Wed, 14 Feb 2024 01:26:53 +0100 Subject: [PATCH 05/25] new files added --- .../34_ai_preprocess_data.md | 33 ------------------- ..._explore_data.md => 38_ai_runinference.md} | 0 .../39_ai_runinference_metrics.md | 0 .../40_ai_model_evaluation.md | 0 4 files changed, 33 deletions(-) delete mode 100644 learning/prompts/documentation-lookup/34_ai_preprocess_data.md rename learning/prompts/documentation-lookup/{35_ai_explore_data.md => 38_ai_runinference.md} (100%) create mode 100644 learning/prompts/documentation-lookup/39_ai_runinference_metrics.md create mode 100644 learning/prompts/documentation-lookup/40_ai_model_evaluation.md diff --git a/learning/prompts/documentation-lookup/34_ai_preprocess_data.md b/learning/prompts/documentation-lookup/34_ai_preprocess_data.md deleted file mode 100644 index c97c11016873..000000000000 --- a/learning/prompts/documentation-lookup/34_ai_preprocess_data.md +++ /dev/null @@ -1,33 +0,0 @@ -Prompt: -How can I preprocess data for AI/ML pipeline with Apache Beam? -Response: -Apache Beam provides a rich set of data processing [transforms](https://beam.apache.org/documentation/programming-guide/#applying-transforms) that can be used to preprocess data for training and inference. -To simplify your data preprocessing, Apache Beam provides a [MLTransform](https://beam.apache.org/documentation/ml/preprocess-data/) that wraps transforms in one class allowing you to ensure data consistency using the same steps for training and inference. 
- -MLTransform can do a full pass on the dataset allowing you to perform the followingcommon preprocessing steps: -- normalization of input values using a minimum-maximum scaling -- convert `floats` to `integers` using a bucketization based on the input data distribution -- convert string to integers by generating a vocabulary over an entire dataset -- calculate [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) weights for text data - -For a full list of supported `MLTransforms` and I/O requirements see [here](https://beam.apache.org/documentation/ml/preprocess-data/#data-processing-transforms-that-use-tft) . - -MLTransform can operate in read or write mode generating or consuming artifacts such as vocabulary or statistics. See an example of `write mode` below: -```python - with beam.Pipeline() as p: - transformed_data = ( - p - | beam.Create(data) - | MLTransform(write_artifact_location=artifact_location).with_transform( - ) - | beam.Map(print)) - -``` -For more examples of using `MLTransform` see [here](https://beam.apache.org/documentation/transforms/python/elementwise/mltransform/). - -MLTransform also can be used to generate text embeddings to push data into vector stores or to be used as input for machine learning models. Supported text embedding transforms include `SentenceTransformerEmbeddings` for the `Hugging Face` [sentence-transformers](https://huggingface.co/sentence-transformers) and `VertexAITextEmbeddings` for the [Vertex AI text-embeddings API](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings). - -`MLTransform` is available in the Python SDK versions 2.53.0 and later. Only pipelines that use [default windows](https://beam.apache.org/documentation/programming-guide/#single-global-window) are supported. 
- - - diff --git a/learning/prompts/documentation-lookup/35_ai_explore_data.md b/learning/prompts/documentation-lookup/38_ai_runinference.md similarity index 100% rename from learning/prompts/documentation-lookup/35_ai_explore_data.md rename to learning/prompts/documentation-lookup/38_ai_runinference.md diff --git a/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md b/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md new file mode 100644 index 000000000000..e69de29bb2d1 diff --git a/learning/prompts/documentation-lookup/40_ai_model_evaluation.md b/learning/prompts/documentation-lookup/40_ai_model_evaluation.md new file mode 100644 index 000000000000..e69de29bb2d1 From d31505a3fb70ed0a6eca19055bec5282637f280c Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Sat, 17 Feb 2024 10:02:39 +0100 Subject: [PATCH 06/25] ml prompts added --- .../38_ai_runinference.md | 38 +++++++++++++++++++ .../39_ai_runinference_metrics.md | 36 ++++++++++++++++++ ...del_evaluation.md => 40_ai_bigquery_ml.md} | 0 .../41_ai_model_refresh.md | 3 ++ .../42_ai_custom_inference.md | 0 .../43_ai_llm_inference.md | 0 6 files changed, 77 insertions(+) rename learning/prompts/documentation-lookup/{40_ai_model_evaluation.md => 40_ai_bigquery_ml.md} (100%) create mode 100644 learning/prompts/documentation-lookup/41_ai_model_refresh.md create mode 100644 learning/prompts/documentation-lookup/42_ai_custom_inference.md create mode 100644 learning/prompts/documentation-lookup/43_ai_llm_inference.md diff --git a/learning/prompts/documentation-lookup/38_ai_runinference.md b/learning/prompts/documentation-lookup/38_ai_runinference.md index e69de29bb2d1..467311e5d9fe 100644 --- a/learning/prompts/documentation-lookup/38_ai_runinference.md +++ b/learning/prompts/documentation-lookup/38_ai_runinference.md @@ -0,0 +1,38 @@ +Prompt: +How can I run inference on a trained model using AI? +Response: +Apache Beam lets you efficiently inference on local and remnote ML models in your pipelines with the help of `RunInference API` which is supported in Python SDK starting from Apache Beam 2.40.0 and in Java SDK version 2.41.0 through Apache Beam’s [Multi-language Pipelines framework](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines). A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a `PCollection` that contains the input examples and output predictions. + +`RunInference API` has includes following features: +- support of both batch and streaming inference +- centralized model management fot efficient memory and bandwidth usage +- support of multiple model frameworks and model hubs +- automatic model refresh ensures latest model version is used +- support of GPUs for model inference + +`RunInference API` supports variety of frameworks and model hubs iuncluding [Tensorflow](https://www.tensorflow.org/), [Pytorch](https://pytorch.org/), [Sklearn](https://scikit-learn.org/), [XGBoost](https://xgboost.ai/), [Hugging Face](https://huggingface.co/), [TensorFlow Hub](https://www.tensorflow.org/hub), [Vertex AI](https://cloud.google.com/vertex-ai), [TensorRT](https://developer.nvidia.com/tensorrt), and [ONNX](https://onnx.ai/). You can also use custom model frameworks by using a custom [model_handler](https://beam.apache.org/documentation/ml/about-ml/#use-custom-models). + +To import models you need to configure the a `ModelHandler` object that wraps the underlying model. 
The `ModelHandler` allows you to set environment variables needed for inference. + +Following is an example importing a model handler to use in your pipeline: + +```python +from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor +from apache_beam.ml.inference.base import RunInference + + model_handler = PytorchModelHandlerTensor( + # model handler setup + ) + +with pipeline as p: + predictions = p | 'Read' >> beam.ReadFromSource('a_source') + | 'RunInference' >> RunInference(model_handler) +``` + +See [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) for end-to-end examples for supported model frameworks and model hubs. + +If you would like to run inference on a model that is not specifically supported, you need to create your own `ModelHandler` or `KeyedModelHandler` with logic to load your model and use it to run the inference. See here example of [custom model handler](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb). + +For patterns and best practises of running inference with Apache Beam, see [here](https://beam.apache.org/documentation/ml/about-ml/#runinference-patterns). + +For an example of using RunInference API in Java SDK see [here](https://github.com/apache/beam/tree/master/examples/multi-languages). Additionally see [Using RunInference from Java SDK](https://beam.apache.org/documentation/ml/multi-language-inference/) for an example of a composite Python transform that uses the RunInference API along with preprocessing and postprocessing from a Beam Java SDK pipeline. \ No newline at end of file diff --git a/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md b/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md index e69de29bb2d1..d079a86290a4 100644 --- a/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md +++ b/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md @@ -0,0 +1,36 @@ +Prompt: +What metrics are available for monitoring the performance of ML model inference in Apache Beam? +Response: +Apache Beam lets you efficiently inference on ML models with the help of `RunInference API` which is supported in Python SDK starting from Apache Beam 2.40.0 and in Java SDK version 2.41.0 through Apache Beam’s [Multi-language Pipelines framework](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines). + +RunInference API provides has built-in support for monitoring the performance of ML model inference. 
Following is a list of metrics commonly used for inference benchmarking: + +- `model_byte_size`: size of the memory footprint of the model load and initialization +- `load_model_latency_milli_secs`: time taken to load and initialize the model +- `num_inferences`: total number of elements passed to run_inference() +- `inference_batch_latency_micro_secs`: time taken to perform inference across all batches of examples +- `inference_request_batch_byte_size`: size of the batch in bytes +- `inference_request_batch_size`: number of examples in the batch +- `failed_batches_counter`: number of failed batches + +You can derive additoinal metrics from listed above as follows: +`Total time taken for inference` =` num_inferences` x `inference_batch_latency_micro_secs_MEAN` + +Below is the simplistic example of how to use the `RunInference` API to inference on a LM model: + +```python + with beam.Pipeline(options=pipeline_options) as pipeline: + _ = ( + pipeline + | "Create inputs" >> beam.Create(>) + | "Tokenize" >> beam.ParDo(Tokenize()) + | "Inference" >> RunInference()) +``` +After running the pipeline, you can query the metrics using the following code: + +```python +metrics = pipeline.result.metrics().query(beam.metrics.MetricsFilter()) +``` +Metrics are also available in the Dataflow UI. For the whole source code of the example and instructions to run it in Google Dataflow, see [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/runinference_metrics/) + +See here for a [](https://github.com/akvelon/beam/blob/371576a3b17b940380192378848dd00c55d0cc19/sdks/python/apache_beam/ml/inference/base.py#L1228) \ No newline at end of file diff --git a/learning/prompts/documentation-lookup/40_ai_model_evaluation.md b/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md similarity index 100% rename from learning/prompts/documentation-lookup/40_ai_model_evaluation.md rename to learning/prompts/documentation-lookup/40_ai_bigquery_ml.md diff --git a/learning/prompts/documentation-lookup/41_ai_model_refresh.md b/learning/prompts/documentation-lookup/41_ai_model_refresh.md new file mode 100644 index 000000000000..942db8b58c70 --- /dev/null +++ b/learning/prompts/documentation-lookup/41_ai_model_refresh.md @@ -0,0 +1,3 @@ +Prompt: + +Response: diff --git a/learning/prompts/documentation-lookup/42_ai_custom_inference.md b/learning/prompts/documentation-lookup/42_ai_custom_inference.md new file mode 100644 index 000000000000..e69de29bb2d1 diff --git a/learning/prompts/documentation-lookup/43_ai_llm_inference.md b/learning/prompts/documentation-lookup/43_ai_llm_inference.md new file mode 100644 index 000000000000..e69de29bb2d1 From 31c90e43be6ec4fac12d24e0c5f207e8d6236f1c Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Sat, 17 Feb 2024 18:59:29 +0100 Subject: [PATCH 07/25] bigqueryml prompt added --- .../documentation-lookup/40_ai_bigquery_ml.md | 51 +++++++++++++++++++ 1 file changed, 51 insertions(+) diff --git a/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md b/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md index e69de29bb2d1..575f99f233e0 100644 --- a/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md +++ b/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md @@ -0,0 +1,51 @@ +Prompt: +How can I integrate BigQuery ML with Apache Beam? + +Response: +[BigQuery ML](https://cloud.google.com/bigquery/docs/bqml-introduction) lets you create and run ML models by using `GoogleSQL` queries. 
You can integrate those models into your Apache Beam pipeline using [TFX Basic Shared Libraries (tfx_bsl)](https://github.com/tensorflow/tfx-bsl) and run inference on them using the `RunInference` API. + +For an example of training a simple `BQML` model please see [here](https://cloud.google.com/bigquery-ml/docs/bigqueryml-web-ui-start). + +After you have trained your model, you need to export it. Below is an example of BigQuery command to export a model to a [Google Cloud Storage](https://cloud.google.com/storage/docs/creating-buckets) bucket: + +``` +bq extract -m gs:// +``` + +In order to incorporate your `BQML` model into an Apache Beam pipeline using `tfx_bsl` it has to be saved in the [TensorFlow SavedModel](https://www.tensorflow.org/guide/saved_model) format. You have to download model to local directory to do local predictions. +```python +import apache_beam +import tensorflow as tf +from google.protobuf import text_format +from tensorflow.python.framework import tensor_util +from tfx_bsl.beam import run_inference +from tfx_bsl.public.beam import RunInference +from tfx_bsl.public.proto import model_spec_pb2 + + +inputs = tf.train.Example(...) + +model_path = + +def extract_prediction(response): + # Extract the prediction from the response depending on the signature of the model + +with beam.Pipeline() as p: + res = ( + p + | beam.Create([inputs]) + | RunInference( + model_spec_pb2.InferenceSpecType( + saved_model_spec=model_spec_pb2.SavedModelSpec( + model_path=model_path, + signature_name=['serving_default']))) + | beam.ParDo(extract_prediction) + ) +``` + +This example uses the `RunInference` `PTransform` from the `tfx_bsl` library, and we point it to your local directory where the model is stored. +The transform takes elements of the type `tf.train.Example` as inputs and outputs elements of the type `tensorflow_serving.apis.prediction_log_pb2.PredictionLog`. You can extract values from the output depending on the signature of the model. See [here](https://cloud.google.com/bigquery/docs/exporting-models#prediction_output_format) for the output format of `BQML` models. + + + + From 104af50c39dbcdbafd9229c31de50d5d679127bd Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Sat, 17 Feb 2024 19:30:57 +0100 Subject: [PATCH 08/25] fix whitespace --- .../documentation-lookup/36_ai_kfp_orchestration.md | 10 +++++----- .../documentation-lookup/37_ai_tfx_orchestration.md | 2 +- .../prompts/documentation-lookup/38_ai_runinference.md | 6 +++--- .../documentation-lookup/39_ai_runinference_metrics.md | 2 +- .../prompts/documentation-lookup/40_ai_bigquery_ml.md | 2 +- 5 files changed, 11 insertions(+), 11 deletions(-) diff --git a/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md b/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md index 47439e2c4ed7..8c65041d8e51 100644 --- a/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md +++ b/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md @@ -3,12 +3,12 @@ How can I orchestrate Apache Beam pipelines with KubeFlow? Response: Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/). 
To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/). -You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference. +You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference. A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/). - `KubeFlow` is an open source ML platform that is designed to enable MLOps - a set of best practises and guiding principles aimed to streamline development and maintenance of ML systems. KubeFlow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of a `DAG` responsible for scheduling and running the pipeline steps and passing execution parameters, metadata, and artifacts between the steps. + `KubeFlow` is an open source ML platform that is designed to enable MLOps - a set of best practices and guiding principles aimed to streamline development and maintenance of ML systems. KubeFlow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of a `DAG` responsible for scheduling and running the pipeline steps and passing execution parameters, metadata, and artifacts between the steps. -You can make Apache Beam pipeline part of a Kubeflow pipeline. In this case Apache Beam pipeline `DAG` becomes a node in the Kubeflow pipeline `DAG`. +You can make Apache Beam pipeline part of a Kubeflow pipeline. In this case Apache Beam pipeline `DAG` becomes a node in the Kubeflow pipeline `DAG`. In order to run an Apache Beam pipeline as part of a Kubeflow pipeline you need to: 1. Create KFP components by specifying interfaces and containerizing the component implementation. @@ -30,9 +30,9 @@ Following is an example of a KFP pipeline that orchestrates an Apache Beam prepr ``` Apache Beam pipelines can be used as `KFP` components consisting of a yaml specification `component.yaml` and a python source file `preprocess.py`. The yaml file specifies the input and output arguments of the component, and the python file contains the Apache Beam pipeline code. -Because `KFP` provides the input and output arguments as command-line arguments, an `argumentparser` is needed. Component logic and requirements are containerized and the container image is pushed to a container registry. +Because `KFP` provides the input and output arguments as command-line arguments, an `argumentparser` is needed. Component logic and requirements are containerized and the container image is pushed to a container registry. -As a final step, the KFP pipeline is compiled to a json file and submitted to a KFP client endpoint in `pipeline.py`. +As a final step, the KFP pipeline is compiled to a json file and submitted to a KFP client endpoint in `pipeline.py`. 
```python diff --git a/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md b/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md index 9bc1a868dcf5..45706103b550 100644 --- a/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md +++ b/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md @@ -3,7 +3,7 @@ How can I orchestrate Apache Beam pipelines with TFX? Response: Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/). To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/). -You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference. +You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference. A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/). TFX is an end-to-end platform for deploying production ML pipelines. TFX pipelines are a sequence of components that implement an ML pipeline. TFX pipeline DAG needs an orchestrator of it's own. Natively supported orchestrators include [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/), [Airflow](https://airflow.apache.org/), and Apache Beam itself. diff --git a/learning/prompts/documentation-lookup/38_ai_runinference.md b/learning/prompts/documentation-lookup/38_ai_runinference.md index 467311e5d9fe..8fabed41c039 100644 --- a/learning/prompts/documentation-lookup/38_ai_runinference.md +++ b/learning/prompts/documentation-lookup/38_ai_runinference.md @@ -1,7 +1,7 @@ Prompt: How can I run inference on a trained model using AI? Response: -Apache Beam lets you efficiently inference on local and remnote ML models in your pipelines with the help of `RunInference API` which is supported in Python SDK starting from Apache Beam 2.40.0 and in Java SDK version 2.41.0 through Apache Beam’s [Multi-language Pipelines framework](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines). A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a `PCollection` that contains the input examples and output predictions. +Apache Beam lets you efficiently inference on local and remote ML models in your pipelines with the help of `RunInference API` which is supported in Python SDK starting from Apache Beam 2.40.0 and in Java SDK version 2.41.0 through Apache Beam’s [Multi-language Pipelines framework](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines). A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. 
The transform outputs a `PCollection` that contains the input examples and output predictions. `RunInference API` has includes following features: - support of both batch and streaming inference @@ -10,7 +10,7 @@ Apache Beam lets you efficiently inference on local and remnote ML models in you - automatic model refresh ensures latest model version is used - support of GPUs for model inference -`RunInference API` supports variety of frameworks and model hubs iuncluding [Tensorflow](https://www.tensorflow.org/), [Pytorch](https://pytorch.org/), [Sklearn](https://scikit-learn.org/), [XGBoost](https://xgboost.ai/), [Hugging Face](https://huggingface.co/), [TensorFlow Hub](https://www.tensorflow.org/hub), [Vertex AI](https://cloud.google.com/vertex-ai), [TensorRT](https://developer.nvidia.com/tensorrt), and [ONNX](https://onnx.ai/). You can also use custom model frameworks by using a custom [model_handler](https://beam.apache.org/documentation/ml/about-ml/#use-custom-models). +`RunInference API` supports variety of frameworks and model hubs including [Tensorflow](https://www.tensorflow.org/), [Pytorch](https://pytorch.org/), [Sklearn](https://scikit-learn.org/), [XGBoost](https://xgboost.ai/), [Hugging Face](https://huggingface.co/), [TensorFlow Hub](https://www.tensorflow.org/hub), [Vertex AI](https://cloud.google.com/vertex-ai), [TensorRT](https://developer.nvidia.com/tensorrt), and [ONNX](https://onnx.ai/). You can also use custom model frameworks by using a custom [model_handler](https://beam.apache.org/documentation/ml/about-ml/#use-custom-models). To import models you need to configure the a `ModelHandler` object that wraps the underlying model. The `ModelHandler` allows you to set environment variables needed for inference. @@ -33,6 +33,6 @@ See [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/ex If you would like to run inference on a model that is not specifically supported, you need to create your own `ModelHandler` or `KeyedModelHandler` with logic to load your model and use it to run the inference. See here example of [custom model handler](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb). -For patterns and best practises of running inference with Apache Beam, see [here](https://beam.apache.org/documentation/ml/about-ml/#runinference-patterns). +For patterns and best practices of running inference with Apache Beam, see [here](https://beam.apache.org/documentation/ml/about-ml/#runinference-patterns). For an example of using RunInference API in Java SDK see [here](https://github.com/apache/beam/tree/master/examples/multi-languages). Additionally see [Using RunInference from Java SDK](https://beam.apache.org/documentation/ml/multi-language-inference/) for an example of a composite Python transform that uses the RunInference API along with preprocessing and postprocessing from a Beam Java SDK pipeline. 
\ No newline at end of file diff --git a/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md b/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md index d079a86290a4..68b104d77955 100644 --- a/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md +++ b/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md @@ -13,7 +13,7 @@ RunInference API provides has built-in support for monitoring the performance of - `inference_request_batch_size`: number of examples in the batch - `failed_batches_counter`: number of failed batches -You can derive additoinal metrics from listed above as follows: +You can derive additional metrics from listed above as follows: `Total time taken for inference` =` num_inferences` x `inference_batch_latency_micro_secs_MEAN` Below is the simplistic example of how to use the `RunInference` API to inference on a LM model: diff --git a/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md b/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md index 575f99f233e0..a6043298ca1c 100644 --- a/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md +++ b/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md @@ -43,7 +43,7 @@ with beam.Pipeline() as p: ) ``` -This example uses the `RunInference` `PTransform` from the `tfx_bsl` library, and we point it to your local directory where the model is stored. +This example uses the `RunInference` `PTransform` from the `tfx_bsl` library, and we point it to your local directory where the model is stored. The transform takes elements of the type `tf.train.Example` as inputs and outputs elements of the type `tensorflow_serving.apis.prediction_log_pb2.PredictionLog`. You can extract values from the output depending on the signature of the model. See [here](https://cloud.google.com/bigquery/docs/exporting-models#prediction_output_format) for the output format of `BQML` models. From 2a7e626a7bec62ae3b0f9a66f30f378a96ce3c59 Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Wed, 21 Feb 2024 18:28:14 +0100 Subject: [PATCH 09/25] add model refresh prompt --- .../41_ai_model_refresh.md | 39 ++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/learning/prompts/documentation-lookup/41_ai_model_refresh.md b/learning/prompts/documentation-lookup/41_ai_model_refresh.md index 942db8b58c70..3ab046cbaefa 100644 --- a/learning/prompts/documentation-lookup/41_ai_model_refresh.md +++ b/learning/prompts/documentation-lookup/41_ai_model_refresh.md @@ -1,3 +1,40 @@ Prompt: - +How can I make sure my Apache Beam pipeline uses the latest version of the ML model? Response: +Production ML Workflow often involves updating the AI model with new data. You can use the `RunInference` API to make sure your Apache Beam pipeline uses the latest version of the ML model. This is achieved by using the [Side input](https://beam.apache.org/documentation/programming-guide/#side-inputs) concept of Apache Beam, which allows you to pass additional data to a `DoFn` that is not part of the main input. + +`RunInference` transform accepts optional input parameter `model_metadata_pcoll` side input `PCollection` of `ModelMetadata` which is a `NamedTuple` containing `model_id` and `model_name` used to load model for inference and identify the model in the metric generated by the `RunInference` transform. The URL or path to the model should be compatible with the respective [ModelHandler requirements](https://beam.apache.org/documentation/ml/about-ml/#use-custom-models). 
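For reference, a `ModelMetadata` entry is a plain two-field `NamedTuple`; a hedged sketch with placeholder values:

```python
from apache_beam.ml.inference.base import ModelMetadata

# Placeholder values: model_id must point to a model the configured ModelHandler
# can load, and model_name is a short tag used to label RunInference metrics.
updated_model = ModelMetadata(
    model_id='gs://my-bucket/models/v2/model.pt',
    model_name='my_model_v2')
```

Emitting values like this into the `model_metadata_pcoll` side input is what triggers the model swap at runtime.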
+ +If the main collection emits inputs before the `model_metadata_pcoll` side input is emitted, the main PCollection will be buffered until the `model_metadata_pcoll` side input is available. + +For more information on `ModelMetadata`, see [here](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata). + +Common approach to model update in production is to use a `FileWatchPattern` as side input. + +```python +import apache_beam as beam +from apache_beam.ml.inference.utils import WatchFilePattern +from apache_beam.ml.inference.base import RunInference + +tf_model_handler = ... # model handler for the model + +with beam.Pipeline() as pipeline: + + file_pattern = '' + + side_input_pcoll = ( + pipeline + | "FilePatternUpdates" >> WatchFilePattern(file_pattern=file_pattern)) + + main_input_pcoll = ... # main input PCollection + + inference_pcoll = ( + main_input_pcoll + | "RunInference" >> RunInference( + model_handler=model_handler, + model_metadata_pcoll=side_input_pcoll)) +``` +In the example above, `model_metadata_pcoll` expects a `PCollection` of `ModelMetadata` compatible with `AsSingleton`. Because the pipeline uses `WatchFilePattern` as side input, it will take care of windowing and wrapping the output into `ModelMetadata`. + +For more information see [here](https://beam.apache.org/documentation/ml/side-input-updates/). + From 4d352a4e9f4a37594e08c17bad75509bf0cf8409 Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Thu, 22 Feb 2024 07:52:53 +0100 Subject: [PATCH 10/25] custom inference added --- .../42_ai_custom_inference.md | 46 +++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/learning/prompts/documentation-lookup/42_ai_custom_inference.md b/learning/prompts/documentation-lookup/42_ai_custom_inference.md index e69de29bb2d1..619aed2046d1 100644 --- a/learning/prompts/documentation-lookup/42_ai_custom_inference.md +++ b/learning/prompts/documentation-lookup/42_ai_custom_inference.md @@ -0,0 +1,46 @@ +Prompt: +How can I use remote inference calls in my Apache Beam pipeline? + +Response: +The preferred way to run inference in an Apache Beam pipeline is to use the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam SDK. It enables you to run models as part of your pipeline or perform remote inference calls. + +You have to consider following factors while developing custom inference calls: +- **API quotas**: heavy load might result in API quota limits being reached. You may use [PipelineOptions](https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options) to set the maximum number of parallel API calls. Use direct_num_workers pipeline option for [Direct Runner](https://beam.apache.org/documentation/runners/direct/) or max_num_workers for [Google Cloud Dataflow Runner](https://beam.apache.org/documentation/runners/dataflow/). See [Beam capability matrix](https://beam.apache.org/documentation/runners/capability-matrix/) for information about other runners. +- **Error handling**: you need to handle errors in case of API call failures. You may use exponential backoff for retries or dead-letter queues for failed API calls. +- **Monitoring**: use monitoring and performance metrics to track the performance of your inference calls and health of your pipeline. +- **Batching**: you may use batching to send multiple inputs in a single API call for efficiency. 
+ +You will need to create a `beam.DoFn` in the form of custom model handler in order to make external API calls with `RunInference` `Ptransform`: + +```python +class CustomModelHandler(ModelHandler): + """DoFn that accepts a batch of inputs and sends that batch to the remote API for inference""" + + def load_model(self): + """Initiate the Custom remote API client.""" + client = ... # Initialize the client + return client + + def run_inference(self, batch, model, inference): + + # Prepare a batch request for all inputs in the batch. + inputs = ... # process inputs from the batch + input_requests = ... # Prepare input requests for the model + batch_request = ... # Prepare batch request for the model + + # Send the batch request to the remote endpoint. + responses = model.(request=batch_request).responses + + return responses +``` +Use this custom model handler in your pipeline as follows: +```python + +with beam.Pipeline() as pipeline: + _ = (pipeline | "Create inputs" >> beam.Create() + | "Inference" >> RunInference(model_handler=CustomModelHandler()) + | "Process outputs" >> beam.Map() + ) +``` + +For a complete example of using the `RunInference API` for remote inference calls, see the [here](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/custom_remote_inference.ipynb). From e7d1c10bed6886b3dba72491125aa620e3836518 Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Thu, 22 Feb 2024 08:33:19 +0100 Subject: [PATCH 11/25] Apply suggestions from code review Co-authored-by: Daria Bezkorovaina <99457894+dariabezkorovaina@users.noreply.github.com> --- .../33_ai_ml_pipelines.md | 17 +++++----- .../36_ai_kfp_orchestration.md | 34 +++++++++++-------- .../37_ai_tfx_orchestration.md | 4 +-- 3 files changed, 30 insertions(+), 25 deletions(-) diff --git a/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md b/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md index 2dccf141c071..ab57b6da8417 100644 --- a/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md +++ b/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md @@ -1,20 +1,21 @@ Prompt: -How can I use Apache Beam in AI/ML pipeline? +How can I use Apache Beam in my AI/ML pipeline? + Response: You can use Apache Beam for data ingestion, validation, preprocessing, and inference. -Apache Beam has a rich set of [I/O connectors](https://beam.apache.org/documentation/io/connectors/) and [transforms](https://beam.apache.org/documentation/transforms/python/) which allow to read and write data from/to various data sources and perform data validation. +Apache Beam offers a rich set of [I/O connectors](https://beam.apache.org/documentation/io/connectors/) and [transforms](https://beam.apache.org/documentation/transforms/python/) that allow for reading and writing data from and to various data sources and sinks, as well as performing data validation. -Apache Beam also has a [MLTransform](https://beam.apache.org/documentation/ml/preprocess-data/) for preprocessing data. This allows you to ensure data consistency using same steps for training and inference. +For data preprocessing, Apache Beam provides the [MLTransform](https://beam.apache.org/documentation/ml/preprocess-data/) class. This feature allows you to ensure data consistency by applying the same preprocessing steps for both training and inference. -Apache Beam allows using pre-trained models with [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/stable/), and [Tensorflow](https://www.tensorflow.org/). 
+Additionally, Apache Beam allows integration with pre-trained models from [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/stable/), and [TensorFlow](https://www.tensorflow.org/). -Machine learning inference can be performed in Apache Beam using the `RunInference API`. -[RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) transform supports usage of any custom models. See [here](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb) for an example of running inference on a model loaded with [spaCy](https://spacy.io/). +To execute machine learning inference tasks, Apache Beam provides the RunInference API. +[`RunInference`](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) is a [`PTransform`](https://beam.apache.org/documentation/basics/#ptransform) that supports the usage of any custom models. For an example of running inference on a custom model loaded with [spaCy](https://spacy.io/), refer to the [Bring your own ML model to Beam RunInference](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb) example in the Apache Beam GitHub repository. -`RunInference` works well on arbitrary large models. See [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) for example of deployment and inference on LLM. +The `RunInference` transform efficiently handles models of any size, making it suitable for large language models (LLMs) and other complex architectures. You can find an example of deploying and performing inference on large language models (LLMs) in the [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) section of the Apache Beam documentation. -For more information on building AI/ML pipelines see [here](https://beam.apache.org/documentation/ml/overview/). +For more information on implementing AI/ML pipelines using Apache Beam, see the [Get started with AI/ML pipelines](https://beam.apache.org/documentation/ml/overview/) section in the Apache Beam documentation. diff --git a/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md b/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md index 8c65041d8e51..f6f745955f55 100644 --- a/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md +++ b/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md @@ -1,21 +1,25 @@ Prompt: How can I orchestrate Apache Beam pipelines with KubeFlow? + Response: -Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/). -To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/). -You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference. -A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. 
Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/). +Apache Beam is an open-source, unified programming model for defining batch and streaming data-parallel processing pipelines. Each Apache Beam pipeline is represented as a Directed Acyclic Graph (DAG) of data processing operations constructed using the Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/). + +To execute a pipeline, you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/). + +You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference. - `KubeFlow` is an open source ML platform that is designed to enable MLOps - a set of best practices and guiding principles aimed to streamline development and maintenance of ML systems. KubeFlow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of a `DAG` responsible for scheduling and running the pipeline steps and passing execution parameters, metadata, and artifacts between the steps. +However, a comprehensive ML workflow also involves additional steps like data exploration, feature engineering, and model training. Additionally, ML workflows require metadata and artifact tracking for reproducibility and auditability. Popular tools for building and managing ML workflows include [Kubeflow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/). -You can make Apache Beam pipeline part of a Kubeflow pipeline. In this case Apache Beam pipeline `DAG` becomes a node in the Kubeflow pipeline `DAG`. +Kubeflow is an open-source ML platform tailored for MLOps, which comprises best practices and guiding principles to streamline the development and maintenance of ML workflows. Kubeflow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of DAGs. The DAGs orchestrate the scheduling and execution of pipeline steps and facilitate the transfer of execution parameters, metadata, and artifacts between steps. -In order to run an Apache Beam pipeline as part of a Kubeflow pipeline you need to: +You can integrate an Apache Beam pipeline into a Kubeflow pipeline, where the Apache Beam pipeline DAG serves as a node within the Kubeflow pipeline DAG. + +To run an Apache Beam pipeline as a part of a Kubeflow Pipeline (KFP), you need to: 1. Create KFP components by specifying interfaces and containerizing the component implementation. -2. Create KFP pipeline by connecting components, and defining how inputs and outputs are passed between the components. -3. Compile and run KFP pipeline by submitting it to a KFP client endpoint. +2. Create the KFP pipeline by connecting components and defining how inputs and outputs are exchanged between them. +3. Compile and run the KFP pipeline by submitting it to a KFP client endpoint. 
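+
+As an illustrative sketch of step 1, the containerized component usually exposes a small command-line entry point that parses the arguments KFP passes in and then launches the Beam pipeline; the argument names and the trivial preprocessing step below are placeholders:
+
+```python
+# preprocess.py - illustrative KFP component entry point
+import argparse
+
+import apache_beam as beam
+from apache_beam.options.pipeline_options import PipelineOptions
+
+
+def run(argv=None):
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--ingested-dataset-path", required=True)
+    parser.add_argument("--preprocessed-dataset-path", required=True)
+    known_args, pipeline_args = parser.parse_known_args(argv)
+
+    with beam.Pipeline(options=PipelineOptions(pipeline_args)) as pipeline:
+        _ = (
+            pipeline
+            | "Read" >> beam.io.ReadFromText(known_args.ingested_dataset_path)
+            | "Preprocess" >> beam.Map(str.lower)  # Placeholder preprocessing step
+            | "Write" >> beam.io.WriteToText(known_args.preprocessed_dataset_path))
+
+
+if __name__ == "__main__":
+    run()
+```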
-Following is an example of a KFP pipeline that orchestrates an Apache Beam preprocessing pipeline: +Here is an example of a KFP pipeline orchestrating an Apache Beam preprocessing pipeline: ``` kfp ├── pipeline.py @@ -29,13 +33,12 @@ Following is an example of a KFP pipeline that orchestrates an Apache Beam prepr └── requirements.txt ``` -Apache Beam pipelines can be used as `KFP` components consisting of a yaml specification `component.yaml` and a python source file `preprocess.py`. The yaml file specifies the input and output arguments of the component, and the python file contains the Apache Beam pipeline code. -Because `KFP` provides the input and output arguments as command-line arguments, an `argumentparser` is needed. Component logic and requirements are containerized and the container image is pushed to a container registry. +Apache Beam pipelines can be used as KFP components consisting of a YAML specification `component.yaml` and a Python source file `preprocess.py`. The YAML file defines the input and output arguments of the component, while the Python file contains the Apache Beam pipeline code. +Since KFP provides input and output arguments as command-line arguments, an `ArgumentParser` is required. The component logic and requirements are containerized, and the container image is pushed to a container registry. -As a final step, the KFP pipeline is compiled to a json file and submitted to a KFP client endpoint in `pipeline.py`. +As a final step, the KFP pipeline is compiled into a JSON file and submitted to a KFP client endpoint using `pipeline.py`. ```python - Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json") client = kfp.Client() @@ -46,4 +49,5 @@ run_result = client.run_pipeline( pipeline_package_path="pipeline.json", params=run_arguments) ``` -See [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/ml-orchestration/kfp) for a complete example of orchestrating Apache Beam preprocessing pipeline with KubeFlow. + +For a complete example of orchestrating an Apache Beam preprocessing pipeline with Kubeflow, refer to the [Apache Beam GitHub repository](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/ml-orchestration/kfp). diff --git a/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md b/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md index 45706103b550..92e69aa9819b 100644 --- a/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md +++ b/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md @@ -3,8 +3,8 @@ How can I orchestrate Apache Beam pipelines with TFX? Response: Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/). To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/). -You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference. -A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. 
Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/). +You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference. +However, a comprehensive ML workflow also involves additional steps like data exploration, feature engineering, and model training. Additionally, ML workflows require metadata and artifact tracking for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/). TFX is an end-to-end platform for deploying production ML pipelines. TFX pipelines are a sequence of components that implement an ML pipeline. TFX pipeline DAG needs an orchestrator of it's own. Natively supported orchestrators include [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/), [Airflow](https://airflow.apache.org/), and Apache Beam itself. From d90742a781e0fb3d8b2cfb156f34743434e3bfff Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Sat, 24 Feb 2024 23:33:18 +0100 Subject: [PATCH 12/25] llm inference added --- .../38_ai_runinference.md | 2 +- .../42_ai_custom_inference.md | 2 +- .../43_ai_llm_inference.md | 54 +++++++++++++++++++ 3 files changed, 56 insertions(+), 2 deletions(-) diff --git a/learning/prompts/documentation-lookup/38_ai_runinference.md b/learning/prompts/documentation-lookup/38_ai_runinference.md index 8fabed41c039..429f773ffd08 100644 --- a/learning/prompts/documentation-lookup/38_ai_runinference.md +++ b/learning/prompts/documentation-lookup/38_ai_runinference.md @@ -14,7 +14,7 @@ Apache Beam lets you efficiently inference on local and remote ML models in your To import models you need to configure the a `ModelHandler` object that wraps the underlying model. The `ModelHandler` allows you to set environment variables needed for inference. -Following is an example importing a model handler to use in your pipeline: +Following is an example importing a `Pytorch` model handler to use in your pipeline: ```python from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor diff --git a/learning/prompts/documentation-lookup/42_ai_custom_inference.md b/learning/prompts/documentation-lookup/42_ai_custom_inference.md index 619aed2046d1..485fbf8f8ad3 100644 --- a/learning/prompts/documentation-lookup/42_ai_custom_inference.md +++ b/learning/prompts/documentation-lookup/42_ai_custom_inference.md @@ -1,5 +1,5 @@ Prompt: -How can I use remote inference calls in my Apache Beam pipeline? +How can I use custom remote inference calls in my Apache Beam pipeline? Response: The preferred way to run inference in an Apache Beam pipeline is to use the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam SDK. It enables you to run models as part of your pipeline or perform remote inference calls. 
diff --git a/learning/prompts/documentation-lookup/43_ai_llm_inference.md b/learning/prompts/documentation-lookup/43_ai_llm_inference.md index e69de29bb2d1..b3aef016bf4d 100644 --- a/learning/prompts/documentation-lookup/43_ai_llm_inference.md +++ b/learning/prompts/documentation-lookup/43_ai_llm_inference.md @@ -0,0 +1,54 @@ +Prompt: +Ho can I use Apache Beam to run inference on Large Language Models (LLMs)? +Response: +The preferred way to run inference in an Apache Beam pipeline is to use the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam SDK. It enables you to run models as part of your pipeline or perform remote inference calls. + +You can use `RunInference` `PTransform` with large models as long they fit into memory. + +Your typical workflow for running inference on LLMs in an Apache Beam pipeline is as follows: +1. Read the input text data from a source such as a file or a Pub/Sub topic. +2. Encode the text into LLM model understandable tokens, usually using a tokenizer. +3. Use RunInference to get the predictions from the model. +4. Decode the predictions into human-readable text. + +Following is an example of how to use the `RunInference` API with LLMs in an Apache Beam pipeline: + +```python +import apache_beam as beam +from apache_beam.ml.inference.base import RunInference + +with beam.Pipeline(options=pipeline_options) as pipeline: + _ = ( + pipeline + | "CreateInputs" >> beam.Create(task_sentences) + | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer)) + | "RunInference" >> RunInference(model_handler=model_handler) + | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer)) + ) +``` + +In this example, `Preprocess` and `Postprocess` are `DoFn` classes that encode and decode the input and output data using a tokenizer, and `model_handler` is a `ModelHandler` object that wraps the underlying model and allows you to set environment variables needed load the model. + +Following is an example of how to configure a `Pytorch` model handler to use in your pipeline: + +```python +from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor +model_handler = PytorchModelHandlerTensor( + state_dict_path=known_args.model_state_dict_path, + model_class=T5ForConditionalGeneration, + model_params={ + "config": AutoConfig.from_pretrained(known_args.model_name) + }, + device="cpu", + inference_fn=gen_fn) +``` +Each particular model handler will have its own specific configuration parameters. For example, the `PytorchModelHandlerTensor` requires the following parameters: +- `state_dict_path`: The path to the saved model state dictionary. +- `model_class`: The class of the `Pytorch` model that defines model structure. +- `model_params`: The dictionary of model parameters. +- `device`: The device to run the model on (e.g. "cpu" or "gpu"). +- `inference_fn`: The function to run the inference during RunInference. + +See [here](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html) for supported popular frameworks and models. + +Please see [here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py) for the full example of using the `RunInference API` with LLMs. 
\ No newline at end of file From e0c7e2d506d8c7d8dc14723ed9a4f75a1f6fc730 Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Sun, 25 Feb 2024 07:12:13 +0100 Subject: [PATCH 13/25] whitespace fixed --- learning/prompts/documentation-lookup/42_ai_custom_inference.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/learning/prompts/documentation-lookup/42_ai_custom_inference.md b/learning/prompts/documentation-lookup/42_ai_custom_inference.md index 485fbf8f8ad3..a321a0c392d1 100644 --- a/learning/prompts/documentation-lookup/42_ai_custom_inference.md +++ b/learning/prompts/documentation-lookup/42_ai_custom_inference.md @@ -15,7 +15,7 @@ You will need to create a `beam.DoFn` in the form of custom model handler in ord ```python class CustomModelHandler(ModelHandler): """DoFn that accepts a batch of inputs and sends that batch to the remote API for inference""" - + def load_model(self): """Initiate the Custom remote API client.""" client = ... # Initialize the client From a851199202ca9895fc7aac9d1249ddb0075106d2 Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Sun, 25 Feb 2024 07:50:11 +0100 Subject: [PATCH 14/25] code review comments --- learning/prompts/documentation-lookup/33_ai_ml_pipelines.md | 5 ++++- .../prompts/documentation-lookup/36_ai_kfp_orchestration.md | 4 ---- .../prompts/documentation-lookup/37_ai_tfx_orchestration.md | 3 +-- 3 files changed, 5 insertions(+), 7 deletions(-) diff --git a/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md b/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md index ab57b6da8417..7c41038d4914 100644 --- a/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md +++ b/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md @@ -2,7 +2,10 @@ Prompt: How can I use Apache Beam in my AI/ML pipeline? Response: -You can use Apache Beam for data ingestion, validation, preprocessing, and inference. +Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/). +To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/). + +You can use Apache Beam for various tasks within your AI/ML pipeline, including data validation, preprocessing, model validation, and model deployment and inference. Apache Beam offers a rich set of [I/O connectors](https://beam.apache.org/documentation/io/connectors/) and [transforms](https://beam.apache.org/documentation/transforms/python/) that allow for reading and writing data from and to various data sources and sinks, as well as performing data validation. diff --git a/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md b/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md index f6f745955f55..06383e604d29 100644 --- a/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md +++ b/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md @@ -2,10 +2,6 @@ Prompt: How can I orchestrate Apache Beam pipelines with KubeFlow? Response: -Apache Beam is an open-source, unified programming model for defining batch and streaming data-parallel processing pipelines. 
Each Apache Beam pipeline is represented as a Directed Acyclic Graph (DAG) of data processing operations constructed using the Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/). - -To execute a pipeline, you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/). - You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference. However, a comprehensive ML workflow also involves additional steps like data exploration, feature engineering, and model training. Additionally, ML workflows require metadata and artifact tracking for reproducibility and auditability. Popular tools for building and managing ML workflows include [Kubeflow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/). diff --git a/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md b/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md index 92e69aa9819b..a8e5b615f3d8 100644 --- a/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md +++ b/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md @@ -1,8 +1,7 @@ Prompt: How can I orchestrate Apache Beam pipelines with TFX? + Response: -Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/). -To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/). You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference. However, a comprehensive ML workflow also involves additional steps like data exploration, feature engineering, and model training. Additionally, ML workflows require metadata and artifact tracking for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/). From 7b04897dc665ce79a589b25eab852f0775d369ab Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Mon, 4 Mar 2024 00:47:07 +0100 Subject: [PATCH 15/25] Apply suggestions from code review Co-authored-by: Daria Bezkorovaina <99457894+dariabezkorovaina@users.noreply.github.com> --- .../36_ai_kfp_orchestration.md | 8 +++++--- .../37_ai_tfx_orchestration.md | 18 ++++++++++-------- 2 files changed, 15 insertions(+), 11 deletions(-) diff --git a/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md b/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md index 06383e604d29..9284b99c4179 100644 --- a/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md +++ b/learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md @@ -1,5 +1,5 @@ Prompt: -How can I orchestrate Apache Beam pipelines with KubeFlow? +How can I orchestrate Apache Beam pipelines with Kubeflow? 
Response: You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference. @@ -16,6 +16,7 @@ To run an Apache Beam pipeline as a part of a Kubeflow Pipeline (KFP), you need 3. Compile and run the KFP pipeline by submitting it to a KFP client endpoint. Here is an example of a KFP pipeline orchestrating an Apache Beam preprocessing pipeline: + ``` kfp ├── pipeline.py @@ -30,9 +31,10 @@ Here is an example of a KFP pipeline orchestrating an Apache Beam preprocessing ``` Apache Beam pipelines can be used as KFP components consisting of a YAML specification `component.yaml` and a Python source file `preprocess.py`. The YAML file defines the input and output arguments of the component, while the Python file contains the Apache Beam pipeline code. -Since KFP provides input and output arguments as command-line arguments, an `ArgumentParser` is required. The component logic and requirements are containerized, and the container image is pushed to a container registry. -As a final step, the KFP pipeline is compiled into a JSON file and submitted to a KFP client endpoint using `pipeline.py`. +Since KFP provides the input and output arguments as command-line arguments, you need to use the `ArgumentParser` class. You containerize the logic and prerequisites of the pipeline components, and then push the resulting container image to a container registry. + +In the final stage, the `compiler` method transforms the KFP pipeline into a JSON file, and the `pipeline.py` script submits it to a KFP client endpoint. ```python Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json") diff --git a/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md b/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md index a8e5b615f3d8..48fd7379b5a1 100644 --- a/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md +++ b/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md @@ -1,19 +1,21 @@ Prompt: How can I orchestrate Apache Beam pipelines with TFX? + Response: You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference. -However, a comprehensive ML workflow also involves additional steps like data exploration, feature engineering, and model training. Additionally, ML workflows require metadata and artifact tracking for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/). -TFX is an end-to-end platform for deploying production ML pipelines. TFX pipelines are a sequence of components that implement an ML pipeline. TFX pipeline DAG needs an orchestrator of it's own. Natively supported orchestrators include [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/), [Airflow](https://airflow.apache.org/), and Apache Beam itself. +However, a comprehensive ML workflow also involves additional steps like data exploration, feature engineering, and model training. Additionally, ML workflows require metadata and artifact tracking for reproducibility and auditability. Popular tools for building and managing ML workflows include [Kubeflow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/). 
+ +TFX is an end-to-end platform designed for deploying production ML pipelines. TFX pipelines consist of a sequence of components that together form an ML pipeline. Each TFX pipeline DAG requires its own orchestrator. The natively supported orchestrators include [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/), [Apache Airflow](https://airflow.apache.org/), and Apache Beam itself. -In order to run an Apache Beam pipeline as part of a TFX pipeline you need to: +To integrate an Apache Beam pipeline into a TFX pipeline, you need to follow these steps: 1. Create TFX components by specifying interfaces and containerizing the component implementation. -2. Create TFX pipeline by connecting components, and defining how inputs and outputs are passed between the components. -3. Compile and run TFX pipeline in the target environment. +2. Create TFX pipeline by connecting these components and specifying how inputs and outputs are exchanged between them. +3. Compile and execute the TFX pipeline in the target environment. -TFX has a rich set of [standard components](https://www.tensorflow.org/tfx/guide#tfx_standard_components) for building ML pipelines. These components can be used as building blocks for creating custom components. TFX relies heavily on Apache Beam libraries to implement data-parallel pipelines in these components. You need to run components created with these libraries with one of the supported Apache Beam runners. +TFX provides a rich set of [standard components](https://www.tensorflow.org/tfx/guide#tfx_standard_components) for constructing ML pipelines. These components can be used as building blocks for creating custom components. TFX heavily relies on Apache Beam libraries for implementing data-parallel pipelines within these components. You must execute components created using these libraries with one of the supported Apache Beam runners. -To launch the pipeline, two configurations must be provided - orchestrator for the TFX pipeline and the pipeline options to run Apache Beam pipelines. +To launch the pipeline, you must provide two configurations: the orchestrator for the TFX pipeline and the pipeline options necessary for running Apache Beam pipelines. -The full TFX example code can again be found [here](The full TFX example code can again be found in the GitHub repository.). \ No newline at end of file +You can find the complete TFX example code in the [Apache Beam GitHub repository](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/ml-orchestration/tfx). \ No newline at end of file From c815cf60a2db0845472aea94b27b33dc033776d9 Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Mon, 4 Mar 2024 00:48:45 +0100 Subject: [PATCH 16/25] Apply suggestions from code review Co-authored-by: Daria Bezkorovaina <99457894+dariabezkorovaina@users.noreply.github.com> --- .../38_ai_runinference.md | 31 ++++++++++--------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/learning/prompts/documentation-lookup/38_ai_runinference.md b/learning/prompts/documentation-lookup/38_ai_runinference.md index 429f773ffd08..302da39c7fb9 100644 --- a/learning/prompts/documentation-lookup/38_ai_runinference.md +++ b/learning/prompts/documentation-lookup/38_ai_runinference.md @@ -1,18 +1,19 @@ Prompt: -How can I run inference on a trained model using AI? +How can I run inference on a trained ML model using Apache Beam? 
+ Response: -Apache Beam lets you efficiently inference on local and remote ML models in your pipelines with the help of `RunInference API` which is supported in Python SDK starting from Apache Beam 2.40.0 and in Java SDK version 2.41.0 through Apache Beam’s [Multi-language Pipelines framework](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines). A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a `PCollection` that contains the input examples and output predictions. +Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s [Multi-language Pipelines](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines) framework. The `RunInference` transform performs inference on a [`PCollection`](https://beam.apache.org/documentation/programming-guide/#pcollections) of examples using an ML model and outputs a `PCollection` containing both the input examples and the corresponding output predictions. -`RunInference API` has includes following features: -- support of both batch and streaming inference -- centralized model management fot efficient memory and bandwidth usage -- support of multiple model frameworks and model hubs -- automatic model refresh ensures latest model version is used -- support of GPUs for model inference +Key features of the RunInference API include: +* Support for both batch and streaming inference. +* Centralized model management for efficient memory and bandwidth usage. +* Compatibility with multiple model frameworks and model hubs. +* Automatic model refreshing to ensure the latest model version is used. +* GPU support for model inference. -`RunInference API` supports variety of frameworks and model hubs including [Tensorflow](https://www.tensorflow.org/), [Pytorch](https://pytorch.org/), [Sklearn](https://scikit-learn.org/), [XGBoost](https://xgboost.ai/), [Hugging Face](https://huggingface.co/), [TensorFlow Hub](https://www.tensorflow.org/hub), [Vertex AI](https://cloud.google.com/vertex-ai), [TensorRT](https://developer.nvidia.com/tensorrt), and [ONNX](https://onnx.ai/). You can also use custom model frameworks by using a custom [model_handler](https://beam.apache.org/documentation/ml/about-ml/#use-custom-models). +The RunInference API supports a variety of frameworks and model hubs, including [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/), [XGBoost](https://xgboost.ai/), [Hugging Face](https://huggingface.co/), [TensorFlow Hub](https://www.tensorflow.org/hub), [Vertex AI](https://cloud.google.com/vertex-ai), [TensorRT](https://developer.nvidia.com/tensorrt), and [ONNX](https://onnx.ai/). Additionally, you can easily integrate custom model frameworks. -To import models you need to configure the a `ModelHandler` object that wraps the underlying model. The `ModelHandler` allows you to set environment variables needed for inference. 
+To import a model into your Apache Beam pipeline, you'll need to configure the [`ModelHandler`](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) object, which wraps the underlying model and allows you to set necessary environment variables for inference.
 
 Following is an example importing a `Pytorch` model handler to use in your pipeline:
 
@@ -21,7 +22,7 @@ from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
 from apache_beam.ml.inference.base import RunInference
 
 model_handler = PytorchModelHandlerTensor(
-  # model handler setup
+  # Model handler setup
 )
 
 with pipeline as p:
@@ -29,10 +30,10 @@ with pipeline as p:
   | 'RunInference' >> RunInference(model_handler)
 ```
 
-See [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) for end-to-end examples for supported model frameworks and model hubs.
+For comprehensive end-to-end examples of inference with supported model frameworks and model hubs, refer to the [Apache Beam GitHub repository](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
 
-If you would like to run inference on a model that is not specifically supported, you need to create your own `ModelHandler` or `KeyedModelHandler` with logic to load your model and use it to run the inference. See here example of [custom model handler](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb).
+If you need to run inference on a model that isn't explicitly supported, you can [create your own `ModelHandler` or `KeyedModelHandler`](https://beam.apache.org/documentation/ml/about-ml/#use-custom-models) with custom logic to load and use your model. For an example of running inference on a custom model loaded with [spaCy](https://spacy.io/), refer to the [Bring your own ML model to Beam RunInference](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb) example in the Apache Beam GitHub repository.
 
-For patterns and best practices of running inference with Apache Beam, see [here](https://beam.apache.org/documentation/ml/about-ml/#runinference-patterns).
+For recommended patterns and best practices when leveraging Apache Beam for inference tasks, see the [RunInference Patterns](https://beam.apache.org/documentation/ml/about-ml/#runinference-patterns) section in the official documentation.
 
-For an example of using RunInference API in Java SDK see [here](https://github.com/apache/beam/tree/master/examples/multi-languages). Additionally see [Using RunInference from Java SDK](https://beam.apache.org/documentation/ml/multi-language-inference/) for an example of a composite Python transform that uses the RunInference API along with preprocessing and postprocessing from a Beam Java SDK pipeline.
+For an example of using the RunInference API in the Java SDK, see the [example multi-language pipelines](https://github.com/apache/beam/tree/master/examples/multi-language) in the Apache Beam GitHub repository. Additionally, for an illustration of a composite Python transform integrating the RunInference API with preprocessing and postprocessing from a Beam Java SDK pipeline, you can refer to the [Using RunInference from Java SDK](https://beam.apache.org/documentation/ml/multi-language-inference/) section in the official documentation. 
\ No newline at end of file From 30e50c2d6f11a786ae89ce87c0d23425c57db725 Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Mon, 4 Mar 2024 00:51:28 +0100 Subject: [PATCH 17/25] Apply suggestions from code review Co-authored-by: Daria Bezkorovaina <99457894+dariabezkorovaina@users.noreply.github.com> --- .../39_ai_runinference_metrics.md | 30 ++++++++++--------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md b/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md index 68b104d77955..1fd7bf941780 100644 --- a/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md +++ b/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md @@ -1,22 +1,23 @@ Prompt: What metrics are available for monitoring the performance of ML model inference in Apache Beam? + Response: -Apache Beam lets you efficiently inference on ML models with the help of `RunInference API` which is supported in Python SDK starting from Apache Beam 2.40.0 and in Java SDK version 2.41.0 through Apache Beam’s [Multi-language Pipelines framework](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines). +Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s [Multi-language Pipelines](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines) framework. -RunInference API provides has built-in support for monitoring the performance of ML model inference. Following is a list of metrics commonly used for inference benchmarking: +The RunInference API provides built-in support for monitoring the performance of ML model inference. Here is a list of commonly used metrics for inference benchmarking: -- `model_byte_size`: size of the memory footprint of the model load and initialization -- `load_model_latency_milli_secs`: time taken to load and initialize the model -- `num_inferences`: total number of elements passed to run_inference() -- `inference_batch_latency_micro_secs`: time taken to perform inference across all batches of examples -- `inference_request_batch_byte_size`: size of the batch in bytes -- `inference_request_batch_size`: number of examples in the batch -- `failed_batches_counter`: number of failed batches +* `model_byte_size`: size of the memory footprint of the model load and initialization. +* `load_model_latency_milli_secs`: time taken to load and initialize the model. +* `num_inferences`: total number of elements passed to `run_inference()`. +* `inference_batch_latency_micro_secs`: time taken to perform inference across all batches of examples. +* `inference_request_batch_byte_size`: size of the batch in bytes. +* `inference_request_batch_size`: number of examples in the batch. +* `failed_batches_counter`: number of failed batches. -You can derive additional metrics from listed above as follows: -`Total time taken for inference` =` num_inferences` x `inference_batch_latency_micro_secs_MEAN` +You can derive additional metrics from the ones listed above. 
Example: +Total time taken for inference = `num_inferences` x `inference_batch_latency_micro_secs_MEAN` -Below is the simplistic example of how to use the `RunInference` API to inference on a LM model: +Here is a simplified example of how to use the RunInference API to perform inference on a language model (LM): ```python with beam.Pipeline(options=pipeline_options) as pipeline: @@ -31,6 +32,7 @@ After running the pipeline, you can query the metrics using the following code: ```python metrics = pipeline.result.metrics().query(beam.metrics.MetricsFilter()) ``` -Metrics are also available in the Dataflow UI. For the whole source code of the example and instructions to run it in Google Dataflow, see [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/runinference_metrics/) -See here for a [](https://github.com/akvelon/beam/blob/371576a3b17b940380192378848dd00c55d0cc19/sdks/python/apache_beam/ml/inference/base.py#L1228) \ No newline at end of file +Metrics are also available in the [Google Cloud Dataflow](https://cloud.google.com/dataflow) UI. For the complete source code of the example and instructions to run it in Dataflow, refer to the [Apache Beam GitHub repository](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/runinference_metrics/). + +For a sample implementation of a metrics collector that tracks ML-related performance and memory usage, see the [base Python inference example](https://github.com/akvelon/beam/blob/371576a3b17b940380192378848dd00c55d0cc19/sdks/python/apache_beam/ml/inference/base.py#L1228) in the Apache Beam GitHub repository. \ No newline at end of file From 73f925d69d1fcd6a5409607fb2aec200b5387f01 Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Mon, 4 Mar 2024 00:54:55 +0100 Subject: [PATCH 18/25] Apply suggestions from code review Co-authored-by: Daria Bezkorovaina <99457894+dariabezkorovaina@users.noreply.github.com> --- .../documentation-lookup/40_ai_bigquery_ml.md | 12 +++---- .../41_ai_model_refresh.md | 16 +++++---- .../42_ai_custom_inference.md | 23 +++++++------ .../43_ai_llm_inference.md | 34 ++++++++++--------- 4 files changed, 45 insertions(+), 40 deletions(-) diff --git a/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md b/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md index a6043298ca1c..a8ef84586ea4 100644 --- a/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md +++ b/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md @@ -2,17 +2,18 @@ Prompt: How can I integrate BigQuery ML with Apache Beam? Response: -[BigQuery ML](https://cloud.google.com/bigquery/docs/bqml-introduction) lets you create and run ML models by using `GoogleSQL` queries. You can integrate those models into your Apache Beam pipeline using [TFX Basic Shared Libraries (tfx_bsl)](https://github.com/tensorflow/tfx-bsl) and run inference on them using the `RunInference` API. +[BigQuery ML](https://cloud.google.com/bigquery/docs/bqml-introduction) allows you to develop and execute ML models using [GoogleSQL](https://cloud.google.com/bigquery/docs/introduction-sql) queries. You can seamlessly integrate these models into your Apache Beam pipeline using [TFX Basic Shared Libraries (tfx_bsl)](https://github.com/tensorflow/tfx-bsl) and perform inference on them using the RunInference API. -For an example of training a simple `BQML` model please see [here](https://cloud.google.com/bigquery-ml/docs/bigqueryml-web-ui-start). 
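+
+As a quick illustration of what training a model with GoogleSQL looks like, a model can be created from a training table with a single `CREATE MODEL` statement; the dataset, table, and model names below are placeholders:
+
+```python
+from google.cloud import bigquery
+
+client = bigquery.Client()
+
+# Train a simple classifier with BigQuery ML (placeholder dataset, table, and model names).
+client.query(
+    """
+    CREATE OR REPLACE MODEL `my_dataset.my_model`
+    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['label']) AS
+    SELECT * FROM `my_dataset.training_data`
+    """
+).result()
+```
+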
+For an example of training a basic BigQuery ML model, refer to the [BigQuery documentation](?). -After you have trained your model, you need to export it. Below is an example of BigQuery command to export a model to a [Google Cloud Storage](https://cloud.google.com/storage/docs/creating-buckets) bucket: +Once you have trained your model, you will need to export it. Here is an example BigQuery command to export a model to a [Google Cloud Storage](https://cloud.google.com/storage/docs/creating-buckets) bucket: ``` bq extract -m gs:// ``` -In order to incorporate your `BQML` model into an Apache Beam pipeline using `tfx_bsl` it has to be saved in the [TensorFlow SavedModel](https://www.tensorflow.org/guide/saved_model) format. You have to download model to local directory to do local predictions. +To incorporate your BigQuery ML model into an Apache Beam pipeline using `tfx_bsl`, it must be saved in the [TensorFlow SavedModel](https://www.tensorflow.org/guide/saved_model) format. You will need to download the model to your local directory to perform local predictions: + ```python import apache_beam import tensorflow as tf @@ -43,8 +44,7 @@ with beam.Pipeline() as p: ) ``` -This example uses the `RunInference` `PTransform` from the `tfx_bsl` library, and we point it to your local directory where the model is stored. -The transform takes elements of the type `tf.train.Example` as inputs and outputs elements of the type `tensorflow_serving.apis.prediction_log_pb2.PredictionLog`. You can extract values from the output depending on the signature of the model. See [here](https://cloud.google.com/bigquery/docs/exporting-models#prediction_output_format) for the output format of `BQML` models. +This example uses the [`RunInference`](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) transform from the `tfx_bsl` library, directing it to the local directory where the model is stored. The transform takes `tf.train.Example` type elements as inputs and produces `tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as outputs. Depending on the signature of your model, you can extract values from the output. For the prediction output format of exported models for each model type, refer to the [Prediction output format](https://cloud.google.com/bigquery/docs/exporting-models#prediction_output_format) section in the BigQuery documentation. diff --git a/learning/prompts/documentation-lookup/41_ai_model_refresh.md b/learning/prompts/documentation-lookup/41_ai_model_refresh.md index 3ab046cbaefa..c41d8a01628b 100644 --- a/learning/prompts/documentation-lookup/41_ai_model_refresh.md +++ b/learning/prompts/documentation-lookup/41_ai_model_refresh.md @@ -1,15 +1,16 @@ Prompt: How can I make sure my Apache Beam pipeline uses the latest version of the ML model? + Response: -Production ML Workflow often involves updating the AI model with new data. You can use the `RunInference` API to make sure your Apache Beam pipeline uses the latest version of the ML model. This is achieved by using the [Side input](https://beam.apache.org/documentation/programming-guide/#side-inputs) concept of Apache Beam, which allows you to pass additional data to a `DoFn` that is not part of the main input. +Production ML workflows often involve updating the ML model with new data. 
You can ensure that your Apache Beam pipeline uses the latest version of the ML model by leveraging the RunInference API alongside Apache Beam's [side inputs](https://beam.apache.org/documentation/programming-guide/#side-inputs) feature. Side inputs are additional inputs that can be provided to a [`ParDo`](https://beam.apache.org/documentation/programming-guide/#pardo) transform, in addition to the main input [`PCollection`](https://beam.apache.org/documentation/basics/#pcollection).
 
-`RunInference` transform accepts optional input parameter `model_metadata_pcoll` side input `PCollection` of `ModelMetadata` which is a `NamedTuple` containing `model_id` and `model_name` used to load model for inference and identify the model in the metric generated by the `RunInference` transform. The URL or path to the model should be compatible with the respective [ModelHandler requirements](https://beam.apache.org/documentation/ml/about-ml/#use-custom-models).
+The `RunInference` transform accepts an optional input parameter `model_metadata_pcoll`, which is a side input `PCollection` emitting `ModelMetadata`. `ModelMetadata` is a `NamedTuple` object containing `model_id` and `model_name`, used to load the model for inference and identify it in the metrics generated by the `RunInference` transform. The URL or path to the model should be compatible with the respective [`ModelHandler` requirements](https://beam.apache.org/documentation/ml/about-ml/#modify-a-python-pipeline-to-use-an-ml-model).
 
-If the main collection emits inputs before the `model_metadata_pcoll` side input is emitted, the main PCollection will be buffered until the `model_metadata_pcoll` side input is available.
+If the main collection emits inputs before the `model_metadata_pcoll` side input is available, the main `PCollection` will be buffered until the `model_metadata_pcoll` side input is emitted.
 
-For more information on `ModelMetadata`, see [here](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata).
+For more information on `ModelMetadata`, refer to the [Apache Beam Python SDK documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata).
 
-Common approach to model update in production is to use a `FileWatchPattern` as side input.
+A common approach to model updates in production is to use a `WatchFilePattern` as a side input:
 
 ```python
 import apache_beam as beam
@@ -34,7 +35,8 @@ with beam.Pipeline() as pipeline:
          model_handler=model_handler,
          model_metadata_pcoll=side_input_pcoll))
 ```
-In the example above, `model_metadata_pcoll` expects a `PCollection` of `ModelMetadata` compatible with `AsSingleton`. Because the pipeline uses `WatchFilePattern` as side input, it will take care of windowing and wrapping the output into `ModelMetadata`.
-For more information see [here](https://beam.apache.org/documentation/ml/side-input-updates/).
 
+In the provided example, the `model_metadata_pcoll` parameter expects a `PCollection` of `ModelMetadata` compatible with the `AsSingleton` marker. Given that the pipeline employs the `WatchFilePattern` class as a side input, it automatically manages windowing and encapsulates the output into `ModelMetadata`.
+
+For more information, refer to the [Use `WatchFilePattern` to auto-update ML models in RunInference](https://beam.apache.org/documentation/ml/side-input-updates/) section in the Apache Beam documentation. 
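+
+If a file-watching pattern is not a good fit, the same mechanism can be exercised by emitting `ModelMetadata` yourself. The following minimal sketch assumes a batch pipeline, a `model_handler` configured as shown above, and placeholder paths:
+
+```python
+import apache_beam as beam
+from apache_beam.ml.inference.base import ModelMetadata, RunInference
+
+with beam.Pipeline() as pipeline:
+    # A single-element side input identifying the model version to load (placeholder path).
+    model_metadata_pcoll = pipeline | "ModelMetadata" >> beam.Create(
+        [ModelMetadata(model_id="gs://my-bucket/models/v2/model.pt", model_name="model_v2")])
+
+    _ = (
+        pipeline
+        | "ReadInputs" >> beam.Create(inputs)  # 'inputs' is assumed to be defined elsewhere
+        | "RunInference" >> RunInference(
+            model_handler=model_handler,
+            model_metadata_pcoll=model_metadata_pcoll))
+```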
diff --git a/learning/prompts/documentation-lookup/42_ai_custom_inference.md b/learning/prompts/documentation-lookup/42_ai_custom_inference.md index a321a0c392d1..1f4139571617 100644 --- a/learning/prompts/documentation-lookup/42_ai_custom_inference.md +++ b/learning/prompts/documentation-lookup/42_ai_custom_inference.md @@ -2,15 +2,15 @@ Prompt: How can I use custom remote inference calls in my Apache Beam pipeline? Response: -The preferred way to run inference in an Apache Beam pipeline is to use the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam SDK. It enables you to run models as part of your pipeline or perform remote inference calls. +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. -You have to consider following factors while developing custom inference calls: -- **API quotas**: heavy load might result in API quota limits being reached. You may use [PipelineOptions](https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options) to set the maximum number of parallel API calls. Use direct_num_workers pipeline option for [Direct Runner](https://beam.apache.org/documentation/runners/direct/) or max_num_workers for [Google Cloud Dataflow Runner](https://beam.apache.org/documentation/runners/dataflow/). See [Beam capability matrix](https://beam.apache.org/documentation/runners/capability-matrix/) for information about other runners. -- **Error handling**: you need to handle errors in case of API call failures. You may use exponential backoff for retries or dead-letter queues for failed API calls. -- **Monitoring**: use monitoring and performance metrics to track the performance of your inference calls and health of your pipeline. -- **Batching**: you may use batching to send multiple inputs in a single API call for efficiency. +When developing custom inference calls, consider the following factors: +* **API quotas**: heavy loads may lead to reaching API quota limits. You can use [`PipelineOptions`](https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options) to specify the maximum number of parallel API calls. Use `direct_num_workers` for the [Direct Runner](https://beam.apache.org/documentation/runners/direct/) or `max_num_workers` for the [Google Cloud Dataflow Runner](https://beam.apache.org/documentation/runners/dataflow/). Refer to the [Beam Capability Matrix](https://beam.apache.org/documentation/runners/capability-matrix/) for information about other runners. +* **Error handling**: it is essential to handle errors in case of API call failures. Consider implementing exponential backoff for retries or utilizing dead-letter queues for failed API calls. +* **Monitoring**: incorporate monitoring and performance metrics to track the performance of your inference calls and the health of your pipeline. +* **Batching**: batching can be used to send multiple inputs in a single API call for improved efficiency. 
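+
+To make the error-handling point above concrete, a remote call can be wrapped in a small retry helper with exponential backoff; this is only a sketch, and `send_request` together with the caught exception type stand in for whatever your API client actually provides:
+
+```python
+import random
+import time
+
+
+def call_with_backoff(send_request, batch_request, max_attempts=5):
+    """Retry a remote inference call with exponential backoff and jitter."""
+    for attempt in range(max_attempts):
+        try:
+            return send_request(batch_request)
+        except Exception:  # Narrow this to the client's transient error types
+            if attempt == max_attempts - 1:
+                raise
+            time.sleep((2 ** attempt) + random.random())
+```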
-You will need to create a `beam.DoFn` in the form of custom model handler in order to make external API calls with `RunInference` `Ptransform`: +To execute external API calls with the `RunInference` transform, you'll need to create a `beam.DoFn` in the form of a custom model handler: ```python class CustomModelHandler(ModelHandler): @@ -24,7 +24,7 @@ class CustomModelHandler(ModelHandler): def run_inference(self, batch, model, inference): # Prepare a batch request for all inputs in the batch. - inputs = ... # process inputs from the batch + inputs = ... # Process inputs from the batch input_requests = ... # Prepare input requests for the model batch_request = ... # Prepare batch request for the model @@ -33,9 +33,10 @@ class CustomModelHandler(ModelHandler): return responses ``` -Use this custom model handler in your pipeline as follows: -```python +Integrate this custom model handler into your pipeline as demonstrated in the following example: + +```python with beam.Pipeline() as pipeline: _ = (pipeline | "Create inputs" >> beam.Create() | "Inference" >> RunInference(model_handler=CustomModelHandler()) @@ -43,4 +44,4 @@ with beam.Pipeline() as pipeline: ) ``` -For a complete example of using the `RunInference API` for remote inference calls, see the [here](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/custom_remote_inference.ipynb). +For a comprehensive example of using the RunInference API for remote inference calls, refer to the [Apache Beam GitHub repository](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/custom_remote_inference.ipynb). diff --git a/learning/prompts/documentation-lookup/43_ai_llm_inference.md b/learning/prompts/documentation-lookup/43_ai_llm_inference.md index b3aef016bf4d..e70883544ba0 100644 --- a/learning/prompts/documentation-lookup/43_ai_llm_inference.md +++ b/learning/prompts/documentation-lookup/43_ai_llm_inference.md @@ -1,17 +1,18 @@ Prompt: Ho can I use Apache Beam to run inference on Large Language Models (LLMs)? + Response: -The preferred way to run inference in an Apache Beam pipeline is to use the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam SDK. It enables you to run models as part of your pipeline or perform remote inference calls. +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. -You can use `RunInference` `PTransform` with large models as long they fit into memory. +You can use the `RunInference` transform with large models as long they fit into memory. -Your typical workflow for running inference on LLMs in an Apache Beam pipeline is as follows: +The typical workflow for conducting inference on Large Language Models (LLMs) within an Apache Beam pipeline involves the following steps: 1. Read the input text data from a source such as a file or a Pub/Sub topic. -2. Encode the text into LLM model understandable tokens, usually using a tokenizer. -3. Use RunInference to get the predictions from the model. +2. Encode the text into tokens understandable by the LLM model, typically using a tokenizer. +3. 
Use the `RunInference` transform to obtain predictions from the model. 4. Decode the predictions into human-readable text. -Following is an example of how to use the `RunInference` API with LLMs in an Apache Beam pipeline: +Here is an example demonstrating how to leverage the RunInference API with LLMs in an Apache Beam pipeline: ```python import apache_beam as beam @@ -27,9 +28,9 @@ with beam.Pipeline(options=pipeline_options) as pipeline: ) ``` -In this example, `Preprocess` and `Postprocess` are `DoFn` classes that encode and decode the input and output data using a tokenizer, and `model_handler` is a `ModelHandler` object that wraps the underlying model and allows you to set environment variables needed load the model. +In this example, `Preprocess` and `Postprocess` are `DoFn` classes responsible for encoding and decoding the input and output data using a tokenizer. The `model_handler` is a `ModelHandler` object that wraps the underlying model and allows you to configure environment variables required to load the model. -Following is an example of how to configure a `Pytorch` model handler to use in your pipeline: +Here is an example of how to configure a Pytorch model handler for use in your pipeline: ```python from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor @@ -42,13 +43,14 @@ model_handler = PytorchModelHandlerTensor( device="cpu", inference_fn=gen_fn) ``` -Each particular model handler will have its own specific configuration parameters. For example, the `PytorchModelHandlerTensor` requires the following parameters: -- `state_dict_path`: The path to the saved model state dictionary. -- `model_class`: The class of the `Pytorch` model that defines model structure. -- `model_params`: The dictionary of model parameters. -- `device`: The device to run the model on (e.g. "cpu" or "gpu"). -- `inference_fn`: The function to run the inference during RunInference. -See [here](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html) for supported popular frameworks and models. +Each specific model handler has its own configuration parameters. For example, the `PytorchModelHandlerTensor` requires the following parameters: +* `state_dict_path`: the path to the saved model state dictionary. +* `model_class`: the class of the Pytorch model that defines model structure. +* `model_params`: the dictionary of model parameters. +* `device`: the device to run the model on (e.g. "cpu" or "gpu"). +* `inference_fn`: the function to run the inference during RunInference. + +For information on supported popular frameworks and models, refer to the reference documentation for the [`apache_beam.ml.inference` package](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html). -Please see [here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py) for the full example of using the `RunInference API` with LLMs. \ No newline at end of file +You can find the comprehensive example of using the RunInference API with LLMs in the [Apache Beam GitHub repository](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py). 
\ No newline at end of file From 74b8d2d5611ca7f43e0269d1c0ede188ee4c2a28 Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Sun, 3 Mar 2024 04:00:10 +0100 Subject: [PATCH 19/25] fixed broken link --- learning/prompts/documentation-lookup/40_ai_bigquery_ml.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md b/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md index a8ef84586ea4..0b1ff6ed34a8 100644 --- a/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md +++ b/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md @@ -4,7 +4,7 @@ How can I integrate BigQuery ML with Apache Beam? Response: [BigQuery ML](https://cloud.google.com/bigquery/docs/bqml-introduction) allows you to develop and execute ML models using [GoogleSQL](https://cloud.google.com/bigquery/docs/introduction-sql) queries. You can seamlessly integrate these models into your Apache Beam pipeline using [TFX Basic Shared Libraries (tfx_bsl)](https://github.com/tensorflow/tfx-bsl) and perform inference on them using the RunInference API. -For an example of training a basic BigQuery ML model, refer to the [BigQuery documentation](?). +For an example of training a basic BigQuery ML model, refer to the [BigQuery documentation](https://cloud.google.com/bigquery/docs/bqml-introduction). Once you have trained your model, you will need to export it. Here is an example BigQuery command to export a model to a [Google Cloud Storage](https://cloud.google.com/storage/docs/creating-buckets) bucket: From ea99a4982e0067f27ab268a8720b53245889509c Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Sun, 3 Mar 2024 06:17:10 +0100 Subject: [PATCH 20/25] nolinks docs added --- .../33_ai_ml_pipelines.md | 25 +++++++++ .../36_ai_kfp_orchestration.md | 51 +++++++++++++++++ .../37_ai_tfx_orchestration.md | 21 +++++++ .../38_ai_runinference.md | 39 +++++++++++++ .../39_ai_runinference_metrics.md | 38 +++++++++++++ .../40_ai_bigquery_ml.md | 51 +++++++++++++++++ .../41_ai_model_refresh.md | 42 ++++++++++++++ .../42_ai_custom_inference.md | 47 ++++++++++++++++ .../43_ai_llm_inference.md | 56 +++++++++++++++++++ 9 files changed, 370 insertions(+) create mode 100644 learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md create mode 100644 learning/prompts/documentation-lookup-nolinks/36_ai_kfp_orchestration.md create mode 100644 learning/prompts/documentation-lookup-nolinks/37_ai_tfx_orchestration.md create mode 100644 learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md create mode 100644 learning/prompts/documentation-lookup-nolinks/39_ai_runinference_metrics.md create mode 100644 learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md create mode 100644 learning/prompts/documentation-lookup-nolinks/41_ai_model_refresh.md create mode 100644 learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md create mode 100644 learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md diff --git a/learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md b/learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md new file mode 100644 index 000000000000..745957a048df --- /dev/null +++ b/learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md @@ -0,0 +1,25 @@ +Prompt: +How can I use Apache Beam in my AI/ML pipeline? 
+ +Response: +Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the supported languages. + +To run a pipeline you need to deploy it to one of the supported Beam runners. + +You can use Apache Beam for various tasks within your AI/ML pipeline, including data validation, preprocessing, model validation, and model deployment and inference. + +Apache Beam offers a rich set of I/O connectors and `PTransforms` that allow for reading and writing data from and to various data sources and sinks, as well as performing data validation. + +For data preprocessing, Apache Beam provides the `MLTransform` class. This feature allows you to ensure data consistency by applying the same preprocessing steps for both training and inference. + +Additionally, Apache Beam allows integration with pre-trained models from `PyTorch`, `Scikit-learn`, and `TensorFlow`. + +To execute machine learning inference tasks, Apache Beam provides the RunInference API. +`RunInference` is a `PTransform` that supports the usage of any custom models. For an example of running inference on a custom model loaded with `spaCy`, refer to examples in the Apache Beam GitHub repository. + +The `RunInference` transform efficiently handles models of any size, making it suitable for large language models (LLMs) and other complex architectures. You can find an example of deploying and performing inference on large language models (LLMs) in the RunInference section of the Apache Beam documentation. + +For more information on implementing AI/ML pipelines using Apache Beam, see the `Get started with AI/ML pipelines` section in the Apache Beam documentation. + + + diff --git a/learning/prompts/documentation-lookup-nolinks/36_ai_kfp_orchestration.md b/learning/prompts/documentation-lookup-nolinks/36_ai_kfp_orchestration.md new file mode 100644 index 000000000000..df0419a67342 --- /dev/null +++ b/learning/prompts/documentation-lookup-nolinks/36_ai_kfp_orchestration.md @@ -0,0 +1,51 @@ +Prompt: +How can I orchestrate Apache Beam pipelines with Kubeflow? + +Response: +You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference. + +However, a comprehensive ML workflow also involves additional steps like data exploration, feature engineering, and model training. Additionally, ML workflows require metadata and artifact tracking for reproducibility and auditability. Popular tools for building and managing ML workflows include Kubeflow, TFX, and Apache Airflow. + +Kubeflow is an open-source ML platform tailored for MLOps, which comprises best practices and guiding principles to streamline the development and maintenance of ML workflows. Kubeflow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of DAGs. The DAGs orchestrate the scheduling and execution of pipeline steps and facilitate the transfer of execution parameters, metadata, and artifacts between steps. + +You can integrate an Apache Beam pipeline into a Kubeflow pipeline, where the Apache Beam pipeline DAG serves as a node within the Kubeflow pipeline DAG. + +To run an Apache Beam pipeline as a part of a Kubeflow Pipeline (KFP), you need to: +1. 
Create KFP components by specifying interfaces and containerizing the component implementation. +2. Create the KFP pipeline by connecting components and defining how inputs and outputs are exchanged between them. +3. Compile and run the KFP pipeline by submitting it to a KFP client endpoint. + +Here is an example of a KFP pipeline orchestrating an Apache Beam preprocessing pipeline: + +``` + kfp + ├── pipeline.py + ├── components + │ └── preprocess + │ ├── Dockerfile + │ ├── component.yaml + │ ├── requirements.txt + │ └── src + │ └── preprocess.py + └── requirements.txt +``` + +Apache Beam pipelines can be used as KFP components consisting of a YAML specification `component.yaml` and a Python source file `preprocess.py`. The YAML file defines the input and output arguments of the component, while the Python file contains the Apache Beam pipeline code. + +Since KFP provides the input and output arguments as command-line arguments, you need to use the `ArgumentParser` class. You containerize the logic and prerequisites of the pipeline components, and then push the resulting container image to a container registry. + +In the final stage, the `compiler` method transforms the KFP pipeline into a JSON file, and the `pipeline.py` script submits it to a KFP client endpoint. + +```python +Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json") + +client = kfp.Client() +experiment = client.create_experiment("KFP orchestration example") +run_result = client.run_pipeline( + experiment_id=experiment.id, + job_name="KFP orchestration job", + pipeline_package_path="pipeline.json", + params=run_arguments) +``` + +For a complete example of orchestrating an Apache Beam preprocessing pipeline with Kubeflow, refer to the Apache Beam GitHub repository. diff --git a/learning/prompts/documentation-lookup-nolinks/37_ai_tfx_orchestration.md b/learning/prompts/documentation-lookup-nolinks/37_ai_tfx_orchestration.md new file mode 100644 index 000000000000..318eafa5f07a --- /dev/null +++ b/learning/prompts/documentation-lookup-nolinks/37_ai_tfx_orchestration.md @@ -0,0 +1,21 @@ +Prompt: +How can I orchestrate Apache Beam pipelines with TFX? + + +Response: +You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference. + +However, a comprehensive ML workflow also involves additional steps like data exploration, feature engineering, and model training. Additionally, ML workflows require metadata and artifact tracking for reproducibility and auditability. Popular tools for building and managing ML workflows include Kubeflow, TFX, and Apache Airflow. + +TFX is an end-to-end platform designed for deploying production ML pipelines. TFX pipelines consist of a sequence of components that together form an ML pipeline. Each TFX pipeline DAG requires its own orchestrator. The natively supported orchestrators include Kubeflow Pipelines, Apache Airflow, and Apache Beam itself. + +To integrate an Apache Beam pipeline into a TFX pipeline, you need to follow these steps: +1. Create TFX components by specifying interfaces and containerizing the component implementation. +2. Create TFX pipeline by connecting these components and specifying how inputs and outputs are exchanged between them. +3. Compile and execute the TFX pipeline in the target environment. + +TFX provides a rich set of standard components for constructing ML pipelines. 
These components can be used as building blocks for creating custom components. TFX heavily relies on Apache Beam libraries for implementing data-parallel pipelines within these components. You must execute components created using these libraries with one of the supported Apache Beam runners. + +To launch the pipeline, you must provide two configurations: the orchestrator for the TFX pipeline and the pipeline options necessary for running Apache Beam pipelines. + +You can find the complete TFX example code in the Apache Beam GitHub repository. \ No newline at end of file diff --git a/learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md b/learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md new file mode 100644 index 000000000000..2cc50df2ed74 --- /dev/null +++ b/learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md @@ -0,0 +1,39 @@ +Prompt: +How can I run inference on a trained ML model using Apache Beam? + +Response: +Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s Multi-language Pipelines framework. The `RunInference` transform performs inference on a `PCollection` of examples using an ML model and outputs a `PCollection` containing both the input examples and the corresponding output predictions. + +Key features of the RunInference API include: +* Support for both batch and streaming inference. +* Centralized model management for efficient memory and bandwidth usage. +* Compatibility with multiple model frameworks and model hubs. +* Automatic model refreshing to ensure the latest model version is used. +* GPU support for model inference. + +The RunInference API supports a variety of frameworks and model hubs, including TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, TensorFlow Hub, Vertex AI, TensorRT, and ONNX. Additionally, you can easily integrate custom model frameworks. + +To import a model into your Apache Beam pipeline, you'll need to configure the `ModelHandler` object, which wraps the underlying model and allows you to set necessary environment variables for inference. + +Following is an example importing a `Pytorch` model handler to use in your pipeline: + +```python +from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor +from apache_beam.ml.inference.base import RunInference + + model_handler = PytorchModelHandlerTensor( + # Model handler setup + ) + +with pipeline as p: + predictions = p | 'Read' >> beam.ReadFromSource('a_source') + | 'RunInference' >> RunInference(model_handler) +``` + +For comprehensive end-to-end examples of inference with supported model frameworks and model hubs, refer to the Apache Beam GitHub repository. + +If you need to run inference on a model that isn't explicitly supported, you can create your own `ModelHandler` or `KeyedModelHandler` with custom logic to load and use your model. For an example of running inference on a custom model loaded with spaCy, refer to the examples in the Apache Beam GitHub repository. + +For recommended patterns and best practices when leveraging Apache Beam for inference tasks, see the RunInference Patterns section in the official documentation. + +For an example of using the RunInference API in the Java SDK, see the example multi-language pipelines in the Apache Beam GitHub repository. 
Additionally, for an illustration of a composite Python transform integrating the RunInference API with preprocessing and postprocessing from a Beam Java SDK pipeline, you can refer to the Apache Beam official documentation. \ No newline at end of file diff --git a/learning/prompts/documentation-lookup-nolinks/39_ai_runinference_metrics.md b/learning/prompts/documentation-lookup-nolinks/39_ai_runinference_metrics.md new file mode 100644 index 000000000000..97df41266d7e --- /dev/null +++ b/learning/prompts/documentation-lookup-nolinks/39_ai_runinference_metrics.md @@ -0,0 +1,38 @@ +Prompt: +What metrics are available for monitoring the performance of ML model inference in Apache Beam? + +Response: +Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s Multi-language Pipelines framework. + +The RunInference API provides built-in support for monitoring the performance of ML model inference. Here is a list of commonly used metrics for inference benchmarking: + +* `model_byte_size`: size of the memory footprint of the model load and initialization. +* `load_model_latency_milli_secs`: time taken to load and initialize the model. +* `num_inferences`: total number of elements passed to `run_inference()`. +* `inference_batch_latency_micro_secs`: time taken to perform inference across all batches of examples. +* `inference_request_batch_byte_size`: size of the batch in bytes. +* `inference_request_batch_size`: number of examples in the batch. +* `failed_batches_counter`: number of failed batches. + +You can derive additional metrics from the ones listed above. Example: +Total time taken for inference = `num_inferences` x `inference_batch_latency_micro_secs_MEAN` + +Here is a simplified example of how to use the RunInference API to perform inference on a language model (LM): + +```python + with beam.Pipeline(options=pipeline_options) as pipeline: + _ = ( + pipeline + | "Create inputs" >> beam.Create(>) + | "Tokenize" >> beam.ParDo(Tokenize()) + | "Inference" >> RunInference()) +``` +After running the pipeline, you can query the metrics using the following code: + +```python +metrics = pipeline.result.metrics().query(beam.metrics.MetricsFilter()) +``` + +Metrics are also available in the Google Cloud Dataflow UI. For the complete source code of the example and instructions to run it in Dataflow, refer to the Apache Beam GitHub repository. + +For a sample implementation of a metrics collector that tracks ML-related performance and memory usage, see the base Python inference example in the Apache Beam GitHub repository. \ No newline at end of file diff --git a/learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md b/learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md new file mode 100644 index 000000000000..f967c82809b4 --- /dev/null +++ b/learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md @@ -0,0 +1,51 @@ +Prompt: +How can I integrate BigQuery ML with Apache Beam? + +Response: +BigQuery ML allows you to develop and execute ML models using GoogleSQL queries. You can seamlessly integrate these models into your Apache Beam pipeline using TFX Basic Shared Libraries (tfx_bsl) and perform inference on them using the RunInference API. + +For an example of training a basic BigQuery ML model, refer to the BigQuery documentation. 
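For illustration only, here is a minimal sketch of what training such a model could look like when the GoogleSQL statement is submitted from Python with the Google Cloud BigQuery client; the dataset, table, column, and model names below are hypothetical placeholders, not values from this guide.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical dataset, table, and column names; replace them with your own.
create_model_sql = """
CREATE OR REPLACE MODEL `my_dataset.my_classifier`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['label']) AS
SELECT feature_1, feature_2, label
FROM `my_dataset.training_data`
"""

# Running the query trains the model inside BigQuery; the call blocks until the job finishes.
client.query(create_model_sql).result()
```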
+ +Once you have trained your model, you will need to export it. Here is an example BigQuery command to export a model to a Google Cloud Storage bucket: + +``` +bq extract -m gs:// +``` + +To incorporate your BigQuery ML model into an Apache Beam pipeline using `tfx_bsl`, it must be saved in the TensorFlow SavedModel format. You will need to download the model to your local directory to perform local predictions: + +```python +import apache_beam +import tensorflow as tf +from google.protobuf import text_format +from tensorflow.python.framework import tensor_util +from tfx_bsl.beam import run_inference +from tfx_bsl.public.beam import RunInference +from tfx_bsl.public.proto import model_spec_pb2 + + +inputs = tf.train.Example(...) + +model_path = + +def extract_prediction(response): + # Extract the prediction from the response depending on the signature of the model + +with beam.Pipeline() as p: + res = ( + p + | beam.Create([inputs]) + | RunInference( + model_spec_pb2.InferenceSpecType( + saved_model_spec=model_spec_pb2.SavedModelSpec( + model_path=model_path, + signature_name=['serving_default']))) + | beam.ParDo(extract_prediction) + ) +``` + +This example uses the `RunInference` transform from the `tfx_bsl` library, directing it to the local directory where the model is stored. The transform takes `tf.train.Example` type elements as inputs and produces `tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as outputs. Depending on the signature of your model, you can extract values from the output. For the prediction output format of exported models for each model type, refer to the Prediction output format section in the BigQuery documentation. + + + + diff --git a/learning/prompts/documentation-lookup-nolinks/41_ai_model_refresh.md b/learning/prompts/documentation-lookup-nolinks/41_ai_model_refresh.md new file mode 100644 index 000000000000..cd381f04b4a9 --- /dev/null +++ b/learning/prompts/documentation-lookup-nolinks/41_ai_model_refresh.md @@ -0,0 +1,42 @@ +Prompt: +How can I make sure my Apache Beam pipeline uses the latest version of the ML model? + +Response: +Production ML workflows often involve updating the ML model with new data. You can ensure that your Apache Beam pipeline uses the latest version of the ML model by leveraging the RunInference API alongside Apache Beam's side inputs feature. Side inputs are additional inputs that can be provided to a ‘ParDo’ transform, in addition to the main input `PCollection`. + +The `RunInference` transform accepts an optional input parameter `model_metadata_pcoll`, which is a side input `PCollection` emitting `ModelMetadata`. `ModelMetadata` is a `NamedTuple` object containing `model_id` and `model_name`, used to load the model for inference and identify it in the metrics generated by the `RunInference` transform. The URL or path to the model should be compatible with the respective `ModelHandler` requirements. + +If the main collection emits inputs before the `model_metadata_pcoll` side input is available, the main `PCollection` will be buffered until the `model_metadata_pcoll` side input is emitted. + +For more information on `ModelMetadata`, refer to the Apache Beam Python SDK documentation. + +A common approach to model updates in production is to use a `FileWatchPattern` as a side input: + +```python +import apache_beam as beam +from apache_beam.ml.inference.utils import WatchFilePattern +from apache_beam.ml.inference.base import RunInference + +tf_model_handler = ... 
# model handler for the model
+
+with beam.Pipeline() as pipeline:
+
+  file_pattern = ''
+
+  side_input_pcoll = (
+      pipeline
+      | "FilePatternUpdates" >> WatchFilePattern(file_pattern=file_pattern))
+
+  main_input_pcoll = ... # main input PCollection
+
+  inference_pcoll = (
+      main_input_pcoll
+      | "RunInference" >> RunInference(
+      model_handler=tf_model_handler,
+      model_metadata_pcoll=side_input_pcoll))
+```
+
+In the provided example, the `model_metadata_pcoll` parameter expects a `PCollection` of `ModelMetadata` compatible with the `AsSingleton` marker. Given that the pipeline employs the `WatchFilePattern` class as a side input, it automatically manages windowing and encapsulates the output into `ModelMetadata`.
+
+For more information, refer to the `WatchFilePattern` section in the Apache Beam documentation.
+
diff --git a/learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md b/learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md
new file mode 100644
index 000000000000..1038c8d444a0
--- /dev/null
+++ b/learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md
@@ -0,0 +1,47 @@
+Prompt:
+How can I use custom remote inference calls in my Apache Beam pipeline?
+
+Response:
+The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the `RunInference API` provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls.
+
+When developing custom inference calls, consider the following factors:
+* **API quotas**: heavy loads may lead to reaching API quota limits. You can use `PipelineOptions` to specify the maximum number of parallel API calls. Use `direct_num_workers` for the Direct Runner or `max_num_workers` for the Google Cloud Dataflow Runner. Refer to the Beam Capability Matrix for information about other runners.
+* **Error handling**: it is essential to handle errors in case of API call failures. Consider implementing exponential backoff for retries or utilizing dead-letter queues for failed API calls.
+* **Monitoring**: incorporate monitoring and performance metrics to track the performance of your inference calls and the health of your pipeline.
+* **Batching**: batching can be used to send multiple inputs in a single API call for improved efficiency.
+
+To execute external API calls with the `RunInference` transform, you'll need to create a `beam.DoFn` in the form of a custom model handler:
+
+```python
+class CustomModelHandler(ModelHandler):
+  """DoFn that accepts a batch of inputs and sends that batch to the remote API for inference"""
+
+  def load_model(self):
+    """Initiate the Custom remote API client."""
+    client = ... # Initialize the client
+    return client
+
+  def run_inference(self, batch, model, inference):
+
+    # Prepare a batch request for all inputs in the batch.
+    inputs = ... # Process inputs from the batch
+    input_requests = ... # Prepare input requests for the model
+    batch_request = ... # Prepare batch request for the model
+
+    # Send the batch request to the remote endpoint.
+ responses = model.(request=batch_request).responses + + return responses +``` + +Integrate this custom model handler into your pipeline as demonstrated in the following example: + +```python +with beam.Pipeline() as pipeline: + _ = (pipeline | "Create inputs" >> beam.Create() + | "Inference" >> RunInference(model_handler=CustomModelHandler()) + | "Process outputs" >> beam.Map() + ) +``` + +For a comprehensive example of using the RunInference API for remote inference calls, refer to the Apache Beam GitHub repository. diff --git a/learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md b/learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md new file mode 100644 index 000000000000..bd2532513b04 --- /dev/null +++ b/learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md @@ -0,0 +1,56 @@ +Prompt: +Ho can I use Apache Beam to run inference on Large Language Models (LLMs)? + +Response: +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the RunInference API provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. + +You can use the `RunInference` transform with large models as long they fit into memory. + +The typical workflow for conducting inference on Large Language Models (LLMs) within an Apache Beam pipeline involves the following steps: +1. Read the input text data from a source such as a file or a Pub/Sub topic. +2. Encode the text into tokens understandable by the LLM model, typically using a tokenizer. +3. Use the `RunInference` transform to obtain predictions from the model. +4. Decode the predictions into human-readable text. + +Here is an example demonstrating how to leverage the RunInference API with LLMs in an Apache Beam pipeline: + +```python +import apache_beam as beam +from apache_beam.ml.inference.base import RunInference + +with beam.Pipeline(options=pipeline_options) as pipeline: + _ = ( + pipeline + | "CreateInputs" >> beam.Create(task_sentences) + | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer)) + | "RunInference" >> RunInference(model_handler=model_handler) + | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer)) + ) +``` + +In this example, `Preprocess` and `Postprocess` are `DoFn` classes responsible for encoding and decoding the input and output data using a tokenizer. The `model_handler` is a `ModelHandler` object that wraps the underlying model and allows you to configure environment variables required to load the model. + +Here is an example of how to configure a Pytorch model handler for use in your pipeline: + +```python +from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor +model_handler = PytorchModelHandlerTensor( + state_dict_path=known_args.model_state_dict_path, + model_class=T5ForConditionalGeneration, + model_params={ + "config": AutoConfig.from_pretrained(known_args.model_name) + }, + device="cpu", + inference_fn=gen_fn) +``` + +Each specific model handler has its own configuration parameters. For example, the `PytorchModelHandlerTensor` requires the following parameters: +* `state_dict_path`: the path to the saved model state dictionary. +* `model_class`: the class of the Pytorch model that defines model structure. +* `model_params`: the dictionary of model parameters. +* `device`: the device to run the model on (e.g. "cpu" or "gpu"). +* `inference_fn`: the function to run the inference during RunInference. 
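To make the `Preprocess` and `Postprocess` steps in the pipeline above more concrete, here is a minimal, hypothetical sketch of what those `DoFn` classes could look like; it assumes a Hugging Face tokenizer and is not the exact implementation from the official example.

```python
import apache_beam as beam
from transformers import AutoTokenizer


class Preprocess(beam.DoFn):
    def __init__(self, tokenizer: AutoTokenizer):
        self._tokenizer = tokenizer

    def process(self, element: str):
        # Encode the input sentence into a tensor of token IDs for the model.
        yield self._tokenizer(element, return_tensors="pt").input_ids[0]


class Postprocess(beam.DoFn):
    def __init__(self, tokenizer: AutoTokenizer):
        self._tokenizer = tokenizer

    def process(self, prediction_result):
        # PredictionResult carries the original example and the model's inference output.
        yield self._tokenizer.decode(
            prediction_result.inference, skip_special_tokens=True)
```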
+ +For information on supported popular frameworks and models, refer to the reference documentation for the `apache_beam.ml.inference` package. + +You can find the comprehensive example of using the RunInference API with LLMs in the Apache Beam GitHub repository. \ No newline at end of file From 61f392fe2313ec299cfa54994aa0d113d608756e Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Thu, 7 Mar 2024 00:37:30 +0100 Subject: [PATCH 21/25] Apply suggestions from code review Co-authored-by: Daria Bezkorovaina <99457894+dariabezkorovaina@users.noreply.github.com> --- .../33_ai_ml_pipelines.md | 13 +++++-------- .../37_ai_tfx_orchestration.md | 3 +-- .../38_ai_runinference.md | 10 ++++------ .../39_ai_runinference_metrics.md | 2 +- .../40_ai_bigquery_ml.md | 3 +-- .../41_ai_model_refresh.md | 4 ++-- .../42_ai_custom_inference.md | 4 ++-- .../43_ai_llm_inference.md | 8 ++++---- .../documentation-lookup/40_ai_bigquery_ml.md | 1 - 9 files changed, 20 insertions(+), 28 deletions(-) diff --git a/learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md b/learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md index 745957a048df..30d62c0faac0 100644 --- a/learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md +++ b/learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md @@ -2,9 +2,7 @@ Prompt: How can I use Apache Beam in my AI/ML pipeline? Response: -Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the supported languages. - -To run a pipeline you need to deploy it to one of the supported Beam runners. +Apache Beam is an open-source, unified programming model for defining batch and streaming data-parallel processing pipelines. Each Apache Beam pipeline is represented as a Directed Acyclic Graph (DAG) of data processing operations constructed using the Apache Beam SDK in one of the supported languages. To execute a pipeline, you need to deploy it to one of the supported Beam runners. You can use Apache Beam for various tasks within your AI/ML pipeline, including data validation, preprocessing, model validation, and model deployment and inference. @@ -12,14 +10,13 @@ Apache Beam offers a rich set of I/O connectors and `PTransforms` that allow for For data preprocessing, Apache Beam provides the `MLTransform` class. This feature allows you to ensure data consistency by applying the same preprocessing steps for both training and inference. -Additionally, Apache Beam allows integration with pre-trained models from `PyTorch`, `Scikit-learn`, and `TensorFlow`. +Additionally, Apache Beam allows integration with pre-trained models from PyTorch, Scikit-learn, and TensorFlow. -To execute machine learning inference tasks, Apache Beam provides the RunInference API. -`RunInference` is a `PTransform` that supports the usage of any custom models. For an example of running inference on a custom model loaded with `spaCy`, refer to examples in the Apache Beam GitHub repository. +To execute machine learning inference tasks, Apache Beam provides the RunInference API. `RunInference` is a `PTransform` that supports the usage of any custom models. For an example of running inference on a custom model loaded with spaCy, refer to the Apache Beam GitHub repository. 
-The `RunInference` transform efficiently handles models of any size, making it suitable for large language models (LLMs) and other complex architectures. You can find an example of deploying and performing inference on large language models (LLMs) in the RunInference section of the Apache Beam documentation. +The `RunInference` transform efficiently handles models of any size, making it suitable for large language models (LLMs) and other complex architectures. You can find an example of deploying and performing inference on large language models (LLMs) in the 'RunInference' section of the Apache Beam documentation. -For more information on implementing AI/ML pipelines using Apache Beam, see the `Get started with AI/ML pipelines` section in the Apache Beam documentation. +For more information on implementing AI/ML pipelines using Apache Beam, see the 'Get started with AI/ML pipelines' section in the Apache Beam documentation. diff --git a/learning/prompts/documentation-lookup-nolinks/37_ai_tfx_orchestration.md b/learning/prompts/documentation-lookup-nolinks/37_ai_tfx_orchestration.md index 318eafa5f07a..861f011a3cd9 100644 --- a/learning/prompts/documentation-lookup-nolinks/37_ai_tfx_orchestration.md +++ b/learning/prompts/documentation-lookup-nolinks/37_ai_tfx_orchestration.md @@ -1,7 +1,6 @@ Prompt: How can I orchestrate Apache Beam pipelines with TFX? - Response: You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference. @@ -11,7 +10,7 @@ TFX is an end-to-end platform designed for deploying production ML pipelines. TF To integrate an Apache Beam pipeline into a TFX pipeline, you need to follow these steps: 1. Create TFX components by specifying interfaces and containerizing the component implementation. -2. Create TFX pipeline by connecting these components and specifying how inputs and outputs are exchanged between them. +2. Create the TFX pipeline by connecting these components and specifying how inputs and outputs are exchanged between them. 3. Compile and execute the TFX pipeline in the target environment. TFX provides a rich set of standard components for constructing ML pipelines. These components can be used as building blocks for creating custom components. TFX heavily relies on Apache Beam libraries for implementing data-parallel pipelines within these components. You must execute components created using these libraries with one of the supported Apache Beam runners. diff --git a/learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md b/learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md index 2cc50df2ed74..fa2727042828 100644 --- a/learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md +++ b/learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md @@ -13,9 +13,9 @@ Key features of the RunInference API include: The RunInference API supports a variety of frameworks and model hubs, including TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, TensorFlow Hub, Vertex AI, TensorRT, and ONNX. Additionally, you can easily integrate custom model frameworks. -To import a model into your Apache Beam pipeline, you'll need to configure the `ModelHandler` object, which wraps the underlying model and allows you to set necessary environment variables for inference. 
+To import a model into your Apache Beam pipeline, you will need to configure the `ModelHandler` object, which wraps the underlying model and allows you to set necessary environment variables for inference. -Following is an example importing a `Pytorch` model handler to use in your pipeline: +Here is an example of importing a PyTorch model handler for use in your pipeline: ```python from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor @@ -30,10 +30,8 @@ with pipeline as p: | 'RunInference' >> RunInference(model_handler) ``` -For comprehensive end-to-end examples of inference with supported model frameworks and model hubs, refer to the Apache Beam GitHub repository. - If you need to run inference on a model that isn't explicitly supported, you can create your own `ModelHandler` or `KeyedModelHandler` with custom logic to load and use your model. For an example of running inference on a custom model loaded with spaCy, refer to the examples in the Apache Beam GitHub repository. -For recommended patterns and best practices when leveraging Apache Beam for inference tasks, see the RunInference Patterns section in the official documentation. +For recommended patterns and best practices when leveraging Apache Beam for inference tasks, see the 'RunInference Patterns' section in the official documentation. -For an example of using the RunInference API in the Java SDK, see the example multi-language pipelines in the Apache Beam GitHub repository. Additionally, for an illustration of a composite Python transform integrating the RunInference API with preprocessing and postprocessing from a Beam Java SDK pipeline, you can refer to the Apache Beam official documentation. \ No newline at end of file +For comprehensive end-to-end examples of inference with supported model frameworks and model hubs, including examples of using the RunInference API in the Java SDK, see the Apache Beam GitHub repository. Additionally, for an illustration of a composite Python transform integrating the RunInference API with preprocessing and postprocessing from a Beam Java SDK pipeline, you can refer to the Apache Beam official documentation. \ No newline at end of file diff --git a/learning/prompts/documentation-lookup-nolinks/39_ai_runinference_metrics.md b/learning/prompts/documentation-lookup-nolinks/39_ai_runinference_metrics.md index 97df41266d7e..003f921b79d7 100644 --- a/learning/prompts/documentation-lookup-nolinks/39_ai_runinference_metrics.md +++ b/learning/prompts/documentation-lookup-nolinks/39_ai_runinference_metrics.md @@ -5,7 +5,6 @@ Response: Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s Multi-language Pipelines framework. The RunInference API provides built-in support for monitoring the performance of ML model inference. Here is a list of commonly used metrics for inference benchmarking: - * `model_byte_size`: size of the memory footprint of the model load and initialization. * `load_model_latency_milli_secs`: time taken to load and initialize the model. * `num_inferences`: total number of elements passed to `run_inference()`. 
@@ -27,6 +26,7 @@ Here is a simplified example of how to use the RunInference API to perform infer | "Tokenize" >> beam.ParDo(Tokenize()) | "Inference" >> RunInference()) ``` + After running the pipeline, you can query the metrics using the following code: ```python diff --git a/learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md b/learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md index f967c82809b4..1d9b8c79d29f 100644 --- a/learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md +++ b/learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md @@ -23,7 +23,6 @@ from tfx_bsl.beam import run_inference from tfx_bsl.public.beam import RunInference from tfx_bsl.public.proto import model_spec_pb2 - inputs = tf.train.Example(...) model_path = @@ -44,7 +43,7 @@ with beam.Pipeline() as p: ) ``` -This example uses the `RunInference` transform from the `tfx_bsl` library, directing it to the local directory where the model is stored. The transform takes `tf.train.Example` type elements as inputs and produces `tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as outputs. Depending on the signature of your model, you can extract values from the output. For the prediction output format of exported models for each model type, refer to the Prediction output format section in the BigQuery documentation. +This example uses the `RunInference` transform from the `tfx_bsl` library, directing it to the local directory where the model is stored. The transform takes `tf.train.Example` type elements as inputs and produces `tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as outputs. Depending on the signature of your model, you can extract values from the output. For the prediction output format of exported models for each model type, refer to the 'Prediction output format' section in the BigQuery documentation. diff --git a/learning/prompts/documentation-lookup-nolinks/41_ai_model_refresh.md b/learning/prompts/documentation-lookup-nolinks/41_ai_model_refresh.md index cd381f04b4a9..7f7890fa91e8 100644 --- a/learning/prompts/documentation-lookup-nolinks/41_ai_model_refresh.md +++ b/learning/prompts/documentation-lookup-nolinks/41_ai_model_refresh.md @@ -2,7 +2,7 @@ Prompt: How can I make sure my Apache Beam pipeline uses the latest version of the ML model? Response: -Production ML workflows often involve updating the ML model with new data. You can ensure that your Apache Beam pipeline uses the latest version of the ML model by leveraging the RunInference API alongside Apache Beam's side inputs feature. Side inputs are additional inputs that can be provided to a ‘ParDo’ transform, in addition to the main input `PCollection`. +Production ML workflows often involve updating the ML model with new data. You can ensure that your Apache Beam pipeline uses the latest version of the ML model by leveraging the RunInference API alongside Apache Beam's side inputs feature. Side inputs are additional inputs that can be provided to a `ParDo` transform, in addition to the main input `PCollection`. The `RunInference` transform accepts an optional input parameter `model_metadata_pcoll`, which is a side input `PCollection` emitting `ModelMetadata`. `ModelMetadata` is a `NamedTuple` object containing `model_id` and `model_name`, used to load the model for inference and identify it in the metrics generated by the `RunInference` transform. The URL or path to the model should be compatible with the respective `ModelHandler` requirements. 
@@ -38,5 +38,5 @@ with beam.Pipeline() as pipeline: In the provided example, the `model_metadata_pcoll` parameter expects a `PCollection` of `ModelMetadata` compatible with the `AsSingleton` marker. Given that the pipeline employs the `WatchFilePattern` class as a side input, it automatically manages windowing and encapsulates the output into `ModelMetadata`. -For more information, refer to the `WatchFilePattern` section in the Apache Beam documentation. +For more information, refer to the section 'Use `WatchFilePattern` to auto-update ML models in RunInference' in the Apache Beam documentation. diff --git a/learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md b/learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md index 1038c8d444a0..167d6c29d744 100644 --- a/learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md +++ b/learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md @@ -2,7 +2,7 @@ Prompt: How can I use custom remote inference calls in my Apache Beam pipeline? Response: -The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the `RunInference API` provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the RunInference API provided by the Apache Beam Python SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. When developing custom inference calls, consider the following factors: * **API quotas**: heavy loads may lead to reaching API quota limits. You can use `PipelineOptions` to specify the maximum number of parallel API calls. Use `direct_num_workers` for the Direct Runner or `max_num_workers` for the Google Cloud Dataflow Runner. Refer to the Beam Capability Matrix for information about other runners. @@ -10,7 +10,7 @@ When developing custom inference calls, consider the following factors: * **Monitoring**: incorporate monitoring and performance metrics to track the performance of your inference calls and the health of your pipeline. * **Batching**: batching can be used to send multiple inputs in a single API call for improved efficiency. -To execute external API calls with the `RunInference` transform, you'll need to create a `beam.DoFn` in the form of a custom model handler: +To execute external API calls with the `RunInference` transform, you will need to create a `beam.DoFn` in the form of a custom model handler: ```python class CustomModelHandler(ModelHandler): diff --git a/learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md b/learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md index bd2532513b04..939b3b1a90c3 100644 --- a/learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md +++ b/learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md @@ -2,9 +2,9 @@ Prompt: Ho can I use Apache Beam to run inference on Large Language Models (LLMs)? Response: -The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the RunInference API provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the RunInference API provided by the Apache Beam Python SDK. 
This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. -You can use the `RunInference` transform with large models as long they fit into memory. +You can use the `RunInference` transform with large models as long as they fit into memory. The typical workflow for conducting inference on Large Language Models (LLMs) within an Apache Beam pipeline involves the following steps: 1. Read the input text data from a source such as a file or a Pub/Sub topic. @@ -30,7 +30,7 @@ with beam.Pipeline(options=pipeline_options) as pipeline: In this example, `Preprocess` and `Postprocess` are `DoFn` classes responsible for encoding and decoding the input and output data using a tokenizer. The `model_handler` is a `ModelHandler` object that wraps the underlying model and allows you to configure environment variables required to load the model. -Here is an example of how to configure a Pytorch model handler for use in your pipeline: +Here is an example of how to configure a PyTorch model handler for use in your pipeline: ```python from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor @@ -46,7 +46,7 @@ model_handler = PytorchModelHandlerTensor( Each specific model handler has its own configuration parameters. For example, the `PytorchModelHandlerTensor` requires the following parameters: * `state_dict_path`: the path to the saved model state dictionary. -* `model_class`: the class of the Pytorch model that defines model structure. +* `model_class`: the class of the PyTorch model that defines model structure. * `model_params`: the dictionary of model parameters. * `device`: the device to run the model on (e.g. "cpu" or "gpu"). * `inference_fn`: the function to run the inference during RunInference. diff --git a/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md b/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md index 0b1ff6ed34a8..b9ce223ca744 100644 --- a/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md +++ b/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md @@ -23,7 +23,6 @@ from tfx_bsl.beam import run_inference from tfx_bsl.public.beam import RunInference from tfx_bsl.public.proto import model_spec_pb2 - inputs = tf.train.Example(...) model_path = From 883ca5d3529b42e3bf3d6a39a54c269c24fc76bc Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Thu, 7 Mar 2024 13:08:22 +0100 Subject: [PATCH 22/25] Apply suggestions from code review Co-authored-by: Daria Bezkorovaina <99457894+dariabezkorovaina@users.noreply.github.com> --- .../prompts/documentation-lookup/33_ai_ml_pipelines.md | 3 +-- .../documentation-lookup/37_ai_tfx_orchestration.md | 3 +-- .../prompts/documentation-lookup/38_ai_runinference.md | 4 ++-- .../documentation-lookup/39_ai_runinference_metrics.md | 1 + .../prompts/documentation-lookup/41_ai_model_refresh.md | 2 +- .../documentation-lookup/42_ai_custom_inference.md | 4 ++-- .../prompts/documentation-lookup/43_ai_llm_inference.md | 8 ++++---- 7 files changed, 12 insertions(+), 13 deletions(-) diff --git a/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md b/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md index 7c41038d4914..9b04aaa8c65a 100644 --- a/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md +++ b/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md @@ -2,8 +2,7 @@ Prompt: How can I use Apache Beam in my AI/ML pipeline? 
Response: -Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/). -To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/). +Apache Beam is an open-source, unified programming model for defining batch and streaming data-parallel processing pipelines. Each Apache Beam pipeline is represented as a Directed Acyclic Graph (DAG) of data processing operations constructed using the Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/). To execute a pipeline, you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/). You can use Apache Beam for various tasks within your AI/ML pipeline, including data validation, preprocessing, model validation, and model deployment and inference. diff --git a/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md b/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md index 48fd7379b5a1..5b3b9ce183ef 100644 --- a/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md +++ b/learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md @@ -1,7 +1,6 @@ Prompt: How can I orchestrate Apache Beam pipelines with TFX? - Response: You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference. @@ -11,7 +10,7 @@ TFX is an end-to-end platform designed for deploying production ML pipelines. TF To integrate an Apache Beam pipeline into a TFX pipeline, you need to follow these steps: 1. Create TFX components by specifying interfaces and containerizing the component implementation. -2. Create TFX pipeline by connecting these components and specifying how inputs and outputs are exchanged between them. +2. Create the TFX pipeline by connecting these components and specifying how inputs and outputs are exchanged between them. 3. Compile and execute the TFX pipeline in the target environment. TFX provides a rich set of [standard components](https://www.tensorflow.org/tfx/guide#tfx_standard_components) for constructing ML pipelines. These components can be used as building blocks for creating custom components. TFX heavily relies on Apache Beam libraries for implementing data-parallel pipelines within these components. You must execute components created using these libraries with one of the supported Apache Beam runners. 
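As a purely illustrative sketch, the following shows one way a TFX orchestrator and the Apache Beam pipeline options might be wired together when launching such a pipeline; the pipeline name, root path, runner flags, and empty component list are hypothetical placeholders.

```python
from tfx import v1 as tfx

# Placeholder: populate this list with your actual TFX components.
components = []

pipeline = tfx.dsl.Pipeline(
    pipeline_name="beam_tfx_example",
    pipeline_root="/tmp/tfx_pipeline_root",
    components=components,
    # Options forwarded to the Apache Beam pipelines that run inside TFX components.
    beam_pipeline_args=["--runner=DirectRunner", "--direct_num_workers=0"],
)

# The orchestrator for the TFX pipeline itself; here, the local DAG runner.
tfx.orchestration.LocalDagRunner().run(pipeline)
```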
diff --git a/learning/prompts/documentation-lookup/38_ai_runinference.md b/learning/prompts/documentation-lookup/38_ai_runinference.md index 302da39c7fb9..b00f9874a7b7 100644 --- a/learning/prompts/documentation-lookup/38_ai_runinference.md +++ b/learning/prompts/documentation-lookup/38_ai_runinference.md @@ -13,9 +13,9 @@ Key features of the RunInference API include: The RunInference API supports a variety of frameworks and model hubs, including [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/), [XGBoost](https://xgboost.ai/), [Hugging Face](https://huggingface.co/), [TensorFlow Hub](https://www.tensorflow.org/hub), [Vertex AI](https://cloud.google.com/vertex-ai), [TensorRT](https://developer.nvidia.com/tensorrt), and [ONNX](https://onnx.ai/). Additionally, you can easily integrate custom model frameworks. -To import a model into your Apache Beam pipeline, you'll need to configure the [`ModelHandler`](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) object, which wraps the underlying model and allows you to set necessary environment variables for inference. +To import a model into your Apache Beam pipeline, you will need to configure the [`ModelHandler`](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) object, which wraps the underlying model and allows you to set necessary environment variables for inference. -Following is an example importing a `Pytorch` model handler to use in your pipeline: +Here is an example of importing a PyTorch model handler for use in your pipeline: ```python from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor diff --git a/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md b/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md index 1fd7bf941780..2c19af86f3e4 100644 --- a/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md +++ b/learning/prompts/documentation-lookup/39_ai_runinference_metrics.md @@ -27,6 +27,7 @@ Here is a simplified example of how to use the RunInference API to perform infer | "Tokenize" >> beam.ParDo(Tokenize()) | "Inference" >> RunInference()) ``` + After running the pipeline, you can query the metrics using the following code: ```python diff --git a/learning/prompts/documentation-lookup/41_ai_model_refresh.md b/learning/prompts/documentation-lookup/41_ai_model_refresh.md index c41d8a01628b..f23c30af600e 100644 --- a/learning/prompts/documentation-lookup/41_ai_model_refresh.md +++ b/learning/prompts/documentation-lookup/41_ai_model_refresh.md @@ -2,7 +2,7 @@ Prompt: How can I make sure my Apache Beam pipeline uses the latest version of the ML model? Response: -Production ML workflows often involve updating the ML model with new data. You can ensure that your Apache Beam pipeline uses the latest version of the ML model by leveraging the RunInference API alongside Apache Beam's [side inputs](https://beam.apache.org/documentation/programming-guide/#side-inputs) feature. Side inputs are additional inputs that can be provided to a [‘ParDo’](https://beam.apache.org/documentation/programming-guide/#pardo) transform, in addition to the main input [`PCollection`](https://beam.apache.org/documentation/basics/#pcollection). +Production ML workflows often involve updating the ML model with new data. 
You can ensure that your Apache Beam pipeline uses the latest version of the ML model by leveraging the RunInference API alongside Apache Beam's [side inputs](https://beam.apache.org/documentation/programming-guide/#side-inputs) feature. Side inputs are additional inputs that can be provided to a [`ParDo`](https://beam.apache.org/documentation/programming-guide/#pardo) transform, in addition to the main input [`PCollection`](https://beam.apache.org/documentation/basics/#pcollection). The `RunInference` transform accepts an optional input parameter `model_metadata_pcoll`, which is a side input `PCollection` emitting `ModelMetadata`. `ModelMetadata` is a `NamedTuple` object containing `model_id` and `model_name`, used to load the model for inference and identify it in the metrics generated by the `RunInference` transform. The URL or path to the model should be compatible with the respective [`ModelHandler` requirements](https://beam.apache.org/documentation/ml/about-ml/#modify-a-python-pipeline-to-use-an-ml-model). diff --git a/learning/prompts/documentation-lookup/42_ai_custom_inference.md b/learning/prompts/documentation-lookup/42_ai_custom_inference.md index 1f4139571617..158584fabab2 100644 --- a/learning/prompts/documentation-lookup/42_ai_custom_inference.md +++ b/learning/prompts/documentation-lookup/42_ai_custom_inference.md @@ -2,7 +2,7 @@ Prompt: How can I use custom remote inference calls in my Apache Beam pipeline? Response: -The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam Python SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. When developing custom inference calls, consider the following factors: * **API quotas**: heavy loads may lead to reaching API quota limits. You can use [`PipelineOptions`](https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options) to specify the maximum number of parallel API calls. Use `direct_num_workers` for the [Direct Runner](https://beam.apache.org/documentation/runners/direct/) or `max_num_workers` for the [Google Cloud Dataflow Runner](https://beam.apache.org/documentation/runners/dataflow/). Refer to the [Beam Capability Matrix](https://beam.apache.org/documentation/runners/capability-matrix/) for information about other runners. @@ -10,7 +10,7 @@ When developing custom inference calls, consider the following factors: * **Monitoring**: incorporate monitoring and performance metrics to track the performance of your inference calls and the health of your pipeline. * **Batching**: batching can be used to send multiple inputs in a single API call for improved efficiency. 
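To make the error-handling consideration above more concrete, here is a minimal, hypothetical sketch of wrapping a remote call with retries and exponential backoff; `send_request` is a placeholder for whatever function issues the actual API call.

```python
import logging
import random
import time


def call_with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Invoke `send_request` (a zero-argument callable), retrying with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return send_request()
        except Exception as error:  # Ideally, narrow this to the client's transient error types.
            if attempt == max_retries - 1:
                raise  # Give up; the failed element can then be routed to a dead-letter queue.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            logging.warning("Remote call failed (%s); retrying in %.1f s", error, delay)
            time.sleep(delay)
```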
-To execute external API calls with the `RunInference` transform, you'll need to create a `beam.DoFn` in the form of a custom model handler: +To execute external API calls with the `RunInference` transform, you will need to create a `beam.DoFn` in the form of a custom model handler: ```python class CustomModelHandler(ModelHandler): diff --git a/learning/prompts/documentation-lookup/43_ai_llm_inference.md b/learning/prompts/documentation-lookup/43_ai_llm_inference.md index e70883544ba0..aa858655f50f 100644 --- a/learning/prompts/documentation-lookup/43_ai_llm_inference.md +++ b/learning/prompts/documentation-lookup/43_ai_llm_inference.md @@ -2,9 +2,9 @@ Prompt: Ho can I use Apache Beam to run inference on Large Language Models (LLMs)? Response: -The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam Python SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. -You can use the `RunInference` transform with large models as long they fit into memory. +You can use the `RunInference` transform with large models as long as they fit into memory. The typical workflow for conducting inference on Large Language Models (LLMs) within an Apache Beam pipeline involves the following steps: 1. Read the input text data from a source such as a file or a Pub/Sub topic. @@ -30,7 +30,7 @@ with beam.Pipeline(options=pipeline_options) as pipeline: In this example, `Preprocess` and `Postprocess` are `DoFn` classes responsible for encoding and decoding the input and output data using a tokenizer. The `model_handler` is a `ModelHandler` object that wraps the underlying model and allows you to configure environment variables required to load the model. -Here is an example of how to configure a Pytorch model handler for use in your pipeline: +Here is an example of how to configure a PyTorch model handler for use in your pipeline: ```python from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor @@ -46,7 +46,7 @@ model_handler = PytorchModelHandlerTensor( Each specific model handler has its own configuration parameters. For example, the `PytorchModelHandlerTensor` requires the following parameters: * `state_dict_path`: the path to the saved model state dictionary. -* `model_class`: the class of the Pytorch model that defines model structure. +* `model_class`: the class of the PyTorch model that defines model structure. * `model_params`: the dictionary of model parameters. * `device`: the device to run the model on (e.g. "cpu" or "gpu"). * `inference_fn`: the function to run the inference during RunInference. 
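A minimal sketch of `Preprocess` and `Postprocess` `DoFn` classes like those referenced above, assuming a Hugging Face `AutoTokenizer` and the default `PredictionResult` elements emitted by `RunInference` (the model name is a placeholder):

```python
import apache_beam as beam
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")  # assumed model name

class Preprocess(beam.DoFn):
    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def process(self, element: str):
        # Encode the raw prompt into the token-id tensor the model expects.
        yield self._tokenizer(element, return_tensors="pt").input_ids[0]

class Postprocess(beam.DoFn):
    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def process(self, element):
        # element is a PredictionResult; decode the generated token ids back to text.
        yield self._tokenizer.decode(element.inference, skip_special_tokens=True)
```

In the pipeline, these would be applied as `beam.ParDo(Preprocess(tokenizer))` before `RunInference` and `beam.ParDo(Postprocess(tokenizer))` after it.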
From 4f85a5a168518d43552cfad9a81d7bb9e6a5c154 Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Sun, 10 Mar 2024 11:34:07 +0100 Subject: [PATCH 23/25] code review comments --- .../38_ai_runinference.md | 2 +- .../40_ai_bigquery_ml.md | 4 +++- .../42_ai_custom_inference.md | 7 +++++-- .../43_ai_llm_inference.md | 10 +++++++--- .../documentation-lookup/33_ai_ml_pipelines.md | 14 +++----------- .../documentation-lookup/38_ai_runinference.md | 2 +- .../documentation-lookup/40_ai_bigquery_ml.md | 4 +++- .../documentation-lookup/42_ai_custom_inference.md | 7 +++++-- .../documentation-lookup/43_ai_llm_inference.md | 10 +++++++--- 9 files changed, 35 insertions(+), 25 deletions(-) diff --git a/learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md b/learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md index fa2727042828..992410da43ec 100644 --- a/learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md +++ b/learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md @@ -27,7 +27,7 @@ from apache_beam.ml.inference.base import RunInference with pipeline as p: predictions = p | 'Read' >> beam.ReadFromSource('a_source') - | 'RunInference' >> RunInference(model_handler) + | 'RunInference' >> RunInference(model_handler) ``` If you need to run inference on a model that isn't explicitly supported, you can create your own `ModelHandler` or `KeyedModelHandler` with custom logic to load and use your model. For an example of running inference on a custom model loaded with spaCy, refer to the examples in the Apache Beam GitHub repository. diff --git a/learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md b/learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md index 1d9b8c79d29f..12fc53ff3838 100644 --- a/learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md +++ b/learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md @@ -43,7 +43,9 @@ with beam.Pipeline() as p: ) ``` -This example uses the `RunInference` transform from the `tfx_bsl` library, directing it to the local directory where the model is stored. The transform takes `tf.train.Example` type elements as inputs and produces `tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as outputs. Depending on the signature of your model, you can extract values from the output. For the prediction output format of exported models for each model type, refer to the 'Prediction output format' section in the BigQuery documentation. +This example uses the `RunInference` transform from the `tfx_bsl` library, directing it to the local directory where the model is stored. The transform takes `tf.train.Example` type elements as inputs and produces `tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as outputs. + +Depending on the signature of your model, you can extract values from the output. For the prediction output format of exported models for each model type, refer to the 'Prediction output format' section in the BigQuery documentation. 
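One way to pull values out of the `PredictionLog` elements described above is a small mapping function; the output tensor name `predicted_label` is an assumption, so check the signature of your exported model before reusing it:

```python
import tensorflow as tf
from tensorflow_serving.apis import prediction_log_pb2

def extract_values(prediction_log: prediction_log_pb2.PredictionLog):
    # RunInference emits a PredictionLog wrapping a PredictLog; its response
    # carries the named output tensors produced by the exported model.
    outputs = prediction_log.predict_log.response.outputs
    return tf.make_ndarray(outputs["predicted_label"])  # assumed tensor name

# Applied after inference, for example:
# predictions | "ExtractValues" >> beam.Map(extract_values)
```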
diff --git a/learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md b/learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md index 167d6c29d744..b8bf99779a6c 100644 --- a/learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md +++ b/learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md @@ -16,20 +16,23 @@ To execute external API calls with the `RunInference` transform, you will need t class CustomModelHandler(ModelHandler): """DoFn that accepts a batch of inputs and sends that batch to the remote API for inference""" - def load_model(self): + def initialize_client(self): """Initiate the Custom remote API client.""" client = ... # Initialize the client return client def run_inference(self, batch, model, inference): + # Initialize the client. + client = self.initialize_client() + # Prepare a batch request for all inputs in the batch. inputs = ... # Process inputs from the batch input_requests = ... # Prepare input requests for the model batch_request = ... # Prepare batch request for the model # Send the batch request to the remote endpoint. - responses = model.(request=batch_request).responses + responses = client.(request=batch_request).responses return responses ``` diff --git a/learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md b/learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md index 939b3b1a90c3..1a46bd44f709 100644 --- a/learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md +++ b/learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md @@ -1,10 +1,12 @@ Prompt: -Ho can I use Apache Beam to run inference on Large Language Models (LLMs)? +How can I use Apache Beam to run inference on Large Language Models (LLMs)? Response: The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the RunInference API provided by the Apache Beam Python SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. -You can use the `RunInference` transform with large models as long as they fit into memory. +You can use the `RunInference` transform with large models as long as they fit on your hardware. In addition, RunInference has several mechanisms for reducing memory usage. For example, `RunInference` by default will load at most a single copy of each model per process (rather than one per thread). + +Many Beam runners run multiple Beam processes per machine at once. Loading large models like LLMs multiple times might cause problems because of the negative impact on memory footprint. For memory-intensive models, `RunInference` provides a mechanism for more intelligently sharing memory across multiple processes to reduce the overall memory footprint. To enable this mode, set the parameter `large_model` to True in model configuration and Beam will take care of the memory management. The typical workflow for conducting inference on Large Language Models (LLMs) within an Apache Beam pipeline involves the following steps: 1. Read the input text data from a source such as a file or a Pub/Sub topic. @@ -41,7 +43,8 @@ model_handler = PytorchModelHandlerTensor( "config": AutoConfig.from_pretrained(known_args.model_name) }, device="cpu", - inference_fn=gen_fn) + inference_fn=gen_fn, + large_model=True) ``` Each specific model handler has its own configuration parameters. 
For example, the `PytorchModelHandlerTensor` requires the following parameters: @@ -50,6 +53,7 @@ Each specific model handler has its own configuration parameters. For example, t * `model_params`: the dictionary of model parameters. * `device`: the device to run the model on (e.g. "cpu" or "gpu"). * `inference_fn`: the function to run the inference during RunInference. +* `large_model`: whether to use the memory minimization mode for large models. For information on supported popular frameworks and models, refer to the reference documentation for the `apache_beam.ml.inference` package. diff --git a/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md b/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md index 9b04aaa8c65a..086b3eea65e5 100644 --- a/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md +++ b/learning/prompts/documentation-lookup/33_ai_ml_pipelines.md @@ -8,16 +8,8 @@ You can use Apache Beam for various tasks within your AI/ML pipeline, including Apache Beam offers a rich set of [I/O connectors](https://beam.apache.org/documentation/io/connectors/) and [transforms](https://beam.apache.org/documentation/transforms/python/) that allow for reading and writing data from and to various data sources and sinks, as well as performing data validation. -For data preprocessing, Apache Beam provides the [MLTransform](https://beam.apache.org/documentation/ml/preprocess-data/) class. This feature allows you to ensure data consistency by applying the same preprocessing steps for both training and inference. - -Additionally, Apache Beam allows integration with pre-trained models from [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/stable/), and [TensorFlow](https://www.tensorflow.org/). - -To execute machine learning inference tasks, Apache Beam provides the RunInference API. -[`RunInference`](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) is a [`PTransform`](https://beam.apache.org/documentation/basics/#ptransform) that supports the usage of any custom models. For an example of running inference on a custom model loaded with [spaCy](https://spacy.io/), refer to the [Bring your own ML model to Beam RunInference](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb) example in the Apache Beam GitHub repository. - -The `RunInference` transform efficiently handles models of any size, making it suitable for large language models (LLMs) and other complex architectures. You can find an example of deploying and performing inference on large language models (LLMs) in the [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) section of the Apache Beam documentation. - -For more information on implementing AI/ML pipelines using Apache Beam, see the [Get started with AI/ML pipelines](https://beam.apache.org/documentation/ml/overview/) section in the Apache Beam documentation. - +For data preprocessing, Apache Beam provides the [MLTransform](https://beam.apache.org/documentation/ml/preprocess-data/) class. This feature allows you to ensure data consistency by applying the same preprocessing steps for both training and inference. Additionally, Apache Beam allows integration with pre-trained models from [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/stable/), and [TensorFlow](https://www.tensorflow.org/). +To execute machine learning inference tasks, Apache Beam provides the RunInference API. 
[`RunInference`](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) is a [`PTransform`](https://beam.apache.org/documentation/basics/#ptransform) that supports the usage of any custom models. For an example of running inference on a custom model loaded with [spaCy](https://spacy.io/), refer to the [Bring your own ML model to Beam RunInference](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb) example in the Apache Beam GitHub repository. +The `RunInference` transform efficiently handles models of any size, making it suitable for large language models (LLMs) and other complex architectures. You can find an example of deploying and performing inference on large language models (LLMs) in the [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) section of the Apache Beam documentation. For more information on implementing AI/ML pipelines using Apache Beam, see the [Get started with AI/ML pipelines](https://beam.apache.org/documentation/ml/overview/) section in the Apache Beam documentation. \ No newline at end of file diff --git a/learning/prompts/documentation-lookup/38_ai_runinference.md b/learning/prompts/documentation-lookup/38_ai_runinference.md index b00f9874a7b7..a494f74941c5 100644 --- a/learning/prompts/documentation-lookup/38_ai_runinference.md +++ b/learning/prompts/documentation-lookup/38_ai_runinference.md @@ -27,7 +27,7 @@ from apache_beam.ml.inference.base import RunInference with pipeline as p: predictions = p | 'Read' >> beam.ReadFromSource('a_source') - | 'RunInference' >> RunInference(model_handler) + | 'RunInference' >> RunInference(model_handler) ``` For comprehensive end-to-end examples of inference with supported model frameworks and model hubs, refer to the [Apache Beam GitHub repository](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference). diff --git a/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md b/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md index b9ce223ca744..3d7ba82de9a9 100644 --- a/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md +++ b/learning/prompts/documentation-lookup/40_ai_bigquery_ml.md @@ -43,7 +43,9 @@ with beam.Pipeline() as p: ) ``` -This example uses the [`RunInference`](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) transform from the `tfx_bsl` library, directing it to the local directory where the model is stored. The transform takes `tf.train.Example` type elements as inputs and produces `tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as outputs. Depending on the signature of your model, you can extract values from the output. For the prediction output format of exported models for each model type, refer to the [Prediction output format](https://cloud.google.com/bigquery/docs/exporting-models#prediction_output_format) section in the BigQuery documentation. +This example uses the [`RunInference`](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) transform from the `tfx_bsl` library, directing it to the local directory where the model is stored. The transform takes `tf.train.Example` type elements as inputs and produces `tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as outputs. + +Depending on the signature of your model, you can extract values from the output. 
For the prediction output format of exported models for each model type, refer to the [Prediction output format](https://cloud.google.com/bigquery/docs/exporting-models#prediction_output_format) section in the BigQuery documentation. diff --git a/learning/prompts/documentation-lookup/42_ai_custom_inference.md b/learning/prompts/documentation-lookup/42_ai_custom_inference.md index 158584fabab2..7570c0f4149f 100644 --- a/learning/prompts/documentation-lookup/42_ai_custom_inference.md +++ b/learning/prompts/documentation-lookup/42_ai_custom_inference.md @@ -16,20 +16,23 @@ To execute external API calls with the `RunInference` transform, you will need t class CustomModelHandler(ModelHandler): """DoFn that accepts a batch of inputs and sends that batch to the remote API for inference""" - def load_model(self): + def initialize_client(self): """Initiate the Custom remote API client.""" client = ... # Initialize the client return client def run_inference(self, batch, model, inference): + # Initialize the client. + client = self.initialize_client() + # Prepare a batch request for all inputs in the batch. inputs = ... # Process inputs from the batch input_requests = ... # Prepare input requests for the model batch_request = ... # Prepare batch request for the model # Send the batch request to the remote endpoint. - responses = model.(request=batch_request).responses + responses = client.(request=batch_request).responses return responses ``` diff --git a/learning/prompts/documentation-lookup/43_ai_llm_inference.md b/learning/prompts/documentation-lookup/43_ai_llm_inference.md index aa858655f50f..da21cbc827a8 100644 --- a/learning/prompts/documentation-lookup/43_ai_llm_inference.md +++ b/learning/prompts/documentation-lookup/43_ai_llm_inference.md @@ -1,10 +1,12 @@ Prompt: -Ho can I use Apache Beam to run inference on Large Language Models (LLMs)? +How can I use Apache Beam to run inference on Large Language Models (LLMs)? Response: The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam Python SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. -You can use the `RunInference` transform with large models as long as they fit into memory. +You can use the `RunInference` transform with large models as long as they fit on your hardware. In addition, RunInference has several mechanisms for reducing memory usage. For example, `RunInference` by default will load at most a single copy of each model per process (rather than one per thread). + +Many Beam runners run multiple Beam processes per machine at once. Loading large models like LLMs multiple times might cause problems because of the negative impact on memory footprint. For memory-intensive models, `RunInference` provides a mechanism for more intelligently sharing memory across multiple processes to reduce the overall memory footprint. To enable this mode, set the parameter `large_model` to True in model configuration and Beam will take care of the memory management. The typical workflow for conducting inference on Large Language Models (LLMs) within an Apache Beam pipeline involves the following steps: 1. Read the input text data from a source such as a file or a Pub/Sub topic. 
@@ -41,7 +43,8 @@ model_handler = PytorchModelHandlerTensor( "config": AutoConfig.from_pretrained(known_args.model_name) }, device="cpu", - inference_fn=gen_fn) + inference_fn=gen_fn, + large_model=True) ``` Each specific model handler has its own configuration parameters. For example, the `PytorchModelHandlerTensor` requires the following parameters: @@ -50,6 +53,7 @@ Each specific model handler has its own configuration parameters. For example, t * `model_params`: the dictionary of model parameters. * `device`: the device to run the model on (e.g. "cpu" or "gpu"). * `inference_fn`: the function to run the inference during RunInference. +* `large_model`: whether to use the memory minimization mode for large models. For information on supported popular frameworks and models, refer to the reference documentation for the [`apache_beam.ml.inference` package](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html). From 1337f47111d1a055e2733ef45a1cc8d3704cb1a6 Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Sun, 10 Mar 2024 11:52:41 +0100 Subject: [PATCH 24/25] whitespace fix --- .../documentation-lookup-nolinks/40_ai_bigquery_ml.md | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md b/learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md index 12fc53ff3838..e7c6d019ba3a 100644 --- a/learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md +++ b/learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md @@ -43,10 +43,6 @@ with beam.Pipeline() as p: ) ``` -This example uses the `RunInference` transform from the `tfx_bsl` library, directing it to the local directory where the model is stored. The transform takes `tf.train.Example` type elements as inputs and produces `tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as outputs. - -Depending on the signature of your model, you can extract values from the output. For the prediction output format of exported models for each model type, refer to the 'Prediction output format' section in the BigQuery documentation. - - - +This example uses the `RunInference` transform from the `tfx_bsl` library, directing it to the local directory where the model is stored. The transform takes `tf.train.Example` type elements as inputs and produces `tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as outputs. +Depending on the signature of your model, you can extract values from the output. For the prediction output format of exported models for each model type, refer to the 'Prediction output format' section in the BigQuery documentation. 
\ No newline at end of file From 95bd796557f85374c535b2c6a40edce65e8d4179 Mon Sep 17 00:00:00 2001 From: Oleh Borysevych Date: Tue, 12 Mar 2024 11:39:45 +0100 Subject: [PATCH 25/25] Apply suggestions from code review Co-authored-by: Danny McCormick --- .../documentation-lookup-nolinks/42_ai_custom_inference.md | 2 +- learning/prompts/documentation-lookup/42_ai_custom_inference.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md b/learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md index b8bf99779a6c..5a622b3465f1 100644 --- a/learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md +++ b/learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md @@ -32,7 +32,7 @@ class CustomModelHandler(ModelHandler): batch_request = ... # Prepare batch request for the model # Send the batch request to the remote endpoint. - responses = client.(request=batch_request).responses + responses = client.request(request=batch_request).responses return responses ``` diff --git a/learning/prompts/documentation-lookup/42_ai_custom_inference.md b/learning/prompts/documentation-lookup/42_ai_custom_inference.md index 7570c0f4149f..d4cea37cce0b 100644 --- a/learning/prompts/documentation-lookup/42_ai_custom_inference.md +++ b/learning/prompts/documentation-lookup/42_ai_custom_inference.md @@ -32,7 +32,7 @@ class CustomModelHandler(ModelHandler): batch_request = ... # Prepare batch request for the model # Send the batch request to the remote endpoint. - responses = client.(request=batch_request).responses + responses = client.request(request=batch_request).responses return responses ```
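For context on how the corrected custom handler is meant to be used, here is a minimal, hedged wiring sketch; the input source and the final print step are placeholders:

```python
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference

# CustomModelHandler is the remote-inference handler sketched above.
with beam.Pipeline() as pipeline:
    predictions = (
        pipeline
        | "ReadInputs" >> beam.io.ReadFromText("gs://my-bucket/inputs.txt")  # assumed path
        | "RemoteInference" >> RunInference(CustomModelHandler())
        | "PrintResults" >> beam.Map(print))
```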