Apply suggestions from code review
Co-authored-by: Daria Bezkorovaina <[email protected]>
Oleh Borysevych and dariabezkorovaina authored Mar 3, 2024
1 parent c815cf6 commit 30e50c2
1 changed file with 16 additions and 14 deletions: learning/prompts/documentation-lookup/39_ai_runinference_metrics.md

Prompt:
What metrics are available for monitoring the performance of ML model inference in Apache Beam?

Response:
Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s [Multi-language Pipelines](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines) framework.

The RunInference API provides built-in support for monitoring the performance of ML model inference. Here is a list of commonly used metrics for inference benchmarking:

* `model_byte_size`: size of the memory footprint of the model load and initialization.
* `load_model_latency_milli_secs`: time taken to load and initialize the model.
* `num_inferences`: total number of elements passed to `run_inference()`.
* `inference_batch_latency_micro_secs`: time taken to perform inference across all batches of examples.
* `inference_request_batch_byte_size`: size of the batch in bytes.
* `inference_request_batch_size`: number of examples in the batch.
* `failed_batches_counter`: number of failed batches.

You can derive additional metrics from the ones listed above. Example:
Total time taken for inference = `num_inferences` x `inference_batch_latency_micro_secs_MEAN`
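
For illustration, here is a minimal sketch of how that derived metric could be computed in code. It is an assumption-laden sketch, not part of the original example: it assumes `result` is the `PipelineResult` of a pipeline that used RunInference and that each name matches exactly one metric; on runners that do not report committed metrics, `attempted` can be used in place of `committed`.

```python
import apache_beam as beam

# Assumes `result` is the PipelineResult of a pipeline that used RunInference.
num_inferences = result.metrics().query(
    beam.metrics.MetricsFilter().with_name('num_inferences'))['counters'][0].committed

# `inference_batch_latency_micro_secs` is reported as a distribution; take its mean.
latency = result.metrics().query(
    beam.metrics.MetricsFilter().with_name('inference_batch_latency_micro_secs')
)['distributions'][0].committed

# Total time taken for inference, in microseconds.
total_inference_time_micro_secs = num_inferences * latency.mean
```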

Here is a simplified example of how to use the RunInference API to perform inference on a language model (LM):

```python
with beam.Pipeline(options=pipeline_options) as pipeline:
    # ... (remaining pipeline steps are collapsed in this view; see the full example linked below)
```

After running the pipeline, you can query the metrics using the following code:
```python
metrics = pipeline.result.metrics().query(beam.metrics.MetricsFilter())
```
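
To make the end-to-end flow concrete, here is a minimal, self-contained sketch. It is an illustration only: it swaps in a tiny scikit-learn regression model for the language model used in the linked example, and the local path `/tmp/model.pkl` is a placeholder chosen for this sketch.

```python
import pickle

import apache_beam as beam
import numpy as np
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
from sklearn.linear_model import LinearRegression

# Train and pickle a toy model so the pipeline has something to load.
model = LinearRegression().fit(np.array([[1.0], [2.0], [3.0]]), np.array([2.0, 4.0, 6.0]))
with open('/tmp/model.pkl', 'wb') as f:  # placeholder path for this sketch
    pickle.dump(model, f)

model_handler = SklearnModelHandlerNumpy(model_uri='/tmp/model.pkl')

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | 'CreateExamples' >> beam.Create([np.array([4.0]), np.array([5.0])])
        | 'RunInference' >> RunInference(model_handler)
        | 'PrintPredictions' >> beam.Map(print))

# RunInference metrics are available on the pipeline result once the pipeline finishes.
metrics = pipeline.result.metrics().query(beam.metrics.MetricsFilter())
for counter in metrics['counters']:
    print(counter.key.metric.name, counter.committed)
for distribution in metrics['distributions']:
    print(distribution.key.metric.name, distribution.committed)
```

Run locally with the direct runner, this sketch should report, among others, the `num_inferences` counter and the latency and size distributions listed above.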

Metrics are also available in the [Google Cloud Dataflow](https://cloud.google.com/dataflow) UI. For the complete source code of the example and instructions to run it in Dataflow, refer to the [Apache Beam GitHub repository](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/runinference_metrics/).

For a sample implementation of a metrics collector that tracks ML-related performance and memory usage, see the [base Python inference example](https://github.com/akvelon/beam/blob/371576a3b17b940380192378848dd00c55d0cc19/sdks/python/apache_beam/ml/inference/base.py#L1228) in the Apache Beam GitHub repository.
