
Commit

doc(katib): update push-based metrics collector. (#3844)
* doc(katib): update push-based metrics collector.

Signed-off-by: Electronic-Waste <[email protected]>

* doc(katib): update example and description for Push MC.

Signed-off-by: Electronic-Waste <[email protected]>

---------

Signed-off-by: Electronic-Waste <[email protected]>
Electronic-Waste authored Dec 6, 2024
1 parent 6099399 commit 99f4212
Showing 1 changed file with 56 additions and 9 deletions.
65 changes: 56 additions & 9 deletions content/en/docs/components/katib/user-guides/metrics-collector.md
@@ -6,16 +6,23 @@ weight = 40

This guide describes how the Katib metrics collector works.

## Metrics Collector
## Overview

There are two ways to collect metrics:

1. Pull-based: collects the metrics using a _sidecar_ container. A sidecar is a utility container that supports
the main container in the Kubernetes Pod.

2. Push-based: the training script pushes the metrics directly to Katib DB.

In the `metricsCollectorSpec` section of the Experiment YAML configuration file, you can
define how Katib should collect the metrics from each Trial, such as the accuracy and loss metrics.

Your training code can record the metrics into `StdOut` or into arbitrary output files. Katib
collects the metrics using a _sidecar_ container. A sidecar is a utility container that supports
the main container in the Kubernetes Pod.
## Pull-based Metrics Collector

To define the metrics collector for your Experiment:
Your training code can record the metrics into `StdOut` or into arbitrary output files.

To define the pull-based metrics collector for your Experiment:

1. Specify the collector type in the `.collector.kind` field.
Katib's metrics collector supports the following collector types:
@@ -29,7 +36,7 @@ To define the metrics collector for your Experiment:
metrics must be line-separated by `epoch` or `step` as follows, and the key for timestamp must
be `timestamp`:

```
```json
{"epoch": 0, "foo": "bar", "fizz": "buzz", "timestamp": "2021-12-02T14:27:51"}
{"epoch": 1, "foo": "bar", "fizz": "buzz", "timestamp": "2021-12-02T14:27:52"}
{"epoch": 2, "foo": "bar", "fizz": "buzz", "timestamp": "2021-12-02T14:27:53"}
@@ -51,9 +58,6 @@ To define the metrics collector for your Experiment:
in the `.collector.customCollector` field. Check the
[custom metrics collector example](https://github.com/kubeflow/katib/blob/ea46a7f2b73b2d316b6b7619f99eb440ede1909b/examples/v1beta1/metrics-collector/custom-metrics-collector.yaml#L14-L36).

- `None`: Specify this value if you don't need to use Katib's metrics collector. For example,
your training code may handle the persistent storage of its own metrics.
2. Write code in your training container to print the metrics, or save them to a file, in the format
specified in the `.source.filter.metricsFormat` field. The default metrics format value is:

@@ -79,3 +83,46 @@ To define the metrics collector for your Experiment:
recall=0.55
precision=.5
```
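
As a rough illustration of the pull-based flow, the sketch below shows how a training script might render metrics in the two layouts shown in this guide: one `name=value` line per metric, and one JSON line per epoch with a `timestamp` key. The helper names `format_text_metrics` and `format_json_metrics` and the `accuracy` metric are hypothetical and not part of Katib.

```python
import json
from datetime import datetime, timezone


def format_text_metrics(metrics):
    """Render metrics as one `name=value` line per metric."""
    return "\n".join(f"{name}={value}" for name, value in metrics.items())


def format_json_metrics(epoch, metrics, now=None):
    """Render one JSON line for an epoch, with the timestamp under the `timestamp` key."""
    now = now or datetime.now(timezone.utc)
    record = {"epoch": epoch, **metrics, "timestamp": now.strftime("%Y-%m-%dT%H:%M:%S")}
    return json.dumps(record)


# A training loop would emit one of these per reporting step,
# either to StdOut (as here) or to the output file the collector reads.
print(format_text_metrics({"recall": 0.55, "precision": 0.5}))
print(format_json_metrics(0, {"accuracy": 0.91}))
```

Keeping the formatting in small helpers makes it easier to keep the output consistent with whatever `.source.filter.metricsFormat` the Experiment specifies.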

## Push-based Metrics Collector

Your training code needs to call the [`report_metrics()`](https://github.com/kubeflow/katib/blob/e251a07cb9491e2d892db306d925dddf51cb0930/sdk/python/v1beta1/kubeflow/katib/api/report_metrics.py#L26) function in the Python SDK to record metrics.
The `report_metrics()` function converts the metrics passed in its `metrics` argument into a gRPC request, adds the current timestamp automatically, and sends the request to the Katib DB Manager.

Before calling `report_metrics()`, install the `kubeflow-katib` package in your training container.
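
To make the description of `report_metrics()` more concrete, here is a minimal sketch of the "add a timestamp to each metric" step, using plain dictionaries instead of gRPC messages. It is not the SDK's code: the helper `build_metric_logs`, the field names, and the timestamp layout are assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def build_metric_logs(metrics, now=None):
    """Turn {"name": value} into timestamped log entries (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    # Assumed layout: UTC time with a trailing "Z".
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        {"time_stamp": timestamp, "metric": {"name": name, "value": str(value)}}
        for name, value in metrics.items()
    ]


# Every metric in one call shares the same timestamp.
logs = build_metric_logs({"result": 40, "loss": 0.25})
```

In the real SDK, a payload of this kind would be sent to the Katib DB Manager over gRPC, so the training code never has to supply timestamps itself.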

To define the push-based metrics collector for your Experiment, you have two options:

- YAML File

1. Specify the collector type `Push` in the `.collector.kind` field.

2. Write code in your training container to call `report_metrics()` to report metrics.

- [`tune`](https://github.com/kubeflow/katib/blob/master/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py#L166) function

Call the `tune` function and set its `metrics_collector_config` argument, as in the following example:

```python
import kubeflow.katib as katib


def objective(parameters):
    # Imports are repeated here because this function runs in the Trial container.
    import time
    import kubeflow.katib as katib

    time.sleep(5)
    result = 4 * int(parameters["a"])
    # Push metrics to Katib DB.
    katib.report_metrics({"result": result})


katib.KatibClient(namespace="kubeflow").tune(
    name="push-metrics-exp",
    objective=objective,
    parameters={"a": katib.search.int(min=10, max=20)},
    objective_metric_name="result",
    max_trial_count=2,
    metrics_collector_config={"kind": "Push"},
    # When SDK is released, replace it with packages_to_install=["kubeflow-katib==0.18.0"].
    # Currently, the training container should have `git` package to install this SDK.
    packages_to_install=["git+https://github.com/kubeflow/katib.git@master#subdirectory=sdk/python/v1beta1"],
)
```
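
For the YAML file option, the training script itself only has to call `report_metrics()`. The sketch below is one possible way to structure such a script so that the loop can be exercised without a cluster. The `train` function, the `loss` metric, and the injected `report` callable are hypothetical. The reported metric name would presumably need to match the objective metric name in the Experiment.

```python
def train(report, epochs=3):
    """Run a toy training loop and push one metric per epoch through `report`."""
    for epoch in range(epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        report({"loss": loss})


def main():
    # Requires the `kubeflow-katib` package inside the training container.
    import kubeflow.katib as katib

    # In a Trial whose Experiment sets `.collector.kind` to `Push`,
    # each call pushes the metrics to Katib DB.
    train(katib.report_metrics)
```

Passing the reporting function in as an argument keeps the loop testable locally: a test can substitute a list's `append` method for `katib.report_metrics`.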
