Add instrumenting lab #42

Merged
merged 6 commits on Nov 27, 2020

41 changes: 41 additions & 0 deletions content/en/docs/05/_index.md
@@ -0,0 +1,41 @@
---
title: "5. Instrumenting with client libraries"
weight: 1
sectionnumber: 1
---

While an exporter is an adapter that translates service-specific values into metrics in the Prometheus format, it is also possible to export metric data programmatically from within your application code.

## Client libraries

The Prometheus project provides [client libraries](https://prometheus.io/docs/instrumenting/clientlibs/) which are either official or maintained by third parties. There are libraries for major languages like Java, Golang, Python, PHP and even .NET/C#.
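
To illustrate how such a library is used, here is a minimal Go sketch (not part of the lab; the metric name, handler and port are only assumptions) that registers a custom counter with the [Go client library](https://github.com/prometheus/client_golang) and exposes it together with the default collectors on `/metrics`:

```
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Example counter with handler and status labels, registered in the default registry.
var httpRequestsTotal = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total number of HTTP requests.",
	},
	[]string{"handler", "status"},
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		httpRequestsTotal.WithLabelValues("/", "200").Inc()
		w.Write([]byte("hello"))
	})

	// Expose all registered metrics (including the default Go runtime collectors) on /metrics.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```

Running this and pointing Prometheus (or `curl`) at `http://localhost:8080/metrics` shows the custom counter next to the library's built-in metrics.
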
Even if you don't plan to provide your own metrics, those libraries already export some basic metrics depending on the language. For [Java](https://github.com/prometheus/client_java#included-collectors), default metrics about memory management (heap, garbage collection) and thread pools can be collected. The same applies to [Golang](https://prometheus.io/docs/guides/go-application/).
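
For example, a Go application instrumented as in the sketch above already exposes runtime metrics such as the following without any additional code (the sample values are illustrative):

```
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 2.3068672e+06
```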

{{% alert title="Note" color="primary" %}}

A short note on the Spring Framework, as it is very popular in application development: the framework also supports [exporting metrics](https://spring.io/blog/2018/03/16/micrometer-spring-boot-2-s-new-application-metrics-collector) in the Prometheus data format.
{{% /alert %}}

## Specifications and conventions

There are some guidelines and best practices on how to name your own metrics. Of course, the [specification of the data model](https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels) must be followed, and applying the [best practices about naming](https://prometheus.io/docs/practices/naming/) is a good idea as well. All those guidelines and best practices are now officially specified at [openmetrics.io](https://openmetrics.io).

Following these principles is not (yet) a must, but it makes your metrics easier to understand and interpret.

You can check your metrics by using the following `promtool` command:

```
curl -s http://localhost:8080/metrics | promtool check metrics
```

## Best practices

Though implementing a metric is an easy task from a technical point of view, it is not so easy to define what and how to measure. If you follow your existing [log statements](https://prometheus.io/docs/practices/instrumentation/#logging) and define an error counter that counts all [errors and exceptions](https://prometheus.io/docs/practices/instrumentation/#failures), you already have a good basis for seeing the internal state of your application.
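
As a minimal sketch of that idea (the metric name `myapp_errors_total` and the `callBackend` helper are purely hypothetical), such an error counter can be incremented right next to the existing log statement and exposed the same way as shown earlier:

```
package main

import (
	"errors"
	"log"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Hypothetical application-wide error counter.
var errorsTotal = promauto.NewCounter(prometheus.CounterOpts{
	Name: "myapp_errors_total",
	Help: "Total number of errors and exceptions.",
})

// callBackend stands in for any operation that can fail.
func callBackend() error { return errors.New("backend unavailable") }

func main() {
	if err := callBackend(); err != nil {
		errorsTotal.Inc() // count the failure where it is already being logged
		log.Printf("backend call failed: %v", err)
	}
}
```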

### The Four Golden Signals

Another approach to defining metrics is based on [The Four Golden Signals](https://sre.google/sre-book/monitoring-distributed-systems/):

* Latency
* Traffic
* Errors
* Saturation

There are other methods, such as [RED](https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/) or [USE](http://www.brendangregg.com/usemethod.html), that go in the same direction.
66 changes: 66 additions & 0 deletions content/en/docs/05/labs/51.md
@@ -0,0 +1,66 @@
---
title: "5.1 Instrumenting"
weight: 2
sectionnumber: 1
---

### Task 1

Study the following metrics and decide whether each metric name is okay:

```
http_requests{handler="/", status="200"}

http_request_200_count{handler="/"}

go_memstats_heap_inuse_megabytes{instance="localhost:9090",job="prometheus"}

prometheus_build_info{branch="HEAD",goversion="go1.15.5",instance="localhost:9090",job="prometheus",revision="de1c1243f4dd66fbac3e8213e9a7bd8dbc9f38b2",version="2.22.2"}

prometheus_config_last_reload_success_timestamp{instance="localhost:9090",job="prometheus"}

prometheus_tsdb_lowest_timestamp_minutes{instance="localhost:9090",job="prometheus"}
```

### Task 2

What kind of risk do you face when you see a metric like the following?

```
http_requests_total{path="/etc/passwd", status="404"} 1
```


## Solutions

{{% details title="Task 1" %}}

* The `_total` suffix should be appended, so `http_requests_total{handler="/", status="200"}` is better.

* There are two issues in `http_request_200_count{handler="/"}`: first, the `_count` suffix is reserved for histograms and summaries; counters should be suffixed with `_total`. Second, status information should not be part of the metric name; a label `{status="200"}` is the better option.

* The base unit is `bytes`, not `megabytes`, so `go_memstats_heap_inuse_bytes` is correct.

* Everything is ok with `prometheus_build_info` and its labels. It's a good practice to export such base information with a gauge.

* In `prometheus_config_last_reload_success_timestamp` the base unit is missing; the correct name is `prometheus_config_last_reload_success_timestamp_seconds`.

* The base unit is `seconds` for timestamps, so `prometheus_tsdb_lowest_timestamp_seconds` is correct.

{{% /details %}}

{{% details title="Task 2" %}}

The risk is not the possible security vulnerability (which, judging by the `404` status, seems to be handled appropriately in this case).
From a Prometheus point of view, there is the risk of a DDoS attack: an attacker could easily make requests to paths that obviously don't exist. As every path is recorded as a label value, many new time series are created, which could lead to a [cardinality explosion](https://www.robustperception.io/cardinality-is-key) and finally to out-of-memory errors.

It's hard to recover from that!

For this case, it's better to just count the 404 requests and to look up the offending paths in the log files.

```
http_requests_total{status="404"} 15
```

{{% /details %}}