This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Add instrumenting lab (#42)
* Add initial ideas about instrumenting lab

* Add introduction and overview of instrumentation lab

* Add tasks for instrumenting lab

* Fix linter issues

* Fix one more linter issue

* Change label name
rotscher authored Nov 27, 2020
1 parent 99b270b commit 636d6e2
Showing 2 changed files with 107 additions and 0 deletions.
41 changes: 41 additions & 0 deletions content/en/docs/05/_index.md
@@ -0,0 +1,41 @@
---
title: "5. Instrumenting with client libraries"
weight: 1
sectionnumber: 1
---

An exporter acts as an adapter that translates service-specific values into metrics in the Prometheus format. Alternatively, you can expose metric data programmatically from your own application code.

## Client libraries

The Prometheus project provides [client libraries](https://prometheus.io/docs/instrumenting/clientlibs/), which are either official or maintained by third parties. There are libraries for the major languages such as Java, Go, Python, PHP, and even .NET/C#.

Even if you don't plan to provide your own metrics, these libraries already export some basic metrics for the language runtime. For [Java](https://github.com/prometheus/client_java#included-collectors), default metrics about memory management (heap, garbage collection) and thread pools can be collected. The same applies to [Go](https://prometheus.io/docs/guides/go-application/).
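
To make the idea of instrumenting application code concrete, here is a minimal sketch that uses only the Python standard library. The `Counter` class is hypothetical and is not the API of any Prometheus client library; it only illustrates that a metric is a value kept in memory per label set and rendered in the Prometheus text format. In a real application you would use one of the client libraries instead.

```python
from collections import defaultdict


class Counter:
    """Hypothetical minimal counter, for illustration only."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._values = defaultdict(float)  # label set -> current value

    def inc(self, **labels):
        # Each distinct combination of label values is its own time series.
        key = tuple(sorted(labels.items()))
        self._values[key] += 1

    def render(self):
        # Render in the Prometheus text format (no label value escaping here).
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            lines.append(f"{self.name}{{{label_str}}} {value:g}")
        return "\n".join(lines) + "\n"


requests_total = Counter("http_requests_total", "Total number of HTTP requests.")
requests_total.inc(handler="/", status="200")
requests_total.inc(handler="/", status="200")
requests_total.inc(handler="/", status="404")

print(requests_total.render())
```

A client library does the same bookkeeping for you and additionally serves the rendered text on an HTTP endpoint such as `/metrics`.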

{{% alert title="Note" color="primary" %}}

A short mention of the Spring Framework, as it is very popular in application development: the framework also supports [exporting metrics](https://spring.io/blog/2018/03/16/micrometer-spring-boot-2-s-new-application-metrics-collector) in the Prometheus data format.
{{% /alert %}}

## Specifications and conventions

There are guidelines and best practices on how to name your own metrics. The [specifications of the data model](https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels) must be followed, and applying the [best practices about naming](https://prometheus.io/docs/practices/naming/) is recommended. These guidelines and best practices are now officially specified in [openmetrics.io](https://openmetrics.io).

Following these principles is not (yet) a must, but it helps to understand and interpret your metrics.

You can check your metrics by using the following `promtool` command: `curl -s http://localhost:8080/metrics | promtool check metrics`
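
As a rough illustration of what the data model requires, the following sketch checks metric and label names against the classic character rules from the data model documentation linked above. It is not a reimplementation of `promtool check metrics`, which performs additional linting; the function names are made up for this example.

```python
import re

# Character rules for names as described in the Prometheus data model.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_metric_name(name):
    return METRIC_NAME_RE.fullmatch(name) is not None


def is_valid_label_name(name):
    # Label names starting with "__" are reserved for internal use.
    return LABEL_NAME_RE.fullmatch(name) is not None and not name.startswith("__")


for candidate in ["http_requests_total", "http-requests-total", "2xx_responses"]:
    print(candidate, is_valid_metric_name(candidate))
```

Only the first candidate passes: hyphens are not allowed, and a name must not start with a digit.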

## Best practices

Though implementing a metric is an easy task from a technical point of view, it is not so easy to define what and how to measure. If you follow your existing [log statements](https://prometheus.io/docs/practices/instrumentation/#logging) and if you define an error counter to count all [errors and exceptions](https://prometheus.io/docs/practices/instrumentation/#failures), then you already have a good base to see the internal state of your application.

### The Four Golden Signals

Another approach to define metrics is based on [The Four Golden Signals](https://sre.google/sre-book/monitoring-distributed-systems/):

* Latency
* Traffic
* Errors
* Saturation

There are other methods like [RED](https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/) or [USE](http://www.brendangregg.com/usemethod.html) that go in the same direction.
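
To show how the four signals could be derived from raw observations, here is a small standard-library sketch. The sample data, the bucket bounds, and the `in_flight` and `worker_slots` values are assumptions made up for illustration; a real service would record these with a client library.

```python
from bisect import bisect_left

# Hypothetical sample data: (duration in seconds, HTTP status) per request.
requests = [(0.12, 200), (0.48, 200), (1.70, 500), (0.03, 404), (0.26, 200)]
in_flight, worker_slots = 3, 8  # assumed values for the saturation example

# Traffic: how many requests were served.
traffic = len(requests)

# Errors: here defined as server-side failures (5xx).
errors = sum(1 for _, status in requests if status >= 500)

# Latency: cumulative histogram buckets, upper bounds in seconds.
bounds = [0.1, 0.5, 1.0]
counts = [0] * (len(bounds) + 1)  # the last slot is the +Inf bucket
for duration, _ in requests:
    counts[bisect_left(bounds, duration)] += 1
cumulative = [sum(counts[: i + 1]) for i in range(len(counts))]

# Saturation: how full the service is, as a ratio between 0 and 1.
saturation = in_flight / worker_slots

print(traffic, errors, cumulative, saturation)
```

Latency is kept as cumulative buckets because that is how Prometheus histograms expose durations, which allows quantiles to be estimated later at query time.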
66 changes: 66 additions & 0 deletions content/en/docs/05/labs/51.md
@@ -0,0 +1,66 @@
---
title: "5.1 Instrumenting"
weight: 2
sectionnumber: 1
---

### Task 1

Study the following metrics and decide whether each metric name is OK.

```
http_requests{handler="/", status="200"}
http_request_200_count{handler="/"}
go_memstats_heap_inuse_megabytes{instance="localhost:9090",job="prometheus"}
prometheus_build_info{branch="HEAD",goversion="go1.15.5",instance="localhost:9090",job="prometheus",revision="de1c1243f4dd66fbac3e8213e9a7bd8dbc9f38b2",version="2.22.2"}
prometheus_config_last_reload_success_timestamp{instance="localhost:9090",job="prometheus"}
prometheus_tsdb_lowest_timestamp_minutes{instance="localhost:9090",job="prometheus"}
```

### Task 2

What kind of risk do you face when you see a metric like this?

```
http_requests_total{path="/etc/passwd", status="404"} 1
```


## Solutions

{{% details title="Task 1" %}}

* The `_total` suffix should be appended, so `http_requests_total{handler="/", status="200"}` is better.

* There are two issues in `http_request_200_count{handler="/"}`: First, the `_count` suffix is reserved for histograms and summaries; counters should be suffixed with `_total`. Second, status information should not be part of the metric name; a label `{status="200"}` is the better option.

* The base unit is `bytes` not `megabytes`, so `go_memstats_heap_inuse_bytes` is correct.

* Everything is OK with `prometheus_build_info` and its labels. It's good practice to export such base information with a gauge.

* In `prometheus_config_last_reload_success_timestamp` the unit is missing; the correct name is `prometheus_config_last_reload_success_timestamp_seconds`.

* The base unit is `seconds` for timestamps, so `prometheus_tsdb_lowest_timestamp_seconds` is correct.
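
The checks from this solution can be approximated in code. The sketch below is a rough heuristic with made-up function and variable names, not the linter built into `promtool`. It assumes you already know the metric type.

```python
# Non-base units mapped to the base unit that should be used instead.
NON_BASE_UNITS = {
    "milliseconds": "seconds",
    "minutes": "seconds",
    "hours": "seconds",
    "kilobytes": "bytes",
    "megabytes": "bytes",
}


def check_name(name, metric_type):
    """Return a list of naming problems for a metric name (heuristic only)."""
    problems = []
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counter should end in _total")
    for part in name.split("_"):
        if part in NON_BASE_UNITS:
            problems.append(f"use base unit {NON_BASE_UNITS[part]} instead of {part}")
        if part.isdigit() and len(part) == 3:
            problems.append(f"{part} looks like a status code, move it to a label")
    if name.endswith("_timestamp"):
        problems.append("timestamp is missing its unit, append _seconds")
    return problems


print(check_name("http_request_200_count", "counter"))
print(check_name("go_memstats_heap_inuse_megabytes", "gauge"))
print(check_name("prometheus_build_info", "gauge"))
```

An empty list means the heuristic found nothing to complain about, which is the case for `prometheus_build_info`.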

{{% /details %}}

{{% details title="Task 2" %}}

No, it's not the possible security vulnerability (which seems to be handled appropriately in this case, by the way).

From a Prometheus point of view, there is the risk of a denial-of-service (DoS) attack: an attacker could easily make requests to paths that obviously don't exist. As every path is registered as a label value, many new time series are created, which could lead to a [cardinality explosion](https://www.robustperception.io/cardinality-is-key) and finally to out-of-memory errors.

It's hard to recover from that!

In this case, it's better to just count the 404 requests and to look up the paths in the log files.

```
http_requests_total{status="404"} 15
```
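
The difference in cardinality can be illustrated with a short simulation. The attack paths below are invented for the example; each set element stands for one time series that Prometheus would have to keep in memory.

```python
# Hypothetical attacker traffic: 1000 requests to paths that do not exist.
attack_paths = [f"/no-such-page-{i}" for i in range(1000)]

series_with_path = set()
series_without_path = set()
for path in attack_paths:
    # With a path label, every new path creates a new label set.
    series_with_path.add(("http_requests_total", path, "404"))
    # Without it, all requests increment the same series.
    series_without_path.add(("http_requests_total", "404"))

print(len(series_with_path))     # one time series per distinct path
print(len(series_without_path))  # a single time series
```

The number of series with a `path` label grows with whatever the attacker sends, while the status-only variant stays bounded by the number of status codes.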

{{% /details %}}
