Add instrumenting lab (#42)

* Add initial ideas about instrumenting lab
* Add introduction and overview of instrumentation lab
* Add tasks for instrumenting lab
* Fix linter issues
* Fix one more linter issue
* Change label name
puzzle · Nov 27, 2020 · 636d6e2
1 parent 99b270b

Showing 2 changed files with 107 additions and 0 deletions.
diff --git a/content/en/docs/05/_index.md b/content/en/docs/05/_index.md
@@ -0,0 +1,41 @@
+---
+title: "5. Instrumenting with client libraries"
+weight: 1
+sectionnumber: 1
+---
+
+While an exporter acts as an adapter that translates service-specific values into metrics in the Prometheus format, you can also expose metric data programmatically from your own application code.
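
To make the idea concrete, here is a minimal, stdlib-only Python sketch of what a client library does for you: it keeps a counter in memory and renders it in the Prometheus text exposition format on `/metrics`. The `Counter` class, the `serve()` helper, and the metric name `myapp_jobs_processed_total` are hypothetical; in a real application you would use an official client library such as `prometheus_client` for Python instead.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """Minimal thread-safe counter that can only go up (illustration only)."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0.0
        self._lock = threading.Lock()

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        with self._lock:
            self._value += amount

    def expose(self):
        """Render the counter in the Prometheus text exposition format."""
        with self._lock:
            value = self._value
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {value}\n"
        )


# Hypothetical application metric, incremented by your business logic.
JOBS_PROCESSED = Counter("myapp_jobs_processed_total", "Number of processed jobs.")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = JOBS_PROCESSED.expose().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Block forever, serving /metrics on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` exposes the metric on port 8080, the same port used by the `curl ... | promtool check metrics` command in this chapter.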
+
+## Client libraries
+
+The Prometheus project provides [client libraries](https://prometheus.io/docs/instrumenting/clientlibs/), some of which are official while others are maintained by third parties. There are libraries for all major languages, such as Java, Golang, Python, PHP, and even .NET/C#.
+
+Even if you don't plan to provide your own metrics, these libraries already export some basic, language-specific metrics. For [Java](https://github.com/prometheus/client_java#included-collectors), default metrics about memory management (heap, garbage collection) and thread pools can be collected. The same applies to [Golang](https://prometheus.io/docs/guides/go-application/).
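
The following stdlib-only Python sketch shows roughly what such a default collector does: it reads runtime statistics (garbage collector runs and live threads) and renders them as metrics. The metric names with the `myapp_` prefix are hypothetical; official client libraries ship their own collectors with their own names.

```python
import gc
import threading


def collect_runtime_metrics(prefix="myapp"):
    """Render a few language-level runtime metrics in the text exposition format.

    The metric names are hypothetical; official client libraries use their own.
    """
    lines = [
        f"# HELP {prefix}_gc_collections_total Garbage collector runs per generation.",
        f"# TYPE {prefix}_gc_collections_total counter",
    ]
    # gc.get_stats() returns one dict per generation, including a "collections" count.
    for generation, stats in enumerate(gc.get_stats()):
        lines.append(
            f'{prefix}_gc_collections_total{{generation="{generation}"}} {stats["collections"]}'
        )
    lines += [
        f"# HELP {prefix}_threads Number of currently alive threads.",
        f"# TYPE {prefix}_threads gauge",
        f"{prefix}_threads {threading.active_count()}",
    ]
    return "\n".join(lines) + "\n"
```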
+
+{{% alert title="Note" color="primary" %}}
+
+A short mention of the Spring Framework, as it is very popular in application development: it also supports [exporting metrics](https://spring.io/blog/2018/03/16/micrometer-spring-boot-2-s-new-application-metrics-collector) in the Prometheus data format via Micrometer.
+{{% /alert %}}
+
+## Specifications and conventions
+
+There are guidelines and best practices on how to name your own metrics. Of course, you must follow the [specification of the data model](https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels), and applying the [naming best practices](https://prometheus.io/docs/practices/naming/) is a good idea as well. Many of these guidelines and best practices are now formally specified by [OpenMetrics](https://openmetrics.io).
+
+Following these principles is not (yet) a must, but it helps to understand and interpret your metrics.
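
As a sketch of the data model's character rules, the following Python functions check metric and label names against the classic patterns `[a-zA-Z_:][a-zA-Z0-9_:]*` and `[a-zA-Z_][a-zA-Z0-9_]*`. The function names are hypothetical, and they only check syntax, not the naming best practices.

```python
import re

# Character rules from the Prometheus data model (classic names).
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_metric_name(name):
    # Colons are allowed, but by convention reserved for recording rules.
    return METRIC_NAME_RE.fullmatch(name) is not None


def is_valid_label_name(name):
    # Label names starting with "__" are reserved for internal use.
    return LABEL_NAME_RE.fullmatch(name) is not None and not name.startswith("__")
```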
+
+You can check your metrics by using the following `promtool` command: `curl -s http://localhost:8080/metrics | promtool check metrics`
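
To illustrate the kind of rules such a check applies, here is a rough Python sketch that lints exposition text for three common problems: counters without the `_total` suffix, non-base units in the name, and missing help text. It is not a reimplementation of `promtool`; the unit table is a hypothetical subset, and the sketch assumes each `# HELP` line precedes the matching `# TYPE` line.

```python
# Hypothetical subset of non-base units mapped to the base unit to use instead.
NON_BASE_UNITS = {
    "milliseconds": "seconds",
    "minutes": "seconds",
    "hours": "seconds",
    "kilobytes": "bytes",
    "megabytes": "bytes",
}


def lint_exposition(text):
    """Return human-readable problems found in Prometheus text exposition output.

    Assumes each "# HELP" line precedes the matching "# TYPE" line.
    """
    documented = set()
    problems = []
    for line in text.splitlines():
        if line.startswith("# HELP "):
            documented.add(line.split(maxsplit=3)[2])
        elif line.startswith("# TYPE "):
            _, _, name, metric_type = line.split(maxsplit=3)
            if metric_type == "counter" and not name.endswith("_total"):
                problems.append(f"{name}: counter should have the _total suffix")
            for part in name.split("_"):
                if part in NON_BASE_UNITS:
                    problems.append(
                        f"{name}: use base unit {NON_BASE_UNITS[part]} instead of {part}"
                    )
            if name not in documented:
                problems.append(f"{name}: no help text")
    return problems
```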
+
+## Best practices
+
+While implementing a metric is technically easy, deciding what to measure and how is not. If you add metrics alongside your existing [log statements](https://prometheus.io/docs/practices/instrumentation/#logging) and define an error counter that counts all [errors and exceptions](https://prometheus.io/docs/practices/instrumentation/#failures), you already have a good basis for observing the internal state of your application.
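
A minimal Python sketch of this idea, assuming a hypothetical `count_errors` decorator: every exception is logged as before and additionally counted, labeled by exception class name. Class names form a small, bounded set of label values, which keeps the number of time series under control. The metric name `myapp_errors_total` is hypothetical.

```python
import collections
import functools
import logging
import threading

log = logging.getLogger(__name__)

# Error counts keyed by exception class name (hypothetical metric: myapp_errors_total).
ERRORS = collections.Counter()
_errors_lock = threading.Lock()


def count_errors(func):
    """Log and count every exception raised by func, then re-raise it."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            with _errors_lock:
                ERRORS[type(exc).__name__] += 1
            log.exception("call to %s failed", func.__name__)
            raise

    return wrapper


def render_errors():
    """Render the error counts in the Prometheus text exposition format."""
    lines = [
        "# HELP myapp_errors_total Exceptions raised, by exception type.",
        "# TYPE myapp_errors_total counter",
    ]
    with _errors_lock:
        for exc_name, count in sorted(ERRORS.items()):
            lines.append(f'myapp_errors_total{{exception="{exc_name}"}} {count}')
    return "\n".join(lines) + "\n"
```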
+
+### The Four Golden Signals
+
+Another approach to defining metrics is based on [The Four Golden Signals](https://sre.google/sre-book/monitoring-distributed-systems/):
+
+* Latency
+* Traffic
+* Errors
+* Saturation
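
A minimal Python sketch of how a single request handler could record all four signals. The `GoldenSignals` class, its attribute names, and `handle_request` are hypothetical. Latency is kept as a sum of durations (average = sum / count), where a real client library would typically use a histogram, and saturation is approximated by the number of in-flight requests, which is only one of several possible proxies.

```python
import threading
import time
from contextlib import contextmanager


class GoldenSignals:
    """Record the four golden signals for one service (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.requests_total = 0           # Traffic: handled requests
        self.errors_total = 0             # Errors: requests that raised
        self.latency_seconds_sum = 0.0    # Latency: summed durations
        self.in_flight = 0                # Saturation proxy: concurrent requests

    @contextmanager
    def track(self):
        with self._lock:
            self.in_flight += 1
        start = time.perf_counter()
        try:
            yield
        except Exception:
            with self._lock:
                self.errors_total += 1
            raise
        finally:
            elapsed = time.perf_counter() - start
            with self._lock:
                self.in_flight -= 1
                self.requests_total += 1
                self.latency_seconds_sum += elapsed


SIGNALS = GoldenSignals()


def handle_request(payload):
    """Hypothetical request handler wrapped with signal tracking."""
    with SIGNALS.track():
        if not payload:
            raise ValueError("empty payload")
        return payload.upper()
```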
+
+Other methods, such as [RED](https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/) (Rate, Errors, Duration) or [USE](http://www.brendangregg.com/usemethod.html) (Utilization, Saturation, Errors), go in the same direction.
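
As a worked example of the RED method, the following Python sketch derives rate, error ratio, and average duration from two scrapes of counters, which is roughly what PromQL expressions like `rate(http_requests_total[1m])` or `rate(..._sum[1m]) / rate(..._count[1m])` compute. Unlike PromQL's `rate()`, it does no extrapolation, and it treats any decrease as a counter reset. The dictionary keys are hypothetical metric names.

```python
def counter_increase(earlier, later):
    # A drop means the process restarted and the counter began again at zero.
    return later - earlier if later >= earlier else later


def red_summary(prev, curr, interval_seconds):
    """Compute RED values from two counter snapshots taken interval_seconds apart."""
    requests = counter_increase(prev["requests_total"], curr["requests_total"])
    errors = counter_increase(prev["errors_total"], curr["errors_total"])
    latency = counter_increase(prev["latency_seconds_sum"], curr["latency_seconds_sum"])
    return {
        "rate_per_second": requests / interval_seconds,
        "error_ratio": errors / requests if requests else 0.0,
        "avg_duration_seconds": latency / requests if requests else 0.0,
    }
```

For example, 60 additional requests with 3 errors and 6 seconds of added latency over 60 seconds give a rate of 1 request per second, an error ratio of 5 %, and an average duration of 0.1 seconds.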
diff --git a/content/en/docs/05/labs/51.md b/content/en/docs/05/labs/51.md
@@ -0,0 +1,66 @@
+---
+title: "5.1 Instrumenting"
+weight: 2
+sectionnumber: 1
+---
+
+### Task 1
+
+Study the following metrics and decide whether each metric name is OK.
+
+```
+http_requests{handler="/", status="200"}
+
+http_request_200_count{handler="/"}
+
+go_memstats_heap_inuse_megabytes{instance="localhost:9090",job="prometheus"}
+
+prometheus_build_info{branch="HEAD",goversion="go1.15.5",instance="localhost:9090",job="prometheus",revision="de1c1243f4dd66fbac3e8213e9a7bd8dbc9f38b2",version="2.22.2"}
+
+prometheus_config_last_reload_success_timestamp{instance="localhost:9090",job="prometheus"}
+
+prometheus_tsdb_lowest_timestamp_minutes{instance="localhost:9090",job="prometheus"}
+```
+
+### Task 2
+
+What kind of risk do you see in a metric like this?
+
+```
+http_requests_total{path="/etc/passwd", status="404"} 1
+```
+
+## Solutions
+
+{{% details title="Task 1" %}}
+
+* As this is a counter, the `_total` suffix should be appended, so `http_requests_total{handler="/", status="200"}` is better.
+
+* There are two issues with `http_request_200_count{handler="/"}`: First, the `_count` suffix is used for the observation count of histograms and summaries, while counters should be suffixed with `_total`. Second, the status code should not be part of the metric name; a label such as `{status="200"}` is the better option, resulting in `http_requests_total{handler="/", status="200"}`.
+
+* The base unit is `bytes`, not `megabytes`, so `go_memstats_heap_inuse_bytes` is correct.
+
+* Everything is OK with `prometheus_build_info` and its labels. It's good practice to export such build information as a gauge with the constant value `1`.
+
+* In `prometheus_config_last_reload_success_timestamp`, the base unit is missing; the correct name is `prometheus_config_last_reload_success_timestamp_seconds`.
+
+* The base unit for timestamps is `seconds`, not `minutes`, so `prometheus_tsdb_lowest_timestamp_seconds` is correct.
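
If a value is only available in a non-base unit, convert it before exposing it under the base-unit name. A minimal Python sketch, assuming a hypothetical conversion table and helper names; note that decimal megabytes and binary mebibytes use different factors.

```python
# Hypothetical conversion table from non-base units to Prometheus base units.
TO_BASE_UNIT = {
    "kilobytes": ("bytes", 1_000),
    "megabytes": ("bytes", 1_000_000),
    "mebibytes": ("bytes", 1024 * 1024),
    "milliseconds": ("seconds", 0.001),
    "minutes": ("seconds", 60),
    "hours": ("seconds", 3600),
}


def to_base_unit(value, unit):
    """Convert value to the base unit and return (converted_value, base_unit)."""
    base_unit, factor = TO_BASE_UNIT[unit]
    return value * factor, base_unit


def metric_name(prefix, unit):
    """Build a metric name that ends with the base unit, e.g. myapp_heap_inuse_bytes."""
    return f"{prefix}_{TO_BASE_UNIT[unit][0]}"
```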
+
+{{% /details %}}
+
+{{% details title="Task 2" %}}
+
+No, it's not the potential security vulnerability (which, by the way, seems to be handled appropriately in this case).
+
+From a Prometheus point of view, there is a risk of a denial-of-service (DoS) attack: an attacker can easily send requests to paths that obviously don't exist. Because every path becomes a separate label value, each new path creates a new time series, which can lead to a [cardinality explosion](https://www.robustperception.io/cardinality-is-key) and eventually to out-of-memory errors.
+
+It's hard to recover from that!
+
+In this case, it's better to just count the 404 requests and look up the paths in the log files.
+
+```
+http_requests_total{status="404"} 15
+```
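
A minimal Python sketch of this approach, assuming a hypothetical `observe_request` helper that receives the route pattern matched by your router as `handler` (or `None` if no route matched). Only route patterns, a bounded set, become label values; the client-controlled path of unmatched requests is written to the log instead.

```python
import collections
import logging

log = logging.getLogger(__name__)

# Request counts keyed by a tuple of (label name, label value) pairs.
REQUESTS_TOTAL = collections.Counter()


def observe_request(handler, path, status):
    """Count one request without letting client-controlled paths become label values.

    handler is the route pattern matched by the router, or None if no route matched.
    """
    if handler is None:
        # Unknown paths only go to the log; the metric just counts them by status.
        log.info("no handler for %r (status %d)", path, status)
        labels = (("status", str(status)),)
    else:
        labels = (("handler", handler), ("status", str(status)))
    REQUESTS_TOTAL[labels] += 1


def render_requests():
    """Render the counts in the Prometheus text exposition format."""
    lines = ["# TYPE http_requests_total counter"]
    for labels, count in sorted(REQUESTS_TOTAL.items()):
        label_text = ",".join(f'{name}="{value}"' for name, value in labels)
        lines.append(f"http_requests_total{{{label_text}}} {count}")
    return "\n".join(lines) + "\n"
```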
+
+{{% /details %}}