This repository has been archived by the owner on Mar 20, 2023. It is now read-only.
Commit
* Add initial ideas about instrumenting lab
* Add introduction and overview of instrumentation lab
* Add tasks for instrumenting lab
* Fix linter issues
* Fix one more linter issue
* Change label name
Showing 2 changed files with 107 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
--- | ||
title: "5. Instrumenting with client libraries" | ||
weight: 1 | ||
sectionnumber: 1 | ||
--- | ||
|
||
An exporter acts as an adapter that translates service-specific values into metrics in the Prometheus format. Alternatively, you can expose metrics programmatically, directly from your application code. | ||
|
||
## Client libraries | ||
|
||
The Prometheus project provides [client libraries](https://prometheus.io/docs/instrumenting/clientlibs/), some official and some maintained by third parties. Libraries exist for the major languages, including Java, Go, Python, PHP, and .NET/C#. | ||
|
||
Even if you don't plan to provide your own metrics, these libraries already export some basic, language-specific metrics. For [Java](https://github.com/prometheus/client_java#included-collectors), default metrics about memory management (heap, garbage collection) and thread pools can be collected. The same applies to [Go](https://prometheus.io/docs/guides/go-application/). | ||
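To make the idea of exporting metrics from application code more concrete, here is a minimal sketch of what a client library does on your behalf: it keeps counter values per label set and renders them in the Prometheus text format. The `Counter` class below is a hypothetical stand-in written with the Python standard library only. It is not the API of any official client library, and it skips details such as label value escaping.

```python
import threading


class Counter:
    """Hypothetical, simplified stand-in for a client-library counter."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._values = {}  # sorted label tuple -> current value
        self._lock = threading.Lock()

    def inc(self, amount=1.0, **labels):
        # Counters must never decrease.
        if amount < 0:
            raise ValueError("counters can only go up")
        key = tuple(sorted(labels.items()))
        with self._lock:
            self._values[key] = self._values.get(key, 0.0) + amount

    def render(self):
        # Render HELP and TYPE metadata followed by one sample per label set.
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        with self._lock:
            for key, value in sorted(self._values.items()):
                label_str = ",".join(f'{k}="{v}"' for k, v in key)
                suffix = "{" + label_str + "}" if label_str else ""
                lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


requests = Counter("http_requests_total", "Total HTTP requests.")
requests.inc(status="200")
requests.inc(status="404")
print(requests.render(), end="")
```

In a real application you would use one of the client libraries listed above instead, which also takes care of serving the rendered text on a `/metrics` endpoint.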
|
||
{{% alert title="Note" color="primary" %}} | ||
|
||
The Spring Framework deserves a short mention, as it is very popular in application development. It also supports [exporting metrics](https://spring.io/blog/2018/03/16/micrometer-spring-boot-2-s-new-application-metrics-collector) in the Prometheus data format. | ||
{{% /alert %}} | ||
|
||
## Specifications and conventions | ||
|
||
There are guidelines and best practices for naming your own metrics. The [specification of the data model](https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels) must be followed, and applying the [naming best practices](https://prometheus.io/docs/practices/naming/) is a good idea. These guidelines and best practices are now officially specified by [OpenMetrics](https://openmetrics.io). | ||
|
||
Following these principles is not (yet) mandatory, but it makes your metrics easier to understand and interpret. | ||
|
||
You can check your metrics by using the following `promtool` command: `curl -s http://localhost:8080/metrics | promtool check metrics` | ||
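As a complement to `promtool`, the character rules from the data model can be sketched in a few lines. The snippet assumes the classic rules documented in the data model (metric names matching `[a-zA-Z_:][a-zA-Z0-9_:]*`, label names matching `[a-zA-Z_][a-zA-Z0-9_]*`, and label names starting with `__` reserved for internal use). The function names are hypothetical, and the check is far less thorough than `promtool check metrics`.

```python
import re

# Classic character sets from the Prometheus data model documentation.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_metric_name(name):
    """Return True if the whole name matches the metric name pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def is_valid_label_name(name):
    """Return True for syntactically valid, non-reserved label names."""
    if name.startswith("__"):
        return False  # reserved for internal use
    return LABEL_NAME_RE.fullmatch(name) is not None


for candidate in ["http_requests_total", "http-requests", "2xx_requests"]:
    print(candidate, is_valid_metric_name(candidate))
```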
|
||
## Best practices | ||
|
||
Although implementing a metric is technically easy, deciding what to measure and how is not. If you instrument along your existing [log statements](https://prometheus.io/docs/practices/instrumentation/#logging) and define an error counter that counts all [errors and exceptions](https://prometheus.io/docs/practices/instrumentation/#failures), you already have a good basis for observing the internal state of your application. | ||
|
||
### The Four Golden Signals | ||
|
||
Another approach to defining metrics is based on [The Four Golden Signals](https://sre.google/sre-book/monitoring-distributed-systems/): | ||
|
||
* Latency | ||
* Traffic | ||
* Errors | ||
* Saturation | ||
|
||
Other methods, such as [RED](https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/) or [USE](http://www.brendangregg.com/usemethod.html), go in the same direction. |
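One possible way to read the four signals in terms of measurements is sketched below. The `GoldenSignals` class, its attribute names and the `max_in_flight` capacity are all hypothetical choices for illustration. The sketch is not thread-safe and uses plain attributes instead of a client library.

```python
import time


class GoldenSignals:
    """Hypothetical in-memory tracker for the four golden signals."""

    def __init__(self, max_in_flight):
        self.requests_total = 0          # traffic
        self.errors_total = 0            # errors
        self.duration_seconds_sum = 0.0  # latency (sum over all requests)
        self.in_flight = 0               # current load, used for saturation
        self.max_in_flight = max_in_flight  # assumed capacity

    def observe(self, handler, *args, **kwargs):
        """Run handler and record all four signals around the call."""
        self.in_flight += 1
        start = time.perf_counter()
        try:
            return handler(*args, **kwargs)
        except Exception:
            self.errors_total += 1
            raise
        finally:
            self.duration_seconds_sum += time.perf_counter() - start
            self.requests_total += 1
            self.in_flight -= 1

    def saturation(self):
        """Fraction of the assumed capacity currently in use."""
        return self.in_flight / self.max_in_flight


signals = GoldenSignals(max_in_flight=10)
signals.observe(lambda: "ok")
print(signals.requests_total, signals.errors_total)
```

With a client library, traffic and errors would typically become counters, latency a histogram or summary, and saturation a gauge.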
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
--- | ||
title: "5.1 Instrumenting" | ||
weight: 2 | ||
sectionnumber: 1 | ||
--- | ||
|
||
### Task 1 | ||
|
||
Study the following metrics and decide whether each metric name is OK. | ||
|
||
``` | ||
http_requests{handler="/", status="200"} | ||
http_request_200_count{handler="/"} | ||
go_memstats_heap_inuse_megabytes{instance="localhost:9090",job="prometheus"} | ||
prometheus_build_info{branch="HEAD",goversion="go1.15.5",instance="localhost:9090",job="prometheus",revision="de1c1243f4dd66fbac3e8213e9a7bd8dbc9f38b2",version="2.22.2"} | ||
prometheus_config_last_reload_success_timestamp{instance="localhost:9090",job="prometheus"} | ||
prometheus_tsdb_lowest_timestamp_minutes{instance="localhost:9090",job="prometheus"} | ||
``` | ||
|
||
### Task 2 | ||
|
||
What risk do you see in a metric like this? | ||
|
||
``` | ||
http_requests_total{path="/etc/passwd", status="404"} 1 | ||
``` | ||
|
||
|
||
## Solutions | ||
|
||
{{% details title="Task 1" %}} | ||
|
||
* `http_requests` is a counter, so the `_total` suffix should be appended: `http_requests_total{handler="/", status="200"}` is better. | ||
|
||
* There are two issues with `http_request_200_count{handler="/"}`. First, the `_count` suffix is reserved for histograms and summaries; counters should be suffixed with `_total`. Second, status information should not be part of the metric name; a label such as `{status="200"}` is the better option. | ||
|
||
* The base unit is `bytes` not `megabytes`, so `go_memstats_heap_inuse_bytes` is correct. | ||
|
||
* Everything is OK with `prometheus_build_info` and its labels. It is good practice to export such basic information with a gauge. | ||
|
||
* In `prometheus_config_last_reload_success_timestamp`, the base unit is missing; the correct name is `prometheus_config_last_reload_success_timestamp_seconds`. | ||
|
||
* The base unit is `seconds` for timestamps, so `prometheus_tsdb_lowest_timestamp_seconds` is correct. | ||
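The checks from this solution can be sketched as a small lint function. This is a simplified illustration, not a replacement for `promtool check metrics`. The function name `lint_metric_name`, the `NON_BASE_UNITS` table and the message texts are hypothetical, and the metric type has to be passed in because it cannot be derived from the name alone.

```python
# Hypothetical table mapping non-base units to the recommended base unit.
NON_BASE_UNITS = {
    "milliseconds": "seconds",
    "minutes": "seconds",
    "hours": "seconds",
    "kilobytes": "bytes",
    "megabytes": "bytes",
    "gigabytes": "bytes",
}


def lint_metric_name(name, metric_type):
    """Return a list of naming problems found in a metric name."""
    problems = []
    # Flag every name component that is a known non-base unit.
    for part in name.split("_"):
        if part in NON_BASE_UNITS:
            problems.append(
                f"use base unit {NON_BASE_UNITS[part]} instead of {part}"
            )
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counter should end in _total")
    if name.endswith("_timestamp"):
        problems.append("timestamp is missing its unit, expected _timestamp_seconds")
    return problems


print(lint_metric_name("go_memstats_heap_inuse_megabytes", "gauge"))
```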
|
||
{{% /details %}} | ||
|
||
{{% details title="Task 2" %}} | ||
|
||
No, it's not the possible security vulnerability (which, by the way, seems to be handled appropriately in this case). | ||
|
||
From a Prometheus point of view, the risk is a denial-of-service (DoS) attack: an attacker could easily send requests to paths that don't exist. Because every path is recorded as a label value, many new time series are created, which could lead to a [cardinality explosion](https://www.robustperception.io/cardinality-is-key) and ultimately to out-of-memory errors. | ||
|
||
It's hard to recover from that! | ||
|
||
In this case, it is better to count only the 404 requests and to look up the paths in the log files. | ||
|
||
``` | ||
http_requests_total{status="404"} 15 | ||
``` | ||
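If you still want a path-like label for the routes your application actually serves, one variant (an assumption that goes beyond the solution above) is to restrict the label to a fixed allow-list and collapse everything else into a single value. The route set `KNOWN_PATHS`, the placeholder value `other` and the function `record_request` are hypothetical.

```python
from collections import Counter

# Hypothetical set of routes the application really serves.
KNOWN_PATHS = {"/", "/login", "/metrics"}

# In-memory stand-in for a labelled counter: (handler, status) -> count.
requests_total = Counter()


def record_request(path, status):
    """Count a request without letting arbitrary paths create new series."""
    handler = path if path in KNOWN_PATHS else "other"
    requests_total[(handler, str(status))] += 1


# An attacker probing many random paths only ever touches one series.
for i in range(1000):
    record_request(f"/does-not-exist-{i}", 404)
record_request("/", 200)
print(len(requests_total))
```

The number of time series is then bounded by the number of known routes times the number of status codes, regardless of what clients request.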
|
||
{{% /details %}} |