This repository has been archived by the owner on Mar 20, 2023. It is now read-only.
Add instrumenting lab #42
Merged
6 commits
* 5657fe0 Add initial ideas about instrumenting lab (rotscher)
* f6096e9 Add introduction and overview of instrumentation lab (rotscher)
* aff7d8f Add tasks for instrumenting lab (rotscher)
* 293bc14 Fix linter issues (rotscher)
* 9ab4228 Fix one more linter issue (rotscher)
* 5872005 Change label name (rotscher)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
--- | ||
title: "5. Instrumenting with client libraries" | ||
weight: 1 | ||
sectionnumber: 1 | ||
--- | ||
|
||
An exporter is an adapter that translates service-specific values into metrics in the Prometheus format. Alternatively, you can expose metric data programmatically from your own application code. | ||
|
||
## Client libraries | ||
|
||
The Prometheus project lists [client libraries](https://prometheus.io/docs/instrumenting/clientlibs/), which are either official or maintained by third parties. There are libraries for the major languages such as Java, Go, Python, PHP and even .NET/C#. | ||
|
||
Even if you don't plan to provide your own metrics, those libraries already export some basic metrics specific to the language runtime. For [Java](https://github.com/prometheus/client_java#included-collectors), default metrics about memory management (heap, garbage collection) and thread pools can be collected. The same applies to [Go](https://prometheus.io/docs/guides/go-application/). | ||
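
To make "exposing metrics programmatically" more concrete, here is a minimal sketch of what a client library produces on the `/metrics` endpoint. It renders a counter in the Prometheus text exposition format by hand, using only the Python standard library. The function names `escape_label_value` and `render_counter` and the sample values are hypothetical and exist only for this illustration; in a real application you would use one of the client libraries instead.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: dict) -> str:
    """Render one counter with its samples in the text exposition format.

    `samples` maps a tuple of (label name, label value) pairs to a number.
    Escaping of the help text is omitted to keep the sketch short.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical sample data for illustration only.
requests = {
    (("handler", "/"), ("status", "200")): 3,
    (("handler", "/"), ("status", "404")): 1,
}
print(render_counter("http_requests_total", "Total number of HTTP requests.", requests), end="")
```

A client library additionally takes care of thread safety, metric registration and serving the endpoint, which is why hand-rolling this is rarely a good idea outside of a lab.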
|
||
{{% alert title="Note" color="primary" %}} | ||
|
||
A short note on the Spring Framework, as it is very popular in application development: the framework also supports [exporting metrics](https://spring.io/blog/2018/03/16/micrometer-spring-boot-2-s-new-application-metrics-collector) in the Prometheus data format. | ||
{{% /alert %}} | ||
|
||
## Specifications and conventions | ||
|
||
There are guidelines and best practices on how to name your own metrics. The [specification of the data model](https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels) must be followed, and applying the [best practices about naming](https://prometheus.io/docs/practices/naming/) is strongly recommended. These guidelines and best practices are now also officially specified by [openmetrics.io](https://openmetrics.io). | ||
|
||
Following these principles is not (yet) mandatory, but it makes your metrics easier to understand and interpret. | ||
|
||
You can check your metrics by using the following `promtool` command: `curl -s http://localhost:8080/metrics | promtool check metrics` | ||
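
As a complement to `promtool`, the character rules from the data model can be sketched in a few lines of Python. This is only an illustration, assuming the classic character set described in the data model documentation (newer Prometheus versions may accept more); the helper names `is_valid_metric_name` and `is_valid_label_name` are hypothetical.

```python
import re

# Classic character sets from the Prometheus data model documentation.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_metric_name(name: str) -> bool:
    """Return True if the whole name matches the metric name pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def is_valid_label_name(name: str) -> bool:
    """Return True for a valid label name; names starting with `__` are reserved."""
    return LABEL_NAME_RE.fullmatch(name) is not None and not name.startswith("__")


for candidate in ["http_requests_total", "2xx_requests", "http-requests"]:
    print(candidate, is_valid_metric_name(candidate))
```

Note that this only checks syntax. Whether a name follows the naming conventions (suffixes, base units) is a separate question, which `promtool check metrics` covers in part.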
|
||
## Best practices | ||
|
||
Though implementing a metric is an easy task from a technical point of view, it is not so easy to define what and how to measure. If you follow your existing [log statements](https://prometheus.io/docs/practices/instrumentation/#logging) and if you define an error counter to count all [errors and exceptions](https://prometheus.io/docs/practices/instrumentation/#failures), then you already have a good base to see the internal state of your application. | ||
|
||
### The Four Golden Signals | ||
|
||
Another approach to define metrics is based on [The Four Golden Signals](https://sre.google/sre-book/monitoring-distributed-systems/): | ||
|
||
* Latency | ||
* Traffic | ||
* Errors | ||
* Saturation | ||
|
||
There are other methods like [RED](https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/) or [USE](http://www.brendangregg.com/usemethod.html) that go in the same direction. |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,66 @@ | ||||||
--- | ||||||
title: "5.1 Instrumenting" | ||||||
weight: 2 | ||||||
sectionnumber: 1 | ||||||
--- | ||||||
|
||||||
### Task 1 | ||||||
|
||||||
Study the following metrics and decide whether each metric name is OK. | ||||||
|
||||||
``` | ||||||
http_requests{handler="/", status="200"} | ||||||
|
||||||
http_request_200_count{handler="/"} | ||||||
|
||||||
go_memstats_heap_inuse_megabytes{instance="localhost:9090",job="prometheus"} | ||||||
|
||||||
prometheus_build_info{branch="HEAD",goversion="go1.15.5",instance="localhost:9090",job="prometheus",revision="de1c1243f4dd66fbac3e8213e9a7bd8dbc9f38b2",version="2.22.2"} | ||||||
|
||||||
prometheus_config_last_reload_success_timestamp{instance="localhost:9090",job="prometheus"} | ||||||
|
||||||
prometheus_tsdb_lowest_timestamp_minutes{instance="localhost:9090",job="prometheus"} | ||||||
``` | ||||||
|
||||||
### Task 2 | ||||||
|
||||||
What kind of risk do you face when you see a metric like this? | ||||||
|
||||||
``` | ||||||
http_requests_total{path="/etc/passwd", status="404"} 1 | ||||||
``` | ||||||
|
||||||
|
||||||
## Solutions | ||||||
|
||||||
{{% details title="Task 1" %}} | ||||||
|
||||||
* The `_total` suffix should be appended, so `http_requests_total{handler="/", status="200"}` is better. | ||||||
|
||||||
* There are two issues in `http_request_200_count{handler="/"}`: First, the `_count` suffix is reserved for histograms and summaries; counters should be suffixed with `_total`. Second, status information should not be part of the metric name; a label `{status="200"}` is the better option. | ||||||
|
||||||
* The base unit is `bytes` not `megabytes`, so `go_memstats_heap_inuse_bytes` is correct. | ||||||
|
||||||
* Everything is OK with `prometheus_build_info` and its labels. It's a good practice to export such base information with a gauge. | ||||||
|
||||||
* In `prometheus_config_last_reload_success_timestamp` the base unit is missing; the correct name is `prometheus_config_last_reload_success_timestamp_seconds`. | ||||||
|
||||||
* The base unit is `seconds` for timestamps, so `prometheus_tsdb_lowest_timestamp_seconds` is correct. | ||||||
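
The checks applied above can be sketched as a small heuristic linter. This is an illustration only, not an official tool: the function `lint_metric_name`, the `NON_BASE_UNITS` table and the rule wording are hypothetical, and the metric type has to be passed in because it cannot be derived from the name alone.

```python
# Hypothetical mapping of some non-base units to their base unit.
NON_BASE_UNITS = {
    "kilobytes": "bytes",
    "megabytes": "bytes",
    "milliseconds": "seconds",
    "minutes": "seconds",
    "hours": "seconds",
}


def lint_metric_name(name: str, metric_type: str) -> list:
    """Return a list of naming problems for the given metric name and type."""
    problems = []
    parts = name.split("_")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counter should end with _total")
    if metric_type not in ("histogram", "summary") and name.endswith("_count"):
        problems.append("_count is reserved for histograms and summaries")
    for unit, base in NON_BASE_UNITS.items():
        if unit in parts:
            problems.append(f"use base unit {base} instead of {unit}")
    if "timestamp" in parts and not name.endswith("_seconds"):
        problems.append("timestamp should end with _seconds")
    return problems


# The metrics from Task 1 with their assumed types.
for metric, kind in [
    ("http_requests", "counter"),
    ("http_request_200_count", "counter"),
    ("go_memstats_heap_inuse_megabytes", "gauge"),
    ("prometheus_build_info", "gauge"),
    ("prometheus_config_last_reload_success_timestamp", "gauge"),
    ("prometheus_tsdb_lowest_timestamp_minutes", "gauge"),
]:
    print(metric, lint_metric_name(metric, kind))
```

The sketch cannot detect that `200` in `http_request_200_count` is status information that belongs in a label; that judgment still needs a human reviewer.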
|
||||||
{{% /details %}} | ||||||
|
||||||
{{% details title="Task 2" %}} | ||||||
|
||||||
No, the risk is not the apparent security vulnerability (which, by the way, seems to be handled appropriately in this case). | ||||||
|
||||||
|
||||||
From a Prometheus point of view, there is the risk of a denial-of-service (DoS) attack: an attacker could easily send requests to arbitrary paths that do not exist. Because every path is recorded as a label value, each new path creates a new time series, which could lead to a [cardinality explosion](https://www.robustperception.io/cardinality-is-key) and eventually to out-of-memory errors. | ||||||
|
||||||
It's hard to recover from that! | ||||||
|
||||||
In this case, it's better just to count the 404 requests and to look up the paths in the log files. | ||||||
|
||||||
``` | ||||||
http_requests_total{status="404"} 15 | ||||||
``` | ||||||
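
The effect on the number of time series can be simulated with a short sketch. It models a counter as a mapping from label sets to values, which is a simplification of what a client library keeps in memory; the helper `record` and the generated paths are hypothetical.

```python
from collections import Counter


def record(series: Counter, **labels: str) -> None:
    """Increment the series identified by its (sorted) label set."""
    series[tuple(sorted(labels.items()))] += 1


with_path = Counter()    # counter labelled with path and status
status_only = Counter()  # counter labelled with status only

# Simulate an attacker requesting 1000 different non-existing paths.
for i in range(1000):
    path = f"/does-not-exist-{i}"
    record(with_path, path=path, status="404")
    record(status_only, status="404")

print(len(with_path))    # number of series with the path label: 1000
print(len(status_only))  # number of series without it: 1
```

Every distinct label set is a separate time series, so the attacker controls the series count as long as the path is used as a label value.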
|
||||||
{{% /details %}} |