Merge pull request #95 from DirectXMan12/docs/common-issues
[docs] Config Walkthroughs and FAQs
DirectXMan12 authored Aug 24, 2018
2 parents 1f6df8e + a1f4aab commit b755cf7
Showing 4 changed files with 567 additions and 226 deletions.
97 changes: 97 additions & 0 deletions README.md
@@ -9,6 +9,13 @@ metrics API
suitable for use with the autoscaling/v2 Horizontal Pod Autoscaler in
Kubernetes 1.6+.

Quick Links
-----------

- [Config walkthrough](docs/config-walkthrough.md) and [config reference](docs/config.md).
- [End-to-end walkthrough](docs/walkthrough.md)
- [Deployment info and files](deploy/README.md)

Configuration
-------------

@@ -76,3 +83,93 @@ attention to:
Operator](https://github.com/luxas/kubeadm-workshop#deploying-the-prometheus-operator-for-monitoring-services-in-the-cluster)
- [Setting up the custom metrics adapter and sample
app](https://github.com/luxas/kubeadm-workshop#deploying-a-custom-metrics-api-server-and-a-sample-app)

FAQs
----

### Why do my metrics keep jumping between a normal value and a very large number?

You're probably switching between whole numbers (e.g. `10`) and milli-quantities (e.g. `10500m`).
Worry not! This is just how Kubernetes represents fractional values. See the
[Quantity Values](/docs/walkthrough.md#quantity-values) section of the walkthrough for a bit more
information.

### Why isn't my metric showing up?

First, check your configuration. Does it select your metric? You can
find the [default configuration](/deploy/custom-metrics-config-map.yaml)
in the deploy directory, and more information about configuring the
adapter in the [docs](/docs/config.md).

Next, check if the discovery information looks right. You should see the
metrics showing up as associated with the resources you expect at
`/apis/custom.metrics.k8s.io/v1beta1/` (you can use `kubectl get --raw
/apis/custom.metrics.k8s.io/v1beta1` to check, and can pipe to `jq` to
pretty-print the results, if you have it installed). If not, make sure
your series are labeled correctly. Consumers of the custom metrics API
(especially the HPA) don't do any special logic to associate a particular
resource to a particular series, so you have to make sure that the adapter
does it instead.

For example, if you want a series `foo` to be associated with deployment
`bar` in namespace `somens`, make sure there's some label that represents
deployment name, and that the adapter is configured to use it. With the
default config, that means you'd need the query
`foo{namespace="somens",deployment="bar"}` to return some results in
Prometheus.
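
For illustration only (this is not the shipped default config), a rule along these lines
would make that association explicit, assuming the series carries the `namespace` and
`deployment` labels described above:

```yaml
# Illustrative rule: expose `foo` on deployments and namespaces,
# assuming the series carries `namespace` and `deployment` labels.
rules:
- seriesQuery: 'foo{namespace!="",deployment!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      deployment: {group: "apps", resource: "deployment"}
  name: {matches: 'foo', as: 'foo'}
  metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```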

Next, try running the adapter with the `--v=6` flag to see the exact queries it
makes. Try URL-decoding a query and pasting it into the Prometheus web console
to see whether the query looks wrong.
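
As a sketch (the container name below is an assumption about your install), the flag
can simply be added to the adapter's container args:

```yaml
# Hypothetical Deployment fragment: bump the adapter's log verbosity so it
# logs the Prometheus queries it issues. Keep your existing args alongside it.
spec:
  template:
    spec:
      containers:
      - name: custom-metrics-apiserver
        args:
        - --v=6
```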

### My query contains multiple metrics, how do I make that work?

It's actually fairly straightforward, if a bit non-obvious. Simply choose one
metric to act as the "discovery" and "naming" metric, and use that to configure
the "discovery" and "naming" parts of the rule. Then, the `metricsQuery` can
reference whichever other metrics you want, as long as they all carry the right
set of labels.

For example, if you have two metrics `foo_total` and `foo_count`, you might write

```yaml
rules:
- seriesQuery: 'foo_total'
  resources: {overrides: {system_name: {resource: "node"}}}
  name:
    matches: 'foo_total'
    as: 'foo'
  metricsQuery: 'sum(foo_total) by (<<.GroupBy>>) / sum(foo_count) by (<<.GroupBy>>)'
```
### I get errors about SubjectAccessReviews/system:anonymous/TLS/Certificates/RequestHeader!

It's important to understand the role of TLS in the Kubernetes cluster. There's a high-level
overview here: https://github.com/kubernetes-incubator/apiserver-builder/blob/master/docs/concepts/auth.md.

All of the above errors generally boil down to misconfigured certificates. Specifically,
you'll need to make sure your cluster's aggregation layer is properly configured, with
the requestheader certificates set up correctly.

Errors about SubjectAccessReviews failing for system:anonymous generally mean
that the requestheader CA configured in your cluster doesn't trust the proxy certificates
presented by the API server aggregator.

On the other hand, if you get an error from the aggregator about invalid certificates,
it's probably because the CA specified in the `caBundle` field of your APIService
object doesn't trust the serving certificates for the adapter.
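
For reference, a minimal sketch of the APIService registration involved (the service name
and namespace are assumptions about where you've deployed the adapter):

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  service:
    name: custom-metrics-apiserver   # assumed adapter Service name
    namespace: custom-metrics        # assumed namespace
  # caBundle must contain a CA that signed the adapter's *serving* certificate
  caBundle: "<base64-encoded CA certificate>"
```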

If you're seeing SubjectAccessReviews failures for non-anonymous users, check your
RBAC rules -- you probably haven't given users permission to operate on resources in
the `custom.metrics.k8s.io` API group.
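
A hedged sketch of such a grant (the binding subject depends on who needs to read the
metrics; for the HPA that's typically the controller's service account):

```yaml
# Illustrative RBAC: allow reading any resource in the custom metrics API group.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-reader
rules:
- apiGroups: ["custom.metrics.k8s.io"]
  resources: ["*"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hpa-custom-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-reader
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler   # assumed HPA controller service account
  namespace: kube-system
```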

### My metrics appear and disappear

You probably have a Prometheus collection interval or computation interval
that's larger than your adapter's discovery interval. If the metrics
appear in discovery but occasionally return not-found, those intervals
are probably larger than one of the rate windows used in one of your
queries. The adapter only considers metrics with datapoints in the window
`[now-discoveryInterval, now]` (in order to only capture metrics that are
still present), so make sure that your discovery interval is at least as
large as your collection interval.
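
If you need to adjust the discovery interval, it's set with an adapter flag; a sketch,
assuming the flag is named `--metrics-relist-interval` in your adapter version:

```yaml
# Hypothetical container args fragment: rediscover metrics every minute,
# which should be at least as large as the Prometheus scrape interval.
args:
- --metrics-relist-interval=1m
```
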
230 changes: 230 additions & 0 deletions docs/config-walkthrough.md
@@ -0,0 +1,230 @@
Configuration Walkthroughs
==========================

*If you're looking for reference documentation on configuration, please
read the [configuration reference](/docs/config.md)*

Per-pod HTTP Requests
---------------------

### Background

*The [full walkthrough](/docs/walkthrough.md) sets up the background for
something like this*

Suppose we have some frontend webserver, and we're trying to write a
configuration for the Prometheus adapter so that we can autoscale it based
on the HTTP requests per second that it receives.

Before starting, we've gone and instrumented our frontend server with
a metric, `http_requests_total`. It is exposed with a single label,
`method`, breaking down the requests by HTTP verb.

We've configured Prometheus to collect the metric, and our Prometheus
configuration adds the `kubernetes_namespace` and `kubernetes_pod_name` labels,
representing namespace and pod, respectively.
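
Labels like these typically come from a relabeling step in the Prometheus scrape
configuration; a rough sketch (your actual scrape config may well differ):

```yaml
scrape_configs:
- job_name: 'frontend'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Copy the Kubernetes service-discovery metadata into plain labels
  # so they end up on every scraped series.
  - source_labels: [__meta_kubernetes_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: kubernetes_pod_name
```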

If we query Prometheus, we see series that look like

```
http_requests_total{method="GET",kubernetes_namespace="production",kubernetes_pod_name="frontend-server-abcd-0123"}
```

### Configuring the adapter

The adapter considers metrics in the following ways:

1. First, it discovers the metrics available (*Discovery*)

2. Then, it figures out which Kubernetes resources each metric is
associated with (*Association*)

3. Then, it figures out how it should expose them to the custom metrics
API (*Naming*)

4. Finally, it figures out how it should query Prometheus to get the
actual numbers (*Querying*)

We need to inform the adapter how it should perform each of these steps
for our metric, `http_requests_total`, so we'll need to add a new
***rule***. Each rule in the adapter encodes these steps. Let's add a new
one to our configuration:

```yaml
rules:
- {}
```

If we wanted to find all `http_requests_total` series ourselves in the
Prometheus dashboard, we'd write
`http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}` to
find all `http_requests_total` series that are associated with
a namespace and pod.

We can add this to our rule in the `seriesQuery` field, to tell the
adapter how to *discover* the right series itself:

```yaml
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
```

Next, we'll need to tell the adapter how to figure out which Kubernetes
resources are associated with the metric. We've already said that
`kubernetes_namespace` represents the namespace name, and
`kubernetes_pod_name` represents the pod name. Since these names don't
quite follow a consistent pattern, we use the `overrides` section of the
`resources` field in our rule:

```yaml
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
```

This says that each label represents its corresponding resource. Since the
resources are in the "core" Kubernetes API, we don't need to specify
a group. The adapter will automatically take care of pluralization, so we
can specify either `pod` or `pods`, just the same way as in `kubectl get`.
The resources can be any resource available in your Kubernetes cluster, as
long as you've got a corresponding label.

If our labels followed a consistent pattern, like `kubernetes_<resource>`,
we could specify `resources: {template: "kubernetes_<<.Resource>>"}`
instead of specifying an override for each resource. If you want to see
all resources currently available in your cluster, you can use the
`kubectl api-resources` command (but the list of available resources can
change as you add or remove CRDs or aggregated API servers). For more
information on resources, see [Kinds, Resources, and
Scopes](https://github.com/kubernetes-incubator/custom-metrics-apiserver/blob/master/docs/getting-started.md#kinds-resources-and-scopes)
in the custom-metrics-apiserver boilerplate guide.
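
For example, if the pattern held, the rule so far could be written with a template
instead of per-label overrides. This is hypothetical for our case, since our pod label is
`kubernetes_pod_name` rather than `kubernetes_pod`:

```yaml
# Hypothetical: only valid if the labels were kubernetes_namespace and kubernetes_pod
# (ours is kubernetes_pod_name, so we stick with the overrides above).
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod!=""}'
  resources:
    template: "kubernetes_<<.Resource>>"
```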

Now, cumulative metrics (like those that end in `_total`) aren't
particularly useful for autoscaling, so we want to convert them to rate
metrics in the API. We'll call the rate version of our metric
`http_requests_per_second`. We can use the `name` field to tell the
adapter about that:

```yaml
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
```

Here, we've said that we should take any name matching
`<something>_total` and turn it into `<something>_per_second`.

Finally, we need to tell the adapter how to actually query Prometheus to
get some numbers. Since we want a rate, we might write
`sum(rate(http_requests_total{kubernetes_namespace="production",kubernetes_pod_name=~"frontend-server-abcd-0123|frontend-server-abcd-4567"}[2m])) by (kubernetes_pod_name)`,
which would get us the total requests per second for each pod, summed across verbs.

We can write something similar in the adapter, using the `metricsQuery`
field:

```yaml
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

The adapter will automatically fill in the right series name, label
matchers, and group-by clause, depending on what we put into the API.
Since we're only working with a single metric anyway, we could replace
`<<.Series>>` with `http_requests_total`.

Now, if we run an instance of the Prometheus adapter with this
configuration, we should see discovery information at
`$KUBERNETES/apis/custom.metrics.k8s.io/v1beta1/` of

```json
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/http_requests_per_second",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": ["get"]
    },
    {
      "name": "namespaces/http_requests_per_second",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": ["get"]
    }
  ]
}
```

Notice that we get an entry for both "pods" and "namespaces" -- the
adapter exposes the metric on each resource that we've associated the
metric with (and all namespaced resources must be associated with
a namespace), and will fill in the `<<.GroupBy>>` section with the
appropriate label depending on which we ask for.

We can now connect to
`$KUBERNETES/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second`,
and we should see

```json
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "name": "frontend-server-abcd-0123",
        "apiVersion": "/__internal"
      },
      "metricName": "http_requests_per_second",
      "timestamp": "2018-08-07T17:45:22Z",
      "value": "16m"
    },
    {
      "describedObject": {
        "kind": "Pod",
        "name": "frontend-server-abcd-4567",
        "apiVersion": "/__internal"
      },
      "metricName": "http_requests_per_second",
      "timestamp": "2018-08-07T17:45:22Z",
      "value": "22m"
    }
  ]
}
```

This says that our server pods are receiving 16 and 22 milli-requests per
second (depending on the pod), which is 0.016 and 0.022 requests per
second, written out as a decimal. That's about what we'd expect with
little-to-no traffic except for the Prometheus scrape.

If we added some traffic to our pods, we might see `1` or `20` instead of
`16m`, which would be `1` or `20` requests per second. We might also see
`20500m`, which would mean 20500 milli-requests per second, or 20.5
requests per second in decimal form.
11 changes: 9 additions & 2 deletions docs/config.md
@@ -1,6 +1,10 @@
Metrics Discovery and Presentation Configuration
================================================

*If you want a full walkthrough of configuring the adapter for a sample
metric, please read the [configuration
walkthrough](/docs/config-walkthrough.md)*

The adapter determines which metrics to expose, and how to expose them,
through a set of "discovery" rules. Each rule is executed independently
(so make sure that your rules are mutually exclusive), and specifies each
@@ -123,6 +127,9 @@ resource:
These two can be combined, so you can specify both a template and some
individual overrides.

The resources mentioned can be any resource available in your Kubernetes
cluster, as long as you've got a corresponding label.
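
A sketch of the combined form mentioned above, with hypothetical label names: the
template handles the labels that follow the pattern, and the override catches the one
that doesn't:

```yaml
resources:
  template: "kube_<<.Resource>>"
  overrides:
    exported_namespace: {resource: "namespace"}
```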

Naming
------

@@ -150,7 +157,7 @@ For example:
# e.g. http_requests_total becomes http_requests_per_second
name:
matches: "^(.*)_total$"
as: "<<1}_per_second"
as: "${1}_per_second"
```

Querying
@@ -181,7 +188,7 @@ Kubernetes resources. Then, if someone requested the metric
`pods/http_request_per_second` for the pods `pod1` and `pod2` in the
`somens` namespace, we'd have:

- `Series: "http_requests_total"
- `Series: "http_requests_total"`
- `LabelMatchers: "pod=~\"pod1|pod2\",namespace=\"somens\""`
- `GroupBy`: `pod`
