Merge pull request #95 from DirectXMan12/docs/common-issues
[docs] Config Walkthroughs and FAQs
DirectXMan12 authored Aug 24, 2018
2 parents 1f6df8e + a1f4aab commit b755cf7
Showing 4 changed files with 567 additions and 226 deletions.
97 changes: 97 additions & 0 deletions README.md
@@ -9,6 +9,13 @@ metrics API
suitable for use with the autoscaling/v2 Horizontal Pod Autoscaler in
Kubernetes 1.6+.

Quick Links
-----------

- [Config walkthrough](docs/config-walkthrough.md) and [config reference](docs/config.md).
- [End-to-end walkthrough](docs/walkthrough.md)
- [Deployment info and files](deploy/README.md)

Configuration
-------------

@@ -76,3 +83,93 @@ attention to:
Operator](https://github.com/luxas/kubeadm-workshop#deploying-the-prometheus-operator-for-monitoring-services-in-the-cluster)
- [Setting up the custom metrics adapter and sample
app](https://github.com/luxas/kubeadm-workshop#deploying-a-custom-metrics-api-server-and-a-sample-app)

FAQs
----

### Why do my metrics keep jumping between a normal value and a very large number?

You're probably switching between whole numbers (e.g. `10`) and milli-quantities (e.g. `10500m`).
Worry not! This is just how Kubernetes represents fractional values. See the
[Quantity Values](/docs/walkthrough.md#quantity-values) section of the walkthrough for a bit more
information.

### Why isn't my metric showing up?

First, check your configuration. Does it select your metric? You can
find the [default configuration](/deploy/custom-metrics-config-map.yaml)
in the deploy directory, and more information about configuring the
adapter in the [docs](/docs/config.md).

Next, check if the discovery information looks right. You should see the
metrics showing up as associated with the resources you expect at
`/apis/custom.metrics.k8s.io/v1beta1/` (you can use `kubectl get --raw
/apis/custom.metrics.k8s.io/v1beta1` to check, and can pipe to `jq` to
pretty-print the results, if you have it installed). If not, make sure
your series are labeled correctly. Consumers of the custom metrics API
(especially the HPA) don't do any special logic to associate a particular
resource to a particular series, so you have to make sure that the adapter
does it instead.

For example, if you want a series `foo` to be associated with deployment
`bar` in namespace `somens`, make sure there's some label that represents
deployment name, and that the adapter is configured to use it. With the
default config, that means you'd need the query
`foo{namespace="somens",deployment="bar"}` to return some results in
Prometheus.
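
For illustration only (this is not the shipped default config), a rule along these lines
would make that association explicit, assuming the series carries the `namespace` and
`deployment` labels described above:

```yaml
# Illustrative rule: expose `foo` on deployments and namespaces,
# assuming the series carries `namespace` and `deployment` labels.
rules:
- seriesQuery: 'foo{namespace!="",deployment!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      deployment: {group: "apps", resource: "deployment"}
  name: {matches: 'foo', as: 'foo'}
  metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```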

Next, try running the adapter with the `--v=6` flag to see the exact queries it
makes. Try URL-decoding a query and pasting it into the Prometheus web console
to see whether the query looks wrong.
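
As a sketch (the container name below is an assumption about your install), the flag
can simply be added to the adapter's container args:

```yaml
# Hypothetical Deployment fragment: bump the adapter's log verbosity so it
# logs the Prometheus queries it issues. Keep your existing args alongside it.
spec:
  template:
    spec:
      containers:
      - name: custom-metrics-apiserver
        args:
        - --v=6
```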

### My query contains multiple metrics, how do I make that work?

It's actually fairly straightforward, if a bit non-obvious. Simply choose one
metric to act as the "discovery" and "naming" metric, and use that to configure
the "discovery" and "naming" parts of the rule. Then, the `metricsQuery` can
reference whichever other metrics you want, as long as they all carry the right
set of labels.

For example, if you have two metrics `foo_total` and `foo_count`, you might write

```yaml
rules:
- seriesQuery: 'foo_total'
  resources: {overrides: {system_name: {resource: "node"}}}
  name:
    matches: 'foo_total'
    as: 'foo'
  metricsQuery: 'sum(foo_total) by (<<.GroupBy>>) / sum(foo_count) by (<<.GroupBy>>)'
```
### I get errors about SubjectAccessReviews/system:anonymous/TLS/Certificates/RequestHeader!

It's important to understand the role of TLS in the Kubernetes cluster. There's a high-level
overview here: https://github.com/kubernetes-incubator/apiserver-builder/blob/master/docs/concepts/auth.md.

All of the above errors generally boil down to misconfigured certificates. Specifically,
you'll need to make sure your cluster's aggregation layer is properly configured, with
the requestheader certificates set up correctly.

Errors about SubjectAccessReviews failing for system:anonymous generally mean
that the requestheader CA configured in your cluster doesn't trust the proxy certificates
presented by the API server aggregator.

On the other hand, if you get an error from the aggregator about invalid certificates,
it's probably because the CA specified in the `caBundle` field of your APIService
object doesn't trust the serving certificates for the adapter.
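
For reference, a minimal sketch of the APIService registration involved (the service name
and namespace are assumptions about where you've deployed the adapter):

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  service:
    name: custom-metrics-apiserver   # assumed adapter Service name
    namespace: custom-metrics        # assumed namespace
  # caBundle must contain a CA that signed the adapter's *serving* certificate
  caBundle: "<base64-encoded CA certificate>"
```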

If you're seeing SubjectAccessReviews failures for non-anonymous users, check your
RBAC rules -- you probably haven't given users permission to operate on resources in
the `custom.metrics.k8s.io` API group.
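
A hedged sketch of such a grant (the binding subject depends on who needs to read the
metrics; for the HPA that's typically the controller's service account):

```yaml
# Illustrative RBAC: allow reading any resource in the custom metrics API group.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-reader
rules:
- apiGroups: ["custom.metrics.k8s.io"]
  resources: ["*"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hpa-custom-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-reader
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler   # assumed HPA controller service account
  namespace: kube-system
```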

### My metrics appear and disappear

You probably have a Prometheus collection interval or computation interval
that's larger than your adapter's discovery interval. If the metrics
appear in discovery but occasionally return not-found, those intervals
are probably larger than one of the rate windows used in one of your
queries. The adapter only considers metrics with datapoints in the window
`[now-discoveryInterval, now]` (in order to only capture metrics that are
still present), so make sure that your discovery interval is at least as
large as your collection interval.
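
If you need to adjust the discovery interval, it's set with an adapter flag; a sketch,
assuming the flag is named `--metrics-relist-interval` in your adapter version:

```yaml
# Hypothetical container args fragment: rediscover metrics every minute,
# which should be at least as large as the Prometheus scrape interval.
args:
- --metrics-relist-interval=1m
```
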
230 changes: 230 additions & 0 deletions docs/config-walkthrough.md
@@ -0,0 +1,230 @@
Configuration Walkthroughs
==========================

*If you're looking for reference documentation on configuration, please
read the [configuration reference](/docs/config.md)*

Per-pod HTTP Requests
---------------------

### Background

*The [full walkthrough](/docs/walkthrough.md) sets up the background for
something like this*

Suppose we have some frontend webserver, and we're trying to write a
configuration for the Prometheus adapter so that we can autoscale it based
on the HTTP requests per second that it receives.

Before starting, we've gone and instrumented our frontend server with
a metric, `http_requests_total`. It is exposed with a single label,
`method`, breaking down the requests by HTTP verb.

We've configured Prometheus to collect the metric, and our Prometheus
configuration adds the `kubernetes_namespace` and `kubernetes_pod_name` labels,
representing namespace and pod, respectively.
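
Labels like these typically come from a relabeling step in the Prometheus scrape
configuration; a rough sketch (your actual scrape config may well differ):

```yaml
scrape_configs:
- job_name: 'frontend'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Copy the Kubernetes service-discovery metadata into plain labels
  # so they end up on every scraped series.
  - source_labels: [__meta_kubernetes_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: kubernetes_pod_name
```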

If we query Prometheus, we see series that look like

```
http_requests_total{method="GET",kubernetes_namespace="production",kubernetes_pod_name="frontend-server-abcd-0123"}
```

### Configuring the adapter

The adapter considers metrics in the following ways:

1. First, it discovers the metrics available (*Discovery*)

2. Then, it figures out which Kubernetes resources each metric is
associated with (*Association*)

3. Then, it figures out how it should expose them to the custom metrics
API (*Naming*)

4. Finally, it figures out how it should query Prometheus to get the
actual numbers (*Querying*)

We need to inform the adapter how it should perform each of these steps
for our metric, `http_requests_total`, so we'll need to add a new
***rule***. Each rule in the adapter encodes these steps. Let's add a new
one to our configuration:

```yaml
rules:
- {}
```

If we wanted to find all `http_requests_total` series ourselves in the
Prometheus dashboard, we'd write
`http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}` to
find all `http_requests_total` series that are associated with
a namespace and pod.

We can add this to our rule in the `seriesQuery` field, to tell the
adapter how to *discover* the right series itself:

```yaml
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
```

Next, we'll need to tell the adapter how to figure out which Kubernetes
resources are associated with the metric. We've already said that
`kubernetes_namespace` represents the namespace name, and
`kubernetes_pod_name` represents the pod name. Since these names don't
quite follow a consistent pattern, we use the `overrides` section of the
`resources` field in our rule:

```yaml
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
```

This says that each label represents its corresponding resource. Since the
resources are in the "core" Kubernetes API, we don't need to specify
a group. The adapter will automatically take care of pluralization, so we
can specify either `pod` or `pods`, just the same way as in `kubectl get`.
The resources can be any resource available in your Kubernetes cluster, as
long as you've got a corresponding label.

If our labels followed a consistent pattern, like `kubernetes_<resource>`,
we could specify `resources: {template: "kubernetes_<<.Resource>>"}`
instead of specifying an override for each resource. If you want to see
all resources currently available in your cluster, you can use the
`kubectl api-resources` command (but the list of available resources can
change as you add or remove CRDs or aggregated API servers). For more
information on resources, see [Kinds, Resources, and
Scopes](https://github.com/kubernetes-incubator/custom-metrics-apiserver/blob/master/docs/getting-started.md#kinds-resources-and-scopes)
in the custom-metrics-apiserver boilerplate guide.
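
For example, if the pattern held, the rule so far could be written with a template
instead of per-label overrides. This is hypothetical for our case, since our pod label is
`kubernetes_pod_name` rather than `kubernetes_pod`:

```yaml
# Hypothetical: only valid if the labels were kubernetes_namespace and kubernetes_pod
# (ours is kubernetes_pod_name, so we stick with the overrides above).
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod!=""}'
  resources:
    template: "kubernetes_<<.Resource>>"
```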

Now, cumulative metrics (like those that end in `_total`) aren't
particularly useful for autoscaling, so we want to convert them to rate
metrics in the API. We'll call the rate version of our metric
`http_requests_per_second`. We can use the `name` field to tell the
adapter about that:

```yaml
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
```

Here, we've said that we should take any name matching
`<something>_total` and turn it into `<something>_per_second`.

Finally, we need to tell the adapter how to actually query Prometheus to
get some numbers. Since we want a rate, we might write
`sum(rate(http_requests_total{kubernetes_namespace="production",kubernetes_pod_name=~"frontend-server-abcd-0123|frontend-server-abcd-4567"}[2m])) by (kubernetes_pod_name)`,
which would get us the total requests per second for each pod, summed across verbs.

We can write something similar in the adapter, using the `metricsQuery`
field:

```yaml
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

The adapter will automatically fill in the right series name, label
matchers, and group-by clause, depending on what we put into the API.
Since we're only working with a single metric anyway, we could replace
`<<.Series>>` with `http_requests_total`.

Now, if we run an instance of the Prometheus adapter with this
configuration, we should see discovery information at
`$KUBERNETES/apis/custom.metrics.k8s.io/v1beta1/` of

```json
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/http_requests_per_second",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": ["get"]
    },
    {
      "name": "namespaces/http_requests_per_second",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": ["get"]
    }
  ]
}
```

Notice that we get an entry for both "pods" and "namespaces" -- the
adapter exposes the metric on each resource that we've associated the
metric with (and all namespaced resources must be associated with
a namespace), and will fill in the `<<.GroupBy>>` section with the
appropriate label depending on which we ask for.

We can now connect to
`$KUBERNETES/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second`,
and we should see

```json
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "name": "frontend-server-abcd-0123",
        "apiVersion": "/__internal"
      },
      "metricName": "http_requests_per_second",
      "timestamp": "2018-08-07T17:45:22Z",
      "value": "16m"
    },
    {
      "describedObject": {
        "kind": "Pod",
        "name": "frontend-server-abcd-4567",
        "apiVersion": "/__internal"
      },
      "metricName": "http_requests_per_second",
      "timestamp": "2018-08-07T17:45:22Z",
      "value": "22m"
    }
  ]
}
```

This says that our server pods are receiving 16 and 22 milli-requests per
second (depending on the pod), which is 0.016 and 0.022 requests per
second, written out as a decimal. That's about what we'd expect with
little-to-no traffic except for the Prometheus scrape.

If we added some traffic to our pods, we might see `1` or `20` instead of
`16m`, which would be `1` or `20` requests per second. We might also see
`20500m`, which would mean 20500 milli-requests per second, or 20.5
requests per second in decimal form.
11 changes: 9 additions & 2 deletions docs/config.md
@@ -1,6 +1,10 @@
Metrics Discovery and Presentation Configuration
================================================

*If you want a full walkthrough of configuring the adapter for a sample
metric, please read the [configuration
walkthrough](/docs/config-walkthrough.md)*

The adapter determines which metrics to expose, and how to expose them,
through a set of "discovery" rules. Each rule is executed independently
(so make sure that your rules are mutually exclusive), and specifies each
@@ -123,6 +127,9 @@ resource:
These two can be combined, so you can specify both a template and some
individual overrides.

The resources mentioned can be any resource available in your Kubernetes
cluster, as long as you've got a corresponding label.
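
A sketch of the combined form mentioned above, with hypothetical label names: the
template handles the labels that follow the pattern, and the override catches the one
that doesn't:

```yaml
resources:
  template: "kube_<<.Resource>>"
  overrides:
    exported_namespace: {resource: "namespace"}
```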

Naming
------

@@ -150,7 +157,7 @@ For example:
# e.g. http_requests_total becomes http_requests_per_second
name:
matches: "^(.*)_total$"
as: "<<1}_per_second"
as: "${1}_per_second"
```

Querying
@@ -181,7 +188,7 @@ Kubernetes resources. Then, if someone requested the metric
`pods/http_request_per_second` for the pods `pod1` and `pod2` in the
`somens` namespace, we'd have:

- `Series: "http_requests_total"
- `Series: "http_requests_total"`
- `LabelMatchers: "pod=~\"pod1|pod2\",namespace=\"somens\""`
- `GroupBy`: `pod`
