Add Kafka Prometheus metrics export #128

Merged 14 commits on Feb 3, 2018

@solsson (Contributor) commented Jan 19, 2018

... only to brokers, and using kubectl patch to keep this optional. The JMX containers consume significant memory (100-200 MB per pod) so you'll only want them if you have Prometheus. For other uses of JMX, like the interactive use cases that the spec seems designed for, see #96.

This PR replaces #49 (which is still a valid addon for tags <3.1) and #93. Zookeeper metrics are excluded from this PR, though, because I hope #125 includes what we need from there.

Both the memory limits and the liveness probe are experimental right now. We're running them in QA to see how low we can go. See #49 (comment) for history.

We'll keep an eye on the jmx_scrape_duration_seconds metric now, and OOMKilled events. Once again see #49 for history.

@yacut Do you still have a better config than the one I copied to this PR?

@solsson mentioned this pull request Jan 19, 2018
and rely on metric staleness alerts instead for exporter liveness.

This reverts commit 74a5177.
@solsson added this to the v3.1 milestone Jan 19, 2018

@solsson (Contributor Author) commented Jan 19, 2018

Gotcha: kubectl patch can't be used to remove, for example, a liveness probe (0d78e08). I had to run a replace using the manifest from ./kafka, then re-apply the patch. That'll be riskier if we enable RollingUpdate for the statefulset.

and we might not need liveness if we have alerts for stale metrics.

This reverts commit f1e6e96.
may be impacting the producer clients, losing messages or causing back-pressure in the application.
This is most often a “site down” type of problem and will need to be addressed immediately.”

Excerpt from: Neha Narkhede, Gwen Shapira, and Todd Palino. “Kafka: The Definitive Guide”.

We now export kafka_controller_kafkacontroller_value{name="OfflinePartitionsCount",} and friends.
See #140 for why.
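
For anyone who wants to check this series by hand, here is a minimal Python sketch that pulls the OfflinePartitionsCount sample out of a scraped /metrics body. The sample text, its values and the function names are made up for illustration; only the metric name and label come from this PR. It tolerates the trailing comma inside the label braces shown above.

```python
import re
from typing import Dict, Optional, Tuple

# One sample line: name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair; a trailing comma after the last pair is simply not matched.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str) -> Optional[Tuple[str, Dict[str, str], float]]:
    """Parse one exposition-format sample line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def offline_partitions(metrics_text: str) -> Optional[float]:
    """Return the OfflinePartitionsCount value, or None if it is not exported."""
    for line in metrics_text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        parsed = parse_sample(line)
        if parsed is None:
            continue
        name, labels, value = parsed
        if (name == "kafka_controller_kafkacontroller_value"
                and labels.get("name") == "OfflinePartitionsCount"):
            return value
    return None


# Hypothetical scrape output; the values are invented for this example.
SAMPLE = """\
# TYPE kafka_controller_kafkacontroller_value gauge
kafka_controller_kafkacontroller_value{name="OfflinePartitionsCount",} 0.0
kafka_controller_kafkacontroller_value{name="ActiveControllerCount",} 1.0
"""

if __name__ == "__main__":
    print(offline_partitions(SAMPLE))
```

Anything above zero here is the "site down" case from the excerpt above. In practice this check belongs in a Prometheus alert rule rather than a script; the sketch only shows what the exported line looks like once parsed.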
@solsson (Contributor Author) commented Feb 3, 2018

Time to merge. Feels quite robust now; you can't really live without it in cases like #116.

Noteworthy:

  • JMX config can be changed without broker restart. Simply re-apply the configmap. To verify that kafka picked up your latest version: k-kafka exec kafka-0 -c metrics -- cat /etc/jmx-kafka/jmx-kafka-prometheus.yml
  • The subset of metrics in this PR comes from #49 (Addon: expose /metrics endpoints for Prometheus), and we'll probably tweak it over time as use cases pop up in this repo.
    • The performance of jmx_exporter has received significant attention upstream, so there's probably no need to be this frugal anymore.
    • Still it's useful for Prometheus usability & performance to reduce the number of (uninteresting) time series.
  • We lowercase time series names because that seems to be the Prometheus convention. Labels from bean types keep their upper case, though, because Kafka metrics are documented that way.
  • To add new metrics it helps to temporarily export everything and then inspect the output, for example with kubectl run --rm -ti --image solsson/curl temp-curl -- curl http://kafka-0.broker.kafka:5556/metrics.
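
To make the naming point concrete, here is a rough Python sketch of how a JMX bean name can map to a lowercased series name with case-preserving labels. It assumes jmx_exporter's documented default format (domain, first bean property value, attribute name, with the remaining properties as labels) and its lowercaseOutputName option. It is a simplified imitation, not the exporter's code, and the rules in this PR's config may shape names differently. The function name is hypothetical, and quoted bean property values are not handled.

```python
import re
from typing import Dict, Tuple


def default_metric_name(object_name: str, attribute: str,
                        lowercase: bool = True) -> Tuple[str, Dict[str, str]]:
    """Approximate the default series name and labels for one MBean attribute."""
    domain, _, props = object_name.partition(":")
    pairs = [prop.split("=", 1) for prop in props.split(",")]
    # The first property value becomes part of the name; the rest become labels.
    first_value = pairs[0][1]
    labels = {key: value for key, value in pairs[1:]}
    name = "_".join([domain, first_value, attribute])
    # Replace characters that are not valid in a Prometheus metric name.
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", name)
    if lowercase:
        name = name.lower()  # label values are left untouched
    return name, labels


if __name__ == "__main__":
    name, labels = default_metric_name(
        "kafka.controller:type=KafkaController,name=OfflinePartitionsCount",
        "Value",
    )
    print(name, labels)
    # kafka_controller_kafkacontroller_value {'name': 'OfflinePartitionsCount'}
```

The result matches the series exported in this PR: the name is all lower case, while the label value keeps the spelling used in the Kafka documentation.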

@solsson merged commit 5a2b8c7 into master Feb 3, 2018