
Exposes JMX for brokers, and exemplifies a key cluster-level metric #93

Closed
solsson wants to merge 5 commits

Conversation

solsson (Contributor) commented Nov 9, 2017

Already included in #49, but I would like to keep metrics opt-in, while that PR adds quite a heavy container to the pod.

The exposed port can be utilized by kafka-manager (#83) - just tick the JMX box when adding a cluster - to see bytes in/out rates.

solsson commented Nov 9, 2017

Given the countless options for consuming Kafka metrics, I'd like to avoid making a specific implementation like #49 "core" by adding it to the kafka and zookeeper manifests. Instead I'd like this repo to encourage experimentation with different methods. Also, since v3.0.0 there's an ongoing transition from the old addons concept to a feature folder, and I haven't found a way to keep the addition of extra containers to core pods separated into opt-in manifest files.

We do have to set the JMX_PORT env var by default, but that's rather standard for Kafka.
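
For illustration, setting JMX_PORT on the broker container amounts to something like the fragment below. This is a sketch of a StatefulSet container spec, not the PR's exact diff; the container name and the port number 5555 are assumptions.

  # Fragment of the kafka StatefulSet's broker container (sketch; name and port are assumptions)
  - name: broker
    env:
    - name: JMX_PORT
      value: "5555"
    ports:
    - name: jmx
      containerPort: 5555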

The brokers-prometheus deployment in this PR (a sketch of such a deployment follows the list) has IMO these advantages:

  • Easier to experiment with memory and cpu limits, because pod stats are easily available.
  • We don't run cluster-level metrics against an unready broker pod.
  • Broker-level metrics can have a whitelist optimized for actual broker-level metrics (see the config sketch further down).
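
For context, the kind of standalone exporter Deployment this describes could look roughly like the sketch below. It is illustrative only: the image, ConfigMap name, namespace and HTTP port 5556 are assumptions, not taken from the PR.

  # Illustrative sketch of a standalone JMX exporter Deployment (not the PR's actual manifest).
  # Image, ConfigMap name, namespace and ports are assumptions.
  apiVersion: apps/v1beta2
  kind: Deployment
  metadata:
    name: brokers-prometheus
    namespace: kafka
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: brokers-prometheus
    template:
      metadata:
        labels:
          app: brokers-prometheus
      spec:
        containers:
        - name: jmx-exporter
          # Placeholder image; anything bundling jmx_prometheus_httpserver works.
          image: jmx-exporter:latest
          command:
          - java
          - -jar
          - jmx_prometheus_httpserver.jar
          - "5556"
          - /etc/jmx/jmx-exporter.yml
          ports:
          - containerPort: 5556
          volumeMounts:
          - name: config
            mountPath: /etc/jmx
        volumes:
        - name: config
          configMap:
            name: jmx-exporter-config  # placeholder; carries the exporter's config file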

Scrape times on minikube for this single metric are 5-15 seconds for me. Not very good.

solsson commented Nov 9, 2017

A feature that worked in #49 too, but was less of an advantage there because it used the same configmap as kafka: you can simply apply 10-metrics-config.yml and the exporter will show jmx_config_reload_success_total 1.0.

... though I've seen PartitionCount toggle between including the partitions in __consumer_offsets and not doing so.
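
For illustration, a whitelist-style exporter config of the kind 10-metrics-config.yml provides could look roughly like the sketch below. The broker address and JMX port are assumptions; only the resulting metric names match the ones quoted in this thread.

  # Sketch of a standalone JMX exporter config (not the PR's actual file).
  # The broker address and JMX port are assumptions.
  hostPort: "kafka-0.broker.kafka.svc.cluster.local:5555"
  lowercaseOutputName: false
  whitelistObjectNames:
  - "kafka.server:type=ReplicaManager,name=*"
  rules:
  - pattern: "kafka.server<type=ReplicaManager, name=(PartitionCount|UnderReplicatedPartitions)><>Value"
    name: kafka_server_ReplicaManager_Value
    labels:
      name: "$1"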
solsson commented Nov 10, 2017

This PR is a poor replacement for #49. If I kill one broker (after editing the init script so it won't come up again), my /metrics output, with the two test clients running, alternates between:

kafka_server_ReplicaManager_Value{name="PartitionCount",} 51.0
kafka_server_ReplicaManager_Value{name="UnderReplicatedPartitions",} 0.0

and

kafka_server_ReplicaManager_Value{name="PartitionCount",} 2.0
kafka_server_ReplicaManager_Value{name="UnderReplicatedPartitions",} 1.0

This means UnderReplicatedPartitions is per broker, unlike with the test in #95.

Got the scrape times down to 0.2 seconds again; that's a consolation :)

I'll go ahead and explore more monitoring options. The addition of JMX_PORT in e2ae2bf is OK to merge, I think.

solsson commented Feb 2, 2018

#128 replaced this PR. With it you get, for example, kafka_server_replicamanager_value{name="UnderReplicatedPartitions"}.

solsson closed this Feb 2, 2018