Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add barebones Netty event-loop metrics #4750

Merged
merged 3 commits into from
Mar 28, 2023

Conversation

vkostyukov
Copy link
Contributor

@vkostyukov vkostyukov commented Mar 11, 2023

Motivation:

We want to export some very basic metrics about the underlying Netty runtime. Let's start with the number of event-loops and the number of IO tasks waiting to be executed.

Modifications:

  • Added EventLoopMetrics, which implements MeterBinder.
  • The approach to count pending-io-tasks is highly inspired by this gauge from Finagle. It's also pretty similar how we do it internally at Databricks.
  • Registered CommonPools event-loop-groups for metrics collection.

Result:

@vkostyukov vkostyukov marked this pull request as ready for review March 11, 2023 04:23
Copy link
Member

@trustin trustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 Thanks for your first contribution, @vkostyukov!

@vkostyukov vkostyukov requested review from trustin and ikhoon and removed request for jrhee17, minwoox, trustin and ikhoon March 13, 2023 17:57
@vkostyukov vkostyukov force-pushed the vk/event-loop-metrics branch 2 times, most recently from af936c2 to 3db0fc1 Compare March 18, 2023 00:17
@vkostyukov vkostyukov requested a review from ikhoon March 18, 2023 02:46
@vkostyukov
Copy link
Contributor Author

@ikhoon mind taking another look when you get a chance? Ty!

@vkostyukov vkostyukov force-pushed the vk/event-loop-metrics branch 2 times, most recently from 4ebf151 to 9cca856 Compare March 22, 2023 05:26
@vkostyukov
Copy link
Contributor Author

@ikhoon linter should be happy now. Not super sure about other failures - they seemed unrelated. I'm happy to dig more if you think otherwise.

Copy link
Contributor

@ikhoon ikhoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we are on the same page. Left comments on code style and visibility.

@vkostyukov vkostyukov requested a review from ikhoon March 22, 2023 15:59
@minwoox minwoox added this to the 1.23.0 milestone Mar 23, 2023
Copy link
Member

@trustin trustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the solid work, @vkostyukov. Left some minor comments. 🙇

* - "event.loops.pending.tasks" (gauge) - the total number of IO tasks waiting to be run on event loops*
* </p>
*/
public static MeterBinder eventLoopMetrics(MeterIdPrefix idPrefix, EventLoopGroup eventLoopGroup) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about merging this class into MoreMeters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be the first method there returning MeterBinder (or metrics in that sense).

MoreMeters seems to mostly contain utilities to work with MeterRegistry, which is also suggested by its javadoc:

// Provides utilities for accessing MeterRegistry.

I'd probably vote for keeping MoreMetrics as a factory for creating MeterBinders.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, it's better to merge those classes because it can be a bit distracting for users to find Micrometer relates utilities. We can fix the Javadoc of MoreMeters if that's the problem. 😄
Anyway, we still need to add @UnstableApi because this is a newly-added public method. 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added @UnstableApi annotation for eventLoopMetrics.

Copy link
Contributor

@ikhoon ikhoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @vkostyukov! ❤️
LGTM if @trustin comments are addressed.

@vkostyukov
Copy link
Contributor Author

@trustin Just updated. PTAL once you have a chance.

Copy link
Member

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left only style suggestions. Thanks! 😄

@vkostyukov
Copy link
Contributor Author

Left only style suggestions. Thanks! 😄

Thanks for looking. Just pushed an update. PTAL again

Copy link
Member

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thanks a lot for adding these nice metrics! 😄

Copy link
Contributor

@jrhee17 jrhee17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a minor #4750 (comment) but looks good to me! Thanks a lot @vkostyukov ! 🙇 👍 🚀

parent.gauge(pendingTasks, idPrefix.tags(), this, Self::pendingTasks);
}

// This runs every 3 seconds at most.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I wondered for a second whether it's worth caching these stats since:

  1. The computing logic doesn't seem too expensive and most likely will run on a separate thread
  2. The collector implementation has more transparency on when metrics are computed
  3. The current implementation doesn't guarantee stats are cached for concurrent requests anyways

Having said this, I think the current implementation is fine 👍

Copy link
Contributor Author

@vkostyukov vkostyukov Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I carried over this pattern from the Caffeine metrics, which are admittedly more expensive to collect. I'm happy to drop the caching if that also works for @ikhoon and @minwoox. I don't want to do it before we have a consensus.

Copy link
Member

@minwoox minwoox Mar 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. @jrhee17 👍
I thought caching is okay because this type of information does not need to be immediately reflected in actual values.
Of course, we can revisit this later if we need to fix it. 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd down to simplify it. I think pendingTask is pretty cheap to run. Let me make the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated! PTAL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM. Thanks!

Copy link
Contributor

@jrhee17 jrhee17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vkostyukov ! Looks really nice 🚀 👍 🙇

Copy link
Member

@trustin trustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, @vkostyukov!

@trustin
Copy link
Member

trustin commented Mar 28, 2023

Let me make the following changes and merge:

  • Rename MoreMetrics to MoreMeterBinders to make it clear that it provides MeterBinders.
  • Reorder the parameters of .eventLoopMetrics() and EventLoopMetrics() so that name comes after EventLoopGroup, so it's consistent with the official binder implementations such as ExecutorServiceMetrics.
  • Fix a Checkstyle violation.

@trustin trustin merged commit 2dccda3 into line:main Mar 28, 2023
* exported per registered {@link MeterIdPrefix}.
*
* <ul>
* <li>"event.loop.num.workers" (gauge) - the total number of Netty's event loops</li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Let me remove num in the release note.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add metrics about EventLoop
5 participants