Dispatch the metrics via telemetry #503
Comments
Hi @slashmili It'd have to be configurable though. |
@zmstone configurable so that it can be disabled completely? OK, fair enough. Do you prefer it to be done via macros or via a condition that is only checked at runtime? I'm asking because if macros are the way to go, I need to do some study first. |
Hi @slashmili
|
Telemetry in brod would be a big win 👍 💯 |
I'd love to help on this wherever I can. I think librdkafka's STATISTICS page could be helpful when trying to figure out exactly what metrics should be emitted. |
@zmstone regarding
Does this actually need to be the case? I thought that by nature telemetry uses dynamic dispatching to delegate the execution of the handler or "work" process to the subscriber, so the framework has an "optional" built in. If you subscribe to the events then you "opt in" and some overhead is added; if you do not subscribe, then there is no handler for the given event and there is no overhead added. It looks like the only overhead added to brod would be an ets lookup to see if a handler for an event exists -- This seems like a very small cost to pay compared with the added complexity of an optional dependency and a macro.
I don't see telemetry as an optional dependency for any large packages like ecto_sql which can have an extremely high throughput. |
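To make the opt-in behaviour described in the comment above concrete, here is a minimal sketch, assuming the `telemetry` library is available as a dependency. The module name, the handler id, and the event name `[brod, client, connect]` are illustrative assumptions, not an existing brod API.

```erlang
-module(telemetry_opt_in_sketch).
-export([demo/0, handle_event/4]).

%% Hypothetical event name, used only for illustration.
-define(EVENT, [brod, client, connect]).

demo() ->
    {ok, _} = application:ensure_all_started(telemetry),
    %% No handler attached yet: execute/3 finds no handlers and returns ok.
    ok = telemetry:execute(?EVENT, #{duration => 1}, #{client_id => my_client}),
    %% Opting in: attach a handler that forwards events to the calling process.
    ok = telemetry:attach(<<"sketch-handler">>, ?EVENT,
                          fun ?MODULE:handle_event/4, self()),
    ok = telemetry:execute(?EVENT, #{duration => 2}, #{client_id => my_client}),
    %% Handlers run in the process that calls execute/3, so the message
    %% is already in the mailbox here; the timeout is only a safety net.
    Received = receive
                   {telemetry_event, ?EVENT, Measurements, _Metadata} ->
                       Measurements
               after 1000 ->
                   timeout
               end,
    ok = telemetry:detach(<<"sketch-handler">>),
    Received.

%% Handler callback: Config is the pid passed to attach/4 above.
handle_event(Event, Measurements, Metadata, Pid) ->
    Pid ! {telemetry_event, Event, Measurements, Metadata}.
```

Only the second `execute/3` call reaches a handler; the first one is dispatched to nobody, which is the "no subscriber" case discussed above.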
I’m thinking about it more from a dependency hygiene perspective; I'm not concerned about performance. Given it’s essentially two APIs to integrate, I assumed it would not be a big effort to implement. Btw, the compression libs are also made optional (at runtime). One org may use beam-telemetry, another may use something different. |
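One way a runtime-optional integration could look is sketched below. This is only an illustration of the idea in the comment above, not how brod handles its compression libraries; the module name is hypothetical.

```erlang
-module(brod_telemetry_optional_sketch).
-export([execute/3]).

%% Dispatch to telemetry only when its module can be loaded at runtime;
%% otherwise behave as a no-op, so telemetry stays an optional dependency.
execute(EventName, Measurements, Metadata) ->
    case code:ensure_loaded(telemetry) of
        {module, telemetry} ->
            telemetry:execute(EventName, Measurements, Metadata);
        {error, _Reason} ->
            %% telemetry is not in the code path: silently skip the event.
            ok
    end.
```

Checking `code:ensure_loaded/1` on every call adds a small cost of its own; a real implementation might cache the result once at startup.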
Hi @zmstone! I think the purpose of beam-telemetry is to be agnostic of the handling details. It is just a dispatch library that is maintained by the community and endorsed by the observability working group of the ErlEF https://github.com/erlef/observability-wg . I think that its goal is exactly to be this standard gluing layer of telemetry. It seems to be aligned with your goal of dependency hygiene (if no handlers are provided, it is almost a pass-through call with no side effect) and it allows consumers to choose which instrumentation library to use. From the ones I know, it seems they all integrate nicely with telemetry. Compression is a different case as there is currently no standard endorsed by the ErlEF. What do you think? |
Fair enough. So a wrapping project still has the chance to switch to other calls without going through beam-telemetry
TBH I don't mind wrapping around a
-module(brod_telemetry).
-export([execute/2,
         execute/3,
         span/3]).

execute(EventName, Measurements, Metadata) -> telemetry:execute(EventName, Measurements, Metadata).
execute(EventName, Measurements) -> telemetry:execute(EventName, Measurements).
span(EventPrefix, StartMetadata, SpanFunction) -> telemetry:span(EventPrefix, StartMetadata, SpanFunction).

What I've been struggling to do are :
let me know if you think the first PR should cover telemetry in any other module.
We can use telemetry in two ways: While
do_connect(Endpoint, State) ->
ConnConfig = conn_config(State),
kpro:connect(Endpoint, ConnConfig).
TO:
do_connect(Endpoint, #state{client_id = ClientId} = State) ->
StartMetadata = #{client_id => ClientId},
brod_telemetry:span(
[brod, client, connect],
StartMetadata,
fun() ->
ConnConfig = conn_config(State),
Result = kpro:connect(Endpoint, ConnConfig),
Metadata = maps:merge(StartMetadata, connect_result_to_metadata(Result)),
{Result, Metadata}
end
).
connect_result_to_metadata({ok, _}) -> #{status => ok, reason => nil};
connect_result_to_metadata({error, Reason}) -> #{status => error, reason => Reason}.
It's easier to use span since it does keep track of time and also triggers:
join_group(#state{ groupId = GroupId
, client = ClientId
} = State0) ->
StartMetadata = #{client_id => ClientId, group_id => GroupId} ,
brod_telemetry:span(
[brod, group_coordinator, join_group],
StartMetadata,
fun () ->
Result = do_join_group(State0),
{Result, maps:merge(StartMetadata, join_group_to_metadata(Result))}
end).
join_group_to_metadata({ok, #state{
memberId = MemberId
, leaderId = LeaderId
, generationId = GenerationId
, members = Members
}}) ->
#{
member_id => MemberId
, leader_id => LeaderId
, generation_id => GenerationId
, members => Members}.
do_join_group....
The code becomes so noisy that I know I'll hate my PR already 😭 What do you think? Are these kinds of changes inevitable, or do you have any other suggestions? |
Just looking at ecto/db_connection suite as a reference — their approach seems to be to time the relevant parts and emit the timed event with execute. sql.ex#L1101-L1103 |
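The "time the relevant part, then emit with execute" approach mentioned above could be sketched roughly as follows, assuming the `telemetry` library is a dependency. The helper name `timed/3` and the module name are hypothetical and not part of brod or Ecto.

```erlang
-module(timed_execute_sketch).
-export([timed/3]).

%% Run Fun, measure the elapsed time with the monotonic clock,
%% and emit a single event carrying the duration (in native time units).
timed(EventName, Metadata, Fun) ->
    Start = erlang:monotonic_time(),
    Result = Fun(),
    Duration = erlang:monotonic_time() - Start,
    telemetry:execute(EventName, #{duration => Duration}, Metadata),
    %% The wrapped function's result is returned unchanged.
    Result.
```

A call site could then stay close to its original shape, for example `timed_execute_sketch:timed([brod, client, connect], #{client_id => ClientId}, fun() -> kpro:connect(Endpoint, ConnConfig) end)`. Handlers can convert the duration with `erlang:convert_time_unit/3`.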
Thanks for the pointer! I think it was kind of an easier choice for Ecto since it already has the data for logging. I also looked at: OK, so it's not uncommon to avoid messing with the code being run. I'll take that approach then. |
This would be a very welcome feature and would also help integrate OpenTelemetry tracing, similar to how they do the oban integration. |
@slashmili let me know what you think of #512. Just starting to take a very light stab at adding telemetry. |
On the topic of telemetry (the general term, not the library) there is also a need for propagating distributed trace context in kafka messages and instrumentation with So if adding some abstraction that maybe calls If a non- The way OpenTelemetry tries to make integration zero-cost for a user who doesn't use it is the separation of the API and SDK. Including only And just in case, the Otel semantic conventions for Kafka may be of use https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/instrumentation/kafka.md |
Thank you all for the ideas. @tsloughter I only had a quick look at |
Hi again @tsloughter |
@zmstone compile time flag can be nice for some, but being able to switch between no-op and active without recompiling is useful/easier for some. And yes, the opentelemetry_api macros will lookup the tracer to use with persistent terms and get the no-op tracer in the cases that there is no SDK installed. There isn't a reason to not include |
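The idea of switching between no-op and active at runtime through a persistent term lookup can be illustrated with a small sketch. This is not the actual OpenTelemetry implementation; the module name, the key, and the `with_span/2` callback on the tracer module are assumptions made for illustration.

```erlang
-module(noop_tracer_sketch).
-export([set_tracer/1, with_span/2]).

%% Hypothetical persistent_term key under which a tracer module is stored.
-define(KEY, {?MODULE, tracer}).

%% Install a tracer module at runtime, with no recompilation needed.
%% The module is assumed to export with_span/2.
set_tracer(Module) when is_atom(Module) ->
    persistent_term:put(?KEY, Module).

%% Look up the tracer; when none is installed, fall back to a no-op
%% that simply runs the function.
with_span(Name, Fun) ->
    case persistent_term:get(?KEY, noop) of
        noop -> Fun();
        Module -> Module:with_span(Name, Fun)
    end.
```

With nothing installed, `noop_tracer_sketch:with_span(<<"join_group">>, fun() -> ok end)` just evaluates the fun, so the instrumented code pays only for one persistent term read.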
Any updates on this? It would be very valuable. |
Just recently I picked |
Hello!
I'm wondering if you are open to a contribution to introduce telemetry into this project?
There are lots of important metrics that I'd like to tap into, especially when consumers connect and rebalance, and this would be a great addition to the library.