Add prometheus metrics #431

Closed
wants to merge 4 commits into from

Conversation

Contributor

@JessicaGreben commented Oct 12, 2021

This PR is for issue #378 and adds the initial scaffolding to export metrics for Prometheus. Ultimately we will fully migrate to Prometheus instead of using the current Statsd metrics, but in the meantime we will need to support both.

The baseplate spec specifies that an active_requests metric must be emitted (ref: spec), so this PR adds that as the first custom metric.

The /metrics endpoint has been added to the Go baseplate code via the baseplate-cookiecutter repo in this PR.

The next piece of work, to be done in a different PR, is to add the scaffolding that allows applications to generate custom metrics for their own purposes.
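
For reference, exposing a /metrics endpoint with client_golang generally looks like the sketch below; the actual wiring lives in the baseplate-cookiecutter template and may differ (the port and the standalone main here are illustrative only):

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Serve the default Prometheus registry so it can be scraped.
	http.Handle("/metrics", promhttp.Handler())
	// Port chosen for illustration; the real service config decides this.
	log.Fatal(http.ListenAndServe(":6060", nil))
}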

@JessicaGreben requested a review from a team as a code owner October 12, 2021 22:41
@JessicaGreben requested review from fishy and kylelemons and removed request for a team October 12, 2021 22:41
Contributor

@kylelemons left a comment


I thought the plan was to use promauto? This feels like a lot of machinery to me
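
For readers following along, the difference promauto makes is roughly the sketch below (metric names here are illustrative, not part of this PR): promauto registers the metric with the default registry at construction time, so no separate registration plumbing is needed.

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// With promauto, construction and registration happen in one step
// (the metric is registered with prometheus.DefaultRegisterer).
var exampleGauge = promauto.NewGauge(prometheus.GaugeOpts{
	Name: "example_in_flight",
	Help: "Illustrative gauge registered via promauto.",
})

// The equivalent without promauto needs an explicit registration step.
var exampleGaugeManual = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "example_in_flight_manual",
	Help: "Illustrative gauge registered by hand.",
})

func init() {
	prometheus.MustRegister(exampleGaugeManual)
}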

metricsbp/prometheus.go (outdated; resolved)
Comment on lines +10 to +13
var activeRequests = promauto.NewGauge(prometheus.GaugeOpts{
	Name: "active_requests",
	Help: "The number of requests being handled by the service.",
})
Member


In statsd this is a "runtime gauge" instead of a plain gauge. A "runtime gauge" comes with two tags, the hostname and the pid, and automatically adds a "runtime." prefix.

Then we have some special handling in telegraf for everything under the "runtime." prefix: we strip the two special tags and calculate the average/max/min/etc. across all the pods reporting this gauge. This is kind of like treating it as a histogram instead of a gauge.

We should probably have an internal spec discussion first on how we want to handle runtime gauges in Prometheus.
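
Purely to illustrate what a literal translation of that statsd behavior into client_golang terms might look like (a hypothetical sketch, not a recommendation): the hostname and pid become constant labels and the "runtime." prefix becomes a namespace.

import (
	"os"
	"strconv"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Hypothetical literal translation of a statsd "runtime gauge":
// exported as runtime_active_requests with per-process constant labels.
var runtimeActiveRequests = promauto.NewGauge(prometheus.GaugeOpts{
	Namespace: "runtime",
	Name:      "active_requests",
	Help:      "Requests in flight, tagged per process like the statsd runtime gauge.",
	ConstLabels: prometheus.Labels{
		"hostname": hostnameOrEmpty(), // hypothetical helper, see below
		"pid":      strconv.Itoa(os.Getpid()),
	},
})

func hostnameOrEmpty() string {
	h, _ := os.Hostname()
	return h
}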


@bjk-reddit Oct 14, 2021


This doesn't really exist the same way in Prometheus, because Prometheus always includes the instance of the monitored target in every metric. For Go, there is only ever one process, so effectively every metric already includes those tags.

Contributor Author


That part of the spec needs to be corrected for Prometheus.

Contributor

@kylelemons Oct 15, 2021


Yeah, the PID part is only for einhorn-esque things

@@ -63,7 +63,8 @@ type Config struct {
 // with the global tracing hook registry.
 func InitFromConfig(ctx context.Context, cfg Config) io.Closer {
 	M = NewStatsd(ctx, cfg)
-	tracing.RegisterCreateServerSpanHooks(CreateServerSpanHook{Metrics: M})
+	pm := NewPrometheusMetrics(ctx, cfg)
+	tracing.RegisterCreateServerSpanHooks(CreateServerSpanHook{Metrics: M, PrometheusMetrics: pm})
Member


As a style nit, I feel that the only appropriate case for writing a whole struct literal on one line is when you fill zero or one of its fields. Anything with more than one field should be written one-per-line instead:

Suggested change
tracing.RegisterCreateServerSpanHooks(CreateServerSpanHook{Metrics: M, PrometheusMetrics: pm})
tracing.RegisterCreateServerSpanHooks(CreateServerSpanHook{
	Metrics:           M,
	PrometheusMetrics: pm,
})

)

var activeRequests = promauto.NewGauge(prometheus.GaugeOpts{
	Name: "active_requests",


Is this specific to a protocol? Typically for this metric we would prefix it with the protocol.

This can be done with the Namespace. Something like this:

prometheus.GaugeOpts{
	Namespace: "http",
	Name:      "active_requests",
	...
}
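
Filled out, that suggestion would look roughly like the sketch below and export a metric named http_active_requests, since Namespace and Name are joined with an underscore (the Help text here is illustrative):

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Exported as "http_active_requests" because of the "http" namespace.
var httpActiveRequests = promauto.NewGauge(prometheus.GaugeOpts{
	Namespace: "http",
	Name:      "active_requests",
	Help:      "The number of HTTP requests currently being handled by the service.",
})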

Contributor Author

@JessicaGreben Oct 14, 2021


I was thinking of adding a label with the transport type, as stated here, just until the baseplate.go remodel next year when tracing and metrics are decoupled. For simplicity I was going to leave this as is, but I can look into what's involved in moving over to the protocol-specific metrics that I'm adding shortly, if that's better.

Member


If we are doing per-transport active_requests then the server span hook is no longer the correct place to do it, because the server span hook runs on all servers and has no knowledge of which transport it is in. This needs to be done in the server middleware instead.
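
As a rough illustration of that point (a plain net/http sketch, not baseplate.go's actual middleware API), the increment/decrement would live in the transport-specific middleware, which knows which protocol it serves:

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var httpActiveRequests = promauto.NewGauge(prometheus.GaugeOpts{
	Namespace: "http",
	Name:      "active_requests",
	Help:      "In-flight HTTP requests.",
})

// activeRequestsMiddleware tracks in-flight requests for the HTTP transport only;
// a thrift (or other transport) middleware would update its own gauge.
func activeRequestsMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		httpActiveRequests.Inc()
		defer httpActiveRequests.Dec()
		next.ServeHTTP(w, r)
	})
}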

Contributor Author


Sounds like the best thing to do is move this to the per-transport server middleware. Will make that change.

@@ -35,21 +38,23 @@ func (h CreateServerSpanHook) OnCreateServerSpan(span *tracing.Span) error {
 // ends, with success=True/False tags for the counter based on whether an error
 // was passed to `span.End` or not.
 type spanHook struct {
-	metrics *Statsd
+	metrics           *Statsd
+	prometheusMetrics *PrometheusMetrics
Contributor


From what I've seen, the "best practice" for injecting Prometheus metrics (especially when there's just one, like "active_requests") is to pass in the prometheus.Counter -- that could also allow you to do "active_requests" per-transport here, using prometheus.CounterVec.With to pre-populate the label.

Leaving this out and just doing it in the per-transport section also sounds fine, if you prefer.
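
A sketch of what that injection could look like (the comment mentions CounterVec, but the same pattern applies to a GaugeVec for active_requests; the names and hook shape here are hypothetical, not the struct from this PR):

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var activeRequestsVec = promauto.NewGaugeVec(prometheus.GaugeOpts{
	Name: "active_requests",
	Help: "In-flight requests, labeled by transport.",
}, []string{"transport"})

// spanHook is handed a gauge whose transport label is already filled in,
// so the hook itself stays transport-agnostic.
type spanHook struct {
	activeRequests prometheus.Gauge
}

func newHTTPSpanHook() spanHook {
	return spanHook{
		// With pre-populates the label; the hook then just calls Inc/Dec.
		activeRequests: activeRequestsVec.With(prometheus.Labels{"transport": "http"}),
	}
}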

Contributor Author


Thanks for the details. Sounds like I will close this PR and add active_requests to the per-transport PRs.

@JessicaGreben
Contributor Author

Closing this to add active_requests to the per-transport server middleware instead.

@JessicaGreben deleted the sre-1153-add-prometheus branch October 15, 2021 17:32