Skip to content
This repository has been archived by the owner on May 17, 2024. It is now read-only.

Add metrics #158

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Add metrics #158

wants to merge 13 commits into from

Conversation

JoshVanL
Copy link
Contributor

This PR metrics adds a number of metrics to the proxy.

The PR should be split up fairly well per commit.

It would be good to have a bit of insight into whether these metrics look sane... It is tricky not to make the number of time series explode here.

/assign @munnerz
/assign @simonswine
fixes #156

@jetstack-bot jetstack-bot added the dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. label Jun 18, 2020
@jetstack-bot jetstack-bot requested a review from simonswine June 18, 2020 23:35
@jetstack-bot jetstack-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 18, 2020

### kube_oidc_proxy_http_client_requests
counter - {http status code, path, remote address}
The number of requests for incoming requests.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The number of requests for incoming requests.
The number of incoming requests.


### kube_oidc_proxy_http_server_requests
counter - {http status code, path, remote address}
The requests for outgoing server requests.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The requests for outgoing server requests.
The number of outgoing server requests.

Comment on lines 48 to 49
"Adress to serving metrics on at the /metrics path. An empty address will "+
"disable serving metrics.")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth mentioning that this can't conflict with the other addresses here?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the readiness probe is exposed as a port, but this is a full address. I think that makes sense despite the inconsistency.

cmd/app/run.go Outdated
return err
}
} else {
klog.Info("metrics listen address empty, disabling serving")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
klog.Info("metrics listen address empty, disabling serving")
klog.Info("metrics listen address empty, disabling serving metrics")

deploy/charts/kube-oidc-proxy/values.yaml Show resolved Hide resolved
http.Error(rw, "Impersonation requests are disabled when using kube-oidc-proxy", http.StatusForbidden)
return

// No name given or available in oidc request
case errNoName:
klog.V(2).Infof("no name available in oidc info %s", r.RemoteAddr)
statusCode = http.StatusForbidden
klog.V(2).Infof("no name available in oidc info %s", remoteAddr)
http.Error(rw, "Username claim not available in OIDC Issuer response", http.StatusForbidden)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
http.Error(rw, "Username claim not available in OIDC Issuer response", http.StatusForbidden)
http.Error(rw, "Username claim not available in OIDC Issuer response", statusCode)

http.Error(rw, "Username claim not available in OIDC Issuer response", http.StatusForbidden)
return

// No impersonation configuration found in context
case errNoImpersonationConfig:
klog.Errorf("if you are seeing this, there is likely a bug in the proxy (%s): %s", r.RemoteAddr, err)
statusCode = http.StatusInternalServerError
klog.Errorf("if you are seeing this, there is likely a bug in the proxy (%s): %s", remoteAddr, err)
http.Error(rw, "", http.StatusInternalServerError)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
http.Error(rw, "", http.StatusInternalServerError)
http.Error(rw, "", statusCode)

http.Error(rw, "", http.StatusInternalServerError)
return

// Server or unknown error
default:
klog.Errorf("unknown error (%s): %s", r.RemoteAddr, err)
statusCode = http.StatusInternalServerError
klog.Errorf("unknown error (%s): %s", remoteAddr, err)
http.Error(rw, "", http.StatusInternalServerError)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
http.Error(rw, "", http.StatusInternalServerError)
http.Error(rw, "", statusCode)

Comment on lines +196 to +199
var statusCode int
if resp != nil {
statusCode = resp.StatusCode
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we will report statusCode as nil if there was an error in round-tripping? What will the client see in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client will see the error code that comes from the error handler if the there is an error.

If the response is nil then we have to report the status code as 0.

I actually made a mistake here that if there is an error, then we will be observing a client call twice (once here, and again in the error handler). I have now wrapped the client observation in an err != nil. The server still need to be observed here regardless.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some wrapper we could make the observation for both the error case (with the real error code) and the success case? I guess we lose access to some of the labels if we do that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah exactly, the error handler code path isn't in most requests so we can't invoke it there, and indeed we loose some info.

@@ -0,0 +1,218 @@
// Copyright Jetstack Ltd. See LICENSE for details.
package metrics
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the prom client testing support? Is it possible to write tests that check that samples were recorded with certain labels set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, but it is a bit of a string matching exercise. Happy to stick something in if you are keen...

https://github.com/jetstack/cert-manager/blob/master/pkg/metrics/certificates_test.go

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's critical, but I always prefer tests if they aren't too onerous.

@jetstack-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoshVanL

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JoshVanL
Copy link
Contributor Author

Thanks for the review @james-w :)
I've made the suggested changes or responded to you comments, and should be ready for another look
/unassign
/assign @james-w

@@ -43,6 +44,10 @@ func (k *KubeOIDCProxyOptions) AddFlags(fs *pflag.FlagSet) *KubeOIDCProxyOptions
fs.IntVarP(&k.ReadinessProbePort, "readiness-probe-port", "P", 8080,
"Port to expose readiness probe.")

fs.StringVar(&k.MetricsListenAddress, "metrics-serving-address", "0.0.0.0:80",
"Address to serve metrics on at the /metrics path. An empty address will "+
"disable serving metrics. Cannot use the same address as proxy or probe.")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to explicitly set --metrics-serving-address="" without it being defaulted to 0.0.0.0:80?

return err
}
hooks.AddPreShutdownHook("Readiness Probe", readinessHandler.Shutdown)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: are spaces in hook names okay/normal?

@@ -74,6 +82,9 @@ spec:
{{- range $key, $value := .Values.extraArgs -}}
- "--{{ $key }}={{ $value -}}"
{{ end }}
{{- if and .Values.metrics.enabled }}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this and for? Should it be if .Values.metrics.enabled?

@@ -74,6 +82,9 @@ spec:
{{- range $key, $value := .Values.extraArgs -}}
- "--{{ $key }}={{ $value -}}"
{{ end }}
{{- if and .Values.metrics.enabled }}
- "--metrics-serving-address={{ .Values.metrics.address }}:{{ .Values.metrics.port }}"
{{ end }}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we want this to be {{- end }} to avoid a trailing whitespace, but I see that others don't use it and it isn't really essential at all to this PR :)

annotations:
prometheus.io/path: /metrics
prometheus.io/port: "80"
prometheus.io/scrape: "true"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This manifest does not have - "--metrics-serving-address={{ .Values.metrics.address }}:{{ .Values.metrics.port }}" in it, but does enable scraping of the endpoint. Is it being generated from the Helm chart?

klog.Infof("serving readiness probe on %s/ready", ln.Addr())

if err := h.Serve(ln); err != nil {
klog.Errorf("failed to serve readiness probe: %s", err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, we don't have any way to pass 'up' the stack if this is failing from what I can tell? Meaning if the readiness probe listener fails, the pod will never be restarted (assuming the liveness probe handler doesn't fail).

This isn't a change from what happened before from what I can see, but may be worth looking into as a follow up

// Auth request and handle unauthed
info, ok, err := p.oidcRequestAuther.AuthenticateRequest(req)
if err != nil {
// Since we have failed OIDC auth, we will try a token review, if enabled.
p.metrics.IncrementOIDCAuthCount(false, remoteAddr, "")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user specifically isn't connecting with OIDC, is there any way to avoid that one user flooding the metric with failure counts?

If I wanted to use token passthrough for one client, I'd end out with something that looks like an error but actually isn't really (from what I can tell).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a distinct 'number of passthroughs' metric? 🤷‍♂️


// Setup unauthed handler so that it is passed through the audit
unauthedHandler := audit.NewUnauthenticatedHandler(p.auditor, func(rw http.ResponseWriter, req *http.Request) {
_, remoteAddr := context.RemoteAddr(req)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bit of a nit, but why does RemoteAddr return the request as well? Does it modify it in any way? 🤔

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see:


	clientAddress, ok := ctx.Value(clientAddressKey).(string)
	if !ok {
		clientAddress = xff.GetRemoteAddr(req)
		req = req.WithContext(request.WithValue(ctx, clientAddressKey, clientAddress))
	}

Not one for this PR, but that seems like non-obvious behaviour 😬 is xff.GetRemoteAddr(req) particularly expensive? What about just making this only accept the req, and then having a separate function WithRemoteAddress(ctx context.Context, remoteAddr string) context.Context or something? I don't quite see why we need to add that context back into the request, but perhaps I am missing something 😄 is this to allow passing through a header containing the connecting client's address?

@@ -199,8 +186,26 @@ func (p *Proxy) RoundTrip(req *http.Request) (*http.Response, error) {
// Set up impersonation request.
rt := transport.NewImpersonatingRoundTripper(*conf, p.clientTransport)

req, remoteAddr := context.RemoteAddr(req)
serverDuration := time.Now()
clientDuration := context.ClientRequestTimestamp(req)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you might want to consider adopting the k8s.io/utils 'clock' package sometime soon - this will be tough to test otherwise 😄


// Start clock on metrics
tokenReviewDuration := time.Now()
req, remoteAddr := proxycontext.RemoteAddr(req)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm.. do we want to be overwriting req here to include this remote address?

@SimplySeth
Copy link

Pardon my ignorance, is this complete ?
How do I get access to the metrics if it is?
Thanks.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add metrics endpoint
6 participants