Can't see metrics for services on AKS #2182
Also possibly related: #1451.
olix0r added a commit to linkerd/linkerd2-proxy that referenced this issue on Feb 2, 2019:
In some network environments, peers may silently drop connections such that the proxy cannot detect that the peer's socket has been closed. The [TCP keepalive socket options][tcp-keepalive] configure the kernel to actively probe connections to ensure connectivity and prevent idle timeouts.

This change adds stack modules that attempt to configure accept and connect sockets' TCP keepalive socket options. There are four new environment configurations the proxy supports:

- `LINKERD2_PROXY_INBOUND_ACCEPT_KEEPALIVE`
- `LINKERD2_PROXY_OUTBOUND_ACCEPT_KEEPALIVE`
- `LINKERD2_PROXY_INBOUND_CONNECT_KEEPALIVE`
- `LINKERD2_PROXY_OUTBOUND_CONNECT_KEEPALIVE`

When an environment variable is unset, no keepalive is set on the corresponding sockets. Otherwise, its value is parsed as a duration. OSes may or may not understand subsecond values. It is recommended to only set the inbound-accept and outbound-connect keepalive values, as keepalives shouldn't be necessary on localhost.

Relates to linkerd/linkerd2#1949, linkerd/linkerd2#2182

[tcp-keepalive]: http://www.tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html
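For illustration, here is a minimal Python sketch of the mechanism the commit describes: read an optional keepalive setting from the environment and, only when it is set, enable the TCP keepalive socket options. The proxy itself is written in Rust, and the helper names below are made up; a real implementation would also parse duration strings like `10s` rather than bare numbers.

```python
import os
import socket

def keepalive_from_env(var):
    """Return the keepalive idle time in seconds, or None when the
    environment variable is unset (no keepalive is configured)."""
    raw = os.environ.get(var)
    return float(raw) if raw is not None else None

def apply_keepalive(sock, idle_secs):
    """Enable SO_KEEPALIVE and, where the OS supports it, set the idle
    time before the kernel starts probing the peer."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE is not available on every OS, echoing the note that
    # OSes may or may not understand all keepalive settings.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, int(idle_secs))

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
idle = keepalive_from_env("LINKERD2_PROXY_INBOUND_ACCEPT_KEEPALIVE")
if idle is not None:
    apply_keepalive(sock, idle)
sock.close()
```

The key behavior is the unset case: when the variable is absent, the socket is left untouched rather than given a default keepalive.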
This is a duplicate of #1949.
olix0r added a commit to linkerd/linkerd2-proxy that referenced this issue on Feb 4, 2019:
* Add logging to proxy::tcp
* Update Rust version in Dockerfile
* Introduce TCP keepalive configuration

In some network environments, peers may silently drop connections such that the proxy cannot detect that the peer's socket has been closed. The [TCP keepalive socket options][tcp-keepalive] configure the kernel to actively probe connections to ensure connectivity and prevent idle timeouts.

This change adds stack modules that attempt to configure accept and connect sockets' TCP keepalive socket options. There are four new environment configurations the proxy supports:

- `LINKERD2_PROXY_INBOUND_ACCEPT_KEEPALIVE`
- `LINKERD2_PROXY_OUTBOUND_ACCEPT_KEEPALIVE`
- `LINKERD2_PROXY_INBOUND_CONNECT_KEEPALIVE`
- `LINKERD2_PROXY_OUTBOUND_CONNECT_KEEPALIVE`

When an environment variable is unset, no keepalive is set on the corresponding sockets. Otherwise, its value is parsed as a duration. OSes may or may not understand subsecond values. It is recommended to only set the inbound-accept and outbound-connect keepalive values, as keepalives shouldn't be necessary on localhost.

Relates to linkerd/linkerd2#1949 linkerd/linkerd2#2182

[tcp-keepalive]: http://www.tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html

* Use smarter controller keepalives

For the controller's pods, it may not make sense to use the outbound keepalive when communicating with the proxy API, because this API may be served on localhost. If the controller's address is localhost/loopback, then use the inbound connect keepalive instead.
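The loopback special-casing in the last bullet above can be sketched as follows. This is an illustrative Python translation, not the proxy's actual Rust code, and the function name is hypothetical.

```python
import ipaddress

def select_connect_keepalive(host, inbound_keepalive, outbound_keepalive):
    """If the controller address is localhost/loopback, use the inbound
    connect keepalive; otherwise use the outbound connect keepalive."""
    try:
        is_loopback = ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a literal IP address; treat the conventional name specially.
        is_loopback = host == "localhost"
    return inbound_keepalive if is_loopback else outbound_keepalive

print(select_connect_keepalive("127.0.0.1", "10s", "30s"))  # 10s
print(select_connect_keepalive("10.0.12.7", "10s", "30s"))  # 30s
```

The design choice mirrors the commit's recommendation: inbound-accept and outbound-connect keepalives matter across the network, while localhost traffic falls back to the inbound setting.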
What is the issue?
My services are running, but I don't see any metrics for them.
How can it be reproduced?
Leave Linkerd running for a while. Any changes to the cluster (a new deployment, for example) won't get metrics.
Logs, error output, etc.
Everything's healthy. `linkerd check` output: check passes 100%.
Environment
Possible solution
It appears that watches become stale on AKS. You can fix this temporarily by restarting the Linkerd control plane pods (effectively refreshing the state).
From a code perspective, there is a potential upstream change that would make Linkerd more resilient to this class of problem: kubernetes/kubernetes#67817. Unfortunately, that won't fix the Prometheus side of things.
Additional context
There are a number of possibly related issues: