Can't see metrics for services on AKS #2182

Closed
grampelberg opened this issue Jan 31, 2019 · 2 comments

@grampelberg
Contributor

What is the issue?

My services are running, but I don't see any metrics for them.

How can it be reproduced?

Leave linkerd running for a while. Anything added to the cluster afterwards (a new deployment, for example) won't get metrics.

Logs, error output, etc

Everything's healthy.

linkerd check output

Check passes 100%.

Environment

  • Kubernetes Version: 1.11.5
  • Cluster Environment: AKS
  • Linkerd version: stable-2.1.0

Possible solution

It appears that watches become stale on AKS. You can fix this temporarily by restarting the linkerd control plane pods (effectively refreshing the state).

kubectl -n linkerd delete pod --all

From a code perspective, there might be an update that makes linkerd more resilient to these types of problems, kubernetes/kubernetes#67817. Unfortunately, that won't fix the prometheus side of things.

Additional context

There's a bunch of possible issues:

@grampelberg
Contributor Author

Also possibly related, #1451.

olix0r added a commit to linkerd/linkerd2-proxy that referenced this issue Feb 2, 2019
In some network environments, peers may silently drop connections such
that the proxy cannot detect that the peer's socket has been closed.

The [TCP keepalive socket options][tcp-keepalive] configure the kernel
to actively probe connections to ensure connectivity and prevent idle
timeouts.
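
For readers unfamiliar with these socket options, here is a minimal Python sketch of what enabling keepalive on a socket looks like. It is not the proxy's code (the proxy is written in Rust); `enable_keepalive` is a hypothetical helper, and the 10 second idle time is an arbitrary value chosen for illustration.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_seconds: int) -> None:
    """Ask the kernel to probe an idle connection (hypothetical helper)."""
    # SO_KEEPALIVE turns keepalive probing on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE (seconds of idleness before the first probe) is
    # platform-specific, so only set it where the constant exists.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock, 10)  # 10 is an assumed example value
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

Without these options the kernel sends nothing on an idle connection, so a peer or middlebox that silently drops the connection goes unnoticed until the next write.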

This change adds stack modules that attempt to configure accept and
connect sockets' TCP keepalive socket options. There are four new
environment configurations the proxy supports:

- `LINKERD2_PROXY_INBOUND_ACCEPT_KEEPALIVE`
- `LINKERD2_PROXY_OUTBOUND_ACCEPT_KEEPALIVE`
- `LINKERD2_PROXY_INBOUND_CONNECT_KEEPALIVE`
- `LINKERD2_PROXY_OUTBOUND_CONNECT_KEEPALIVE`

When an environment variable is unset, no keepalive is set on the
corresponding sockets. Otherwise, its value is parsed as a duration.
OSes may or may not understand subsecond values.
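
The unset-versus-duration behaviour described above can be sketched roughly as follows. This is an illustration in Python, not the proxy's implementation: `keepalive_from_env` is a hypothetical helper, and the accepted unit suffixes (`ms`, `s`, `m`, `h`) are an assumption, since the commit message does not spell out the duration syntax.

```python
import re
from typing import Mapping, Optional

# Assumed unit suffixes for this sketch, expressed in milliseconds.
# The proxy's real duration syntax is not given in the commit message.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}


def keepalive_from_env(name: str, env: Mapping[str, str]) -> Optional[float]:
    """Return the keepalive in seconds, or None when the variable is unset."""
    raw = env.get(name)
    if raw is None:
        return None  # unset: no keepalive on the corresponding sockets
    match = re.fullmatch(r"\s*(\d+)\s*(ms|s|m|h)\s*", raw)
    if match is None:
        raise ValueError(f"{name}: cannot parse {raw!r} as a duration")
    return int(match.group(1)) * _UNIT_MS[match.group(2)] / 1000.0


env = {"LINKERD2_PROXY_INBOUND_ACCEPT_KEEPALIVE": "10s"}
inbound_accept = keepalive_from_env("LINKERD2_PROXY_INBOUND_ACCEPT_KEEPALIVE", env)
outbound_accept = keepalive_from_env("LINKERD2_PROXY_OUTBOUND_ACCEPT_KEEPALIVE", env)
```

Here `inbound_accept` is a number of seconds and `outbound_accept` is `None`, so only the inbound accept sockets would get a keepalive. A subsecond value such as `500ms` parses fine in this sketch, but whether the OS honours it is a separate question.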

It is recommended to only set the inbound-accept and outbound-connect
keepalive values, as keepalives shouldn't be necessary on localhost.

Relates to linkerd/linkerd2#1949 linkerd/linkerd2#2182

[tcp-keepalive]: http://www.tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html
@olix0r
Member

olix0r commented Feb 2, 2019

This is a duplicate of #1949

@olix0r olix0r closed this as completed Feb 2, 2019
olix0r added a commit to linkerd/linkerd2-proxy that referenced this issue Feb 4, 2019
* Add logging to proxy::tcp

* update rust version in dockerfile

* Introduce TCP keepalive configuration

In some network environments, peers may silently drop connections such
that the proxy cannot detect that the peer's socket has been closed.

The [TCP keepalive socket options][tcp-keepalive] configure the kernel
to actively probe connections to ensure connectivity and prevent idle
timeouts.

This change adds stack modules that attempt to configure accept and
connect sockets' TCP keepalive socket options. There are four new
environment configurations the proxy supports:

- `LINKERD2_PROXY_INBOUND_ACCEPT_KEEPALIVE`
- `LINKERD2_PROXY_OUTBOUND_ACCEPT_KEEPALIVE`
- `LINKERD2_PROXY_INBOUND_CONNECT_KEEPALIVE`
- `LINKERD2_PROXY_OUTBOUND_CONNECT_KEEPALIVE`

When an environment variable is unset, no keepalive is set on the
corresponding sockets. Otherwise, its value is parsed as a duration.
OSes may or may not understand subsecond values.

It is recommended to only set the inbound-accept and outbound-connect
keepalive values, as keepalives shouldn't be necessary on localhost.

Relates to linkerd/linkerd2#1949 linkerd/linkerd2#2182

[tcp-keepalive]: http://www.tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html

* Use smarter controller keepalives

For the controller's pods, it may not make sense to use the outbound
keepalive when communicating with the proxy API, because this API may be
served on localhost.

If the controller's address is localhost/loopback, then use the
inbound connect keepalive instead.
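
The selection rule in this last commit can be sketched like this. It is a Python illustration under assumptions, not the proxy's Rust code: `is_loopback` and `controller_keepalive` are hypothetical names, and treating the literal hostname `localhost` as loopback is a simplification made for the sketch.

```python
import ipaddress
from typing import Optional


def is_loopback(host: str) -> bool:
    """True for 'localhost' or a loopback IP literal (hypothetical helper)."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # some other hostname; assume it is not local


def controller_keepalive(
    host: str,
    inbound_connect: Optional[float],
    outbound_connect: Optional[float],
) -> Optional[float]:
    """Pick the keepalive (seconds, or None for off) for the controller client."""
    # A loopback controller is reached like inbound traffic to a local
    # process, so the inbound connect setting applies instead.
    return inbound_connect if is_loopback(host) else outbound_connect


local = controller_keepalive("127.0.0.1", None, 10.0)
remote = controller_keepalive("10.0.0.5", None, 10.0)
```

With only the outbound connect keepalive configured, as the earlier recommendation suggests, `local` ends up with no keepalive while `remote` keeps the outbound value.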
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2021