An endpoint to determine whether a node is a raft leader #2582

nenkoru · 2024-12-09T22:28:53Z

Hello there!

As per thread on the forums[1] I think it might be worth implementing what Hashicorp Vault does with its /sys/health endpoint[2] to return 200 status code if the node is a raft leader.
This would allow for auto-failover when using HAProxy L7 load-balancer to a next node that has become a leader of the raft.

I have gone ahead and hacked together a PR #2581 which adds another webapi component to the controller which allows for determining whether a controller is a raft leader.
However I need a response whether this idea is viable enough to be even considered to be merged upstream.

[1] https://openziti.discourse.group/t/ziti-controller-ha-setup-behind-a-haproxy-load-balancer/3535
[2] https://developer.hashicorp.com/vault/api-docs/system/health

The text was updated successfully, but these errors were encountered:

nenkoru · 2024-12-10T14:10:20Z

Tested it with the following HAproxy config. It works nicely and as intended. HAProxy automatically detects a backend node to direct traffic to using this /sys/health endpoint.
Tested by manually shutting down leader node -> after a few seconds traffic starts to be directed to a newly elected leader.

global
  # Global Configuration

defaults
  timeout connect 5000
  timeout client 50000
  timeout server 50000

frontend main
  mode tcp
  bind *:443
  use_backend ctrl

backend ctrl
  mode tcp
  option httpchk GET /sys/health
  http-check expect status 200
  server ctrl1 127.0.0.1:1281 check check-ssl verify none
  server ctrl2 127.0.0.1:1282 check check-ssl verify none
  server ctrl3 127.0.0.1:1283 check check-ssl verify none

plorenz · 2024-12-10T20:59:42Z

Overall, seems like a useful thing to have. The only thing to note is that there is an existing health check implementation in controller/health.go.

My preference would be to consolidate this functionality in with that code and distinguish between overall health and raft later status with a parameter or path.

Thoughts?

nenkoru · 2024-12-10T22:18:25Z

Overall, seems like a useful thing to have. The only thing to note is that there is an existing health check implementation in controller/health.go.

My preference would be to consolidate this functionality in with that code and distinguish between overall health and raft later status with a parameter or path.

Thoughts?

I guess this change would be breaking backward compatibility with this 429 status code being returned if the node isn't the leader. If backward compatibility not to be the obstacle I think its safe to migrate this logic directly into health-check handler.
This health-check can't be used for routers, correct?

nenkoru · 2024-12-11T14:56:15Z

Apart from #2586 issue, it suprisingly works well.
Not only this allows for auto faillover in case of a disaster, but it also allows for the ziti desktop edge(for mac at least) to do not even notice there was a disaster with the controller it was using!

nenkoru mentioned this issue Dec 9, 2024

Add controller-isleader xweb component to respond with 200 if the controller in question is a raft leader #2581

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

An endpoint to determine whether a node is a raft leader #2582

An endpoint to determine whether a node is a raft leader #2582

nenkoru commented Dec 9, 2024 •

edited

Loading

nenkoru commented Dec 10, 2024

plorenz commented Dec 10, 2024

nenkoru commented Dec 10, 2024 •

edited

Loading

nenkoru commented Dec 11, 2024

An endpoint to determine whether a node is a raft leader #2582

An endpoint to determine whether a node is a raft leader #2582

Comments

nenkoru commented Dec 9, 2024 • edited Loading

nenkoru commented Dec 10, 2024

plorenz commented Dec 10, 2024

nenkoru commented Dec 10, 2024 • edited Loading

nenkoru commented Dec 11, 2024

nenkoru commented Dec 9, 2024 •

edited

Loading

nenkoru commented Dec 10, 2024 •

edited

Loading