This repository has been archived by the owner on Dec 4, 2024. It is now read-only.

marathon-lb reload bug #602

Open
Sisphyus opened this issue Oct 15, 2018 · 0 comments

Last week, while updating a core service in our production environment (built on DC/OS), we accidentally misconfigured its health check. External requests returned 503 the whole time, until we corrected the health check configuration and restarted the service. Throughout the incident the old instance was shown as healthy on the Marathon page, so we suspect something goes wrong when marathon-lb reloads.

Why does the old healthy instance stop serving traffic after we deploy a bad health check? As far as we know, nothing changes on the old healthy instance when a new unhealthy instance is launched in the same application.

Test and verification (marathon-lb version 1.12.1)

  1. Launch a new nginx test application (listening on port 80) with the health check on port 80.
  2. Change the health check port to 81. Marathon launches a new instance whose state never becomes healthy; at this point the nginx backend in haproxy.cfg contains two servers.
  3. Test external access.

haproxy.cfg

before reload

backend nginx-lbl-test_10278
  balance roundrobin
  mode http
  option forwardfor
  http-request set-header X-Forwarded-Port %[dst_port]
  http-request add-header X-Forwarded-Proto https if { ssl_fc }
  server 10_168_0_82_9_0_5_7_80 9.0.5.7:80 check inter 5s fall 4 port 80

after reload

backend nginx-lbl-test_10278
  balance roundrobin
  mode http
  option forwardfor
  http-request set-header X-Forwarded-Port %[dst_port]
  http-request add-header X-Forwarded-Proto https if { ssl_fc }
  server 10_168_0_82_9_0_5_7_80 9.0.5.7:80 check inter 5s fall 4 port 81
  server 10_168_0_82_9_0_5_12_80 9.0.5.12:80 check inter 5s fall 4 port 81
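
To confirm from two saved copies of haproxy.cfg which servers had their check target changed by a reload, a small script along these lines can help. This is a minimal sketch: the function names `check_ports` and `changed_checks` are made up for illustration, and the parsing only handles `server` lines shaped like the ones above.

```python
import re

# Matches HAProxy "server" lines shaped like the ones in the snapshots above.
SERVER_RE = re.compile(
    r"^\s*server\s+(?P<name>\S+)\s+(?P<addr>[\d.]+):(?P<port>\d+)(?P<opts>.*)$"
)
# The "port N" option that redirects health checks to another port.
CHECK_PORT_RE = re.compile(r"\bport\s+(\d+)")


def check_ports(cfg_text):
    """Return {server_name: (traffic_port, check_port)} for each server line."""
    result = {}
    for line in cfg_text.splitlines():
        m = SERVER_RE.match(line)
        if not m:
            continue
        traffic_port = int(m.group("port"))
        cp = CHECK_PORT_RE.search(m.group("opts"))
        # Without an explicit "port N", HAProxy checks the server's own port.
        check_port = int(cp.group(1)) if cp else traffic_port
        result[m.group("name")] = (traffic_port, check_port)
    return result


def changed_checks(before, after):
    """Names of servers present in both snapshots whose check port differs."""
    b, a = check_ports(before), check_ports(after)
    return sorted(n for n in b if n in a and b[n][1] != a[n][1])


# Server lines copied from the two snapshots in this issue.
BEFORE = """
  server 10_168_0_82_9_0_5_7_80 9.0.5.7:80 check inter 5s fall 4 port 80
"""
AFTER = """
  server 10_168_0_82_9_0_5_7_80 9.0.5.7:80 check inter 5s fall 4 port 81
  server 10_168_0_82_9_0_5_12_80 9.0.5.12:80 check inter 5s fall 4 port 81
"""

print(changed_checks(BEFORE, AFTER))  # ['10_168_0_82_9_0_5_7_80']
```

The old server keeps serving on port 80, but after the reload HAProxy probes it on port 81, where nothing is listening.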

So why was the health check configuration of the old instance updated as well?
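
One possible explanation, not confirmed anywhere in this thread, is that the check options are rendered once from the application's current health check definition and then applied to every task, regardless of which app version the task was started from. The toy model below reproduces the "after reload" output under that assumption. `render_servers` is a hypothetical helper written for this illustration, not marathon-lb's actual code.

```python
def render_servers(health_check_port, tasks):
    """Hypothetical renderer: one app-level check port applied to every task.

    tasks is a list of (host_ip, task_ip, port) tuples.
    """
    lines = []
    for host_ip, task_ip, port in tasks:
        # Server names in the snapshots look like <host ip>_<task ip>_<port>.
        name = "{}_{}_{}".format(host_ip, task_ip, port).replace(".", "_")
        lines.append(
            "  server {} {}:{} check inter 5s fall 4 port {}".format(
                name, task_ip, port, health_check_port
            )
        )
    return lines


# Old task (9.0.5.7) and new task (9.0.5.12), both listening on port 80.
tasks = [("10.168.0.82", "9.0.5.7", 80), ("10.168.0.82", "9.0.5.12", 80)]

# After the app definition changes, the single check port 81 reaches both tasks.
for line in render_servers(81, tasks):
    print(line)
```

If this is what happens, every server in the backend fails its check at the same time, which would match the 503 responses we saw.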

This is dangerous when updating an application in production: a bad health check defeats HAProxy failover even though the old healthy instance is still alive.
