
Embedded load-balancer behavior is flakey and hard to understand #11334

Closed
brandond opened this issue Nov 17, 2024 · 2 comments
Labels: kind/enhancement (An improvement to existing functionality)

brandond (Member) commented Nov 17, 2024:

The loadbalancer server list is a bit of a mess. Its behavior has been tinkered with a lot over the last year, but it is still hard to reason about, and this has caused a spate of issues.

From a code perspective, the loadbalancer state is directly accessed by a number of functions that all poke at various index variables, current and default server name variables, a list of server addresses, a second (randomized) list of server addresses, and a map of addresses to structs that hold per-server state:

serviceName string
configFile string
localAddress string
localServerURL string
defaultServerAddress string
ServerURL string
ServerAddresses []string
randomServers []string
servers map[string]*server
currentServerAddress string
nextServerIndex int
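
For illustration only, here is a minimal sketch (not the actual k3s code; all names below are hypothetical) of how that scattered state could be collapsed into a single lock-guarded structure:

package loadbalancer // hypothetical package name, for illustration only

import "sync"

// serverEntry and serverList are hypothetical types, not the real k3s ones.
type serverEntry struct {
	address string // "host:port" of the server
	healthy bool   // result of the most recent health check
}

// serverList owns all mutable load-balancer state behind one mutex,
// replacing the parallel slices, maps, and index variables listed above.
type serverList struct {
	mutex   sync.Mutex
	servers []*serverEntry // single ordered list; no separate shuffled copy
	current int            // index of the server new connections should try first
}

The point of the sketch is just that a single owner for the list makes it much easier to reason about which code path can mutate what.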

The DialContext function is called whenever a new connection comes in. It holds a read lock while iterating (possibly twice) over the random server list, while servers may be added or removed at any time. The code is very hard to read and understand, given the number of variables involved:

var allChecksFailed bool
startIndex := lb.nextServerIndex
for {
	targetServer := lb.currentServerAddress

	server := lb.servers[targetServer]
	if server == nil || targetServer == "" {
		logrus.Debugf("Nil server for load balancer %s: %s", lb.serviceName, targetServer)
	} else if allChecksFailed || server.healthCheck() {
		dialTime := time.Now()
		conn, err := server.dialContext(ctx, network, targetServer)
		if err == nil {
			return conn, nil
		}
		logrus.Debugf("Dial error from load balancer %s after %s: %s", lb.serviceName, time.Now().Sub(dialTime), err)
		// Don't close connections to the failed server if we're retrying with health checks ignored.
		// We don't want to disrupt active connections if it is unlikely they will have anywhere to go.
		if !allChecksFailed {
			defer server.closeAll()
		}
	} else {
		logrus.Debugf("Dial health check failed for %s", targetServer)
	}

	newServer, err := lb.nextServer(targetServer)
	if err != nil {
		return nil, err
	}
	if targetServer != newServer {
		logrus.Debugf("Failed over to new server for load balancer %s: %s -> %s", lb.serviceName, targetServer, newServer)
	}
	if ctx.Err() != nil {
		return nil, ctx.Err()
	}

	maxIndex := len(lb.randomServers)
	if startIndex > maxIndex {
		startIndex = maxIndex
	}
	if lb.nextServerIndex == startIndex {
		if allChecksFailed {
			return nil, errors.New("all servers failed")
		}
		logrus.Debugf("Health checks for all servers in load balancer %s have failed: retrying with health checks ignored", lb.serviceName)
		allChecksFailed = true
	}
}

We should simplify the load balancer so that it behaves more reliably and is easier to understand and explain.
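
As a rough sketch of that direction (again hypothetical, building on the serverList type above rather than the current implementation), the dial path could become a double pass over one ordered list: try healthy servers first, then ignore health only after every healthy candidate has failed:

import (
	"context"
	"errors"
	"net"
)

// dial walks the list once preferring healthy servers, then once more
// ignoring health, so the "all checks failed" fallback is just a second pass
// instead of index arithmetic against a shuffled slice.
func (sl *serverList) dial(ctx context.Context, network string) (net.Conn, error) {
	// Snapshot the list under the lock so concurrent add/remove is safe.
	sl.mutex.Lock()
	candidates := make([]*serverEntry, len(sl.servers))
	copy(candidates, sl.servers)
	sl.mutex.Unlock()

	var dialer net.Dialer
	for _, ignoreHealth := range []bool{false, true} {
		for _, s := range candidates {
			if !s.healthy && !ignoreHealth {
				continue
			}
			conn, err := dialer.DialContext(ctx, network, s.address)
			if err == nil {
				return conn, nil
			}
			if ctx.Err() != nil {
				return nil, ctx.Err()
			}
		}
	}
	return nil, errors.New("all servers failed")
}

Whatever the final shape, the key property to preserve from the current code is the two-phase behavior: prefer servers with passing health checks, and only fall back to failed ones when nothing else is reachable.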

ShylajaDevadiga (Contributor) commented Dec 11, 2024:

Tests to cover

  1. Create a cluster with 3 etcd, 2 control-plane, and 1 agent node. Validate load-balancer and ingress functionality
  2. Restart k3s on all nodes
  3. Stop and start one node
  4. Restart the control-plane nodes in reverse order
  5. Reboot all nodes
  6. Delete one node and validate functionality
  7. Delete a node, add a new node, and validate functionality
  8. Stop 2 nodes and check the logs on the agent node

ShylajaDevadiga (Contributor) commented:

Validated the above scenarios using k3s version v1.31.3+k3s-c88e217f.

k3s -v
k3s version v1.31.3+k3s-c88e217f (c88e217f)
go version go1.22.8

> kubectl get node
NAME                                          STATUS   ROLES                       AGE   VERSION
ip-172-31-12-199.us-east-2.compute.internal   Ready    control-plane,etcd,master   16h   v1.31.3+k3s-c88e217f
ip-172-31-13-86.us-east-2.compute.internal    Ready    control-plane,etcd,master   16h   v1.31.3+k3s-c88e217f
ip-172-31-15-89.us-east-2.compute.internal    Ready    <none>                      16h   v1.31.3+k3s-c88e217f
ip-172-31-5-210.us-east-2.compute.internal    Ready    control-plane,etcd,master   16h   v1.31.3+k3s-c88e217f

Terminated and then deleted a node

> kubectl get nodes
NAME                                          STATUS     ROLES                       AGE   VERSION
ip-172-31-12-199.us-east-2.compute.internal   Ready      control-plane,etcd,master   18h   v1.31.3+k3s-c88e217f
ip-172-31-13-86.us-east-2.compute.internal    Ready      control-plane,etcd,master   18h   v1.31.3+k3s-c88e217f
ip-172-31-15-89.us-east-2.compute.internal    Ready      <none>                      18h   v1.31.3+k3s-c88e217f
ip-172-31-5-210.us-east-2.compute.internal    NotReady   control-plane,etcd,master   18h   v1.31.3+k3s-c88e217f
> kubectl delete node ip-172-31-5-210.us-east-2.compute.internal
node "ip-172-31-5-210.us-east-2.compute.internal" deleted
ec2-user@ip-172-31-12-199:~> kubectl get nodes
NAME                                          STATUS   ROLES                       AGE   VERSION
ip-172-31-12-199.us-east-2.compute.internal   Ready    control-plane,etcd,master   23h   v1.31.3+k3s-c88e217f
ip-172-31-13-86.us-east-2.compute.internal    Ready    control-plane,etcd,master   23h   v1.31.3+k3s-c88e217f
ip-172-31-15-89.us-east-2.compute.internal    Ready    <none>                      23h   v1.31.3+k3s-c88e217f

Added a new node

> kubectl get nodes
NAME                                          STATUS   ROLES                       AGE   VERSION
ip-172-31-12-199.us-east-2.compute.internal   Ready    control-plane,etcd,master   23h   v1.31.3+k3s-c88e217f
ip-172-31-13-86.us-east-2.compute.internal    Ready    control-plane,etcd,master   23h   v1.31.3+k3s-c88e217f
ip-172-31-15-89.us-east-2.compute.internal    Ready    <none>                      23h   v1.31.3+k3s-c88e217f
ip-172-31-3-142.us-east-2.compute.internal    Ready    control-plane,etcd,master   59m   v1.31.3+k3s-c88e217f

Agent logs

Dec 13 00:59:23 ip-172-31-15-89 k3s[1585]: time="2024-12-13T00:59:23Z" level=info msg="Updated load balancer k3s-agent-load-balancer server addresses -> [1.1.1.8:6443 2.2.2.215:6443 3.3.3.44:6443] [default: 1.1.1.8:6443]"
Dec 13 00:59:23 ip-172-31-15-89 k3s[1585]: time="2024-12-13T00:59:23Z" level=info msg="Stopped tunnel to 3.137.211.32:6443"
Dec 13 00:59:40 ip-172-31-15-89 k3s[1585]: I1213 00:59:40.959724    1585 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.2.0/24]
Dec 13 00:59:40 ip-172-31-15-89 k3s[1585]: I1213 00:59:40.960365    1585 subnet.go:152] Batch elem [0] is { lease.Event{Type:1, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa2a0200, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xac1f038e, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x31, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x38, 0x61, 0x3a, 0x36, 0x63, 0x3a, 0x37, 0x37, 0x3a, 0x31, 0x33, 0x3a, 0x65, 0x34, 0x3a, 0x62, 0x39, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
Dec 13 00:59:40 ip-172-31-15-89 k3s[1585]: I1213 00:59:40.960452    1585 vxlan_network.go:100] Received Subnet Event with VxLan: BackendType: vxlan, PublicIP: 172.31.3.142, PublicIPv6: (nil), BackendData: {"VNI":1,"VtepMAC":"8a:6c:77:13:e4:b9"}, BackendV6Data: (nil)
Dec 13 00:59:50 ip-172-31-15-89 k3s[1585]: time="2024-12-13T00:59:50Z" level=info msg="Removing server from load balancer k3s-agent-load-balancer: 3.3.3.44:6443"
Dec 13 00:59:50 ip-172-31-15-89 k3s[1585]: time="2024-12-13T00:59:50Z" level=info msg="Updated load balancer k3s-agent-load-balancer server addresses -> [1.1.1.8:6443 2.2.2.215:6443] [default: 1.1.1.8:6443]"
Dec 13 00:59:50 ip-172-31-15-89 k3s[1585]: time="2024-12-13T00:59:50Z" level=info msg="Stopped tunnel to 3.3.3.44:6443"
Dec 13 00:59:50 ip-172-31-15-89 k3s[1585]: time="2024-12-13T00:59:50Z" level=info msg="Proxy done" err="context canceled" url="wss://3.3.3.44:6443/v1-k3s/connect"
Dec 13 00:59:55 ip-172-31-15-89 k3s[1585]: time="2024-12-13T00:59:55Z" level=info msg="Adding server to load balancer k3s-agent-load-balancer: 4.4.4.32:6443"
Dec 13 00:59:55 ip-172-31-15-89 k3s[1585]: time="2024-12-13T00:59:55Z" level=info msg="Updated load balancer k3s-agent-load-balancer server addresses -> [1.1.1.8:6443 2.2.2.215:6443 4.4.4.32:6443] [default: 1.1.1.8:6443]"
Dec 13 00:59:55 ip-172-31-15-89 k3s[1585]: time="2024-12-13T00:59:55Z" level=info msg="Started tunnel to 4.4.4.32:6443"

github-project-automation bot moved this from To Test to Done Issue in K3s Development on Dec 13, 2024