Embedded load-balancer behavior is flakey and hard to understand #11334
Labels: kind/enhancement (An improvement to existing functionality)
This was referenced Dec 6, 2024
Tests to cover
Validated the above scenarios using k3s version v1.31.3+k3s-c88e217f:
- Terminated and then deleted a node
- Added a new node
Agent logs
The load-balancer server list is a bit of a mess. Its behavior has been tinkered with a lot over the last year, but it is still hard to reason about. This has caused a spate of issues:
From a code perspective, the load-balancer state is accessed directly by a number of functions that all poke at various index variables, the current and default server name variables, a list of server addresses, a second, randomized list of server addresses, and a map of addresses to structs that hold per-server state:
`k3s/pkg/agent/loadbalancer/loadbalancer.go`, lines 43 to 53 at commit `cd4dded`
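To make the complaint concrete, here is a hypothetical Go sketch of state shaped like the one described above. The type names, field names, and the `setAddresses` helper are illustrative assumptions, not the actual k3s code. The point is how many parallel fields a single server-list update has to keep in sync:

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// serverState is a hypothetical stand-in for the per-address struct that
// holds state; the real k3s type and its fields may differ.
type serverState struct {
	address string
	healthy bool
}

// legacyLB models the scattered state described in the issue: several
// parallel variables that must all be kept consistent by hand.
// All names are illustrative assumptions, not the k3s field names.
type legacyLB struct {
	mu                   sync.RWMutex
	serverAddresses      []string                // ordered list of addresses
	randomServers        []string                // shuffled copy of serverAddresses
	servers              map[string]*serverState // address -> per-server state
	currentServerAddress string                  // server currently in use
	defaultServerAddress string                  // fallback server (not touched here)
	nextServerIndex      int                     // position in randomServers
}

// setAddresses replaces the server list. Every field below has to be updated
// together; forgetting any one of them leaves the state inconsistent.
func (lb *legacyLB) setAddresses(addrs []string) {
	lb.mu.Lock()
	defer lb.mu.Unlock()

	lb.serverAddresses = append([]string(nil), addrs...)
	lb.randomServers = append([]string(nil), addrs...)
	rand.Shuffle(len(lb.randomServers), func(i, j int) {
		lb.randomServers[i], lb.randomServers[j] = lb.randomServers[j], lb.randomServers[i]
	})

	next := make(map[string]*serverState, len(addrs))
	for _, a := range addrs {
		if s, ok := lb.servers[a]; ok {
			next[a] = s // keep existing state for addresses that survive
		} else {
			next[a] = &serverState{address: a, healthy: true}
		}
	}
	lb.servers = next

	// The "current" pointer and the index must also be repaired if the
	// current server was removed from the list.
	if _, ok := lb.servers[lb.currentServerAddress]; !ok {
		lb.currentServerAddress = ""
		lb.nextServerIndex = 0
	}
}

func main() {
	lb := &legacyLB{}
	lb.setAddresses([]string{"10.0.0.1:6443", "10.0.0.2:6443"})
	fmt.Println(len(lb.serverAddresses), len(lb.randomServers), len(lb.servers))
}
```

Each additional field is one more invariant that every mutating function must remember to maintain.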
The `DialContext` function is called for every new incoming connection. It holds a read lock while iterating (possibly twice) over the randomized server list, while servers may be added or removed at any time. Given the number of variables involved, the code is very hard to read and understand:
`k3s/pkg/agent/loadbalancer/loadbalancer.go`, lines 162 to 208 at commit `cd4dded`
We should simplify the load-balancer behavior so that it functions more reliably, and its functionality is easier to understand and explain.