
Set externalTrafficPolicy as Local for agones-allocator #4019

Closed
osterante opened this issue Oct 17, 2024 · 5 comments · Fixed by #4022
Labels
kind/feature New features for Agones

Comments

@osterante
Contributor

Is your feature request related to a problem? Please describe.
When using two node pools in an Agones cluster, one for Agones system components and one for GameServers, allocation requests sometimes fail while nodes in the GameServers node pool are being scaled down, especially when many nodes are removed at once. Allocation requests should not be affected by changes in the GameServers node pool.

Describe the solution you'd like
Set the externalTrafficPolicy for the agones-allocator service to Local instead of the default Cluster.
https://cloud.google.com/kubernetes-engine/docs/concepts/service-load-balancer?hl=ja#health_check
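For reference, a minimal sketch of what the resulting Service spec would look like. The field names are standard Kubernetes; the selector and port numbers below are illustrative, not taken from the actual Agones chart:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: agones-allocator
  namespace: agones-system
spec:
  type: LoadBalancer
  # Default is Cluster; Local restricts traffic to nodes
  # that run at least one ready allocator Pod.
  externalTrafficPolicy: Local
  selector:
    multicluster.agones.dev/role: allocator  # illustrative selector
  ports:
    - name: grpc
      port: 443        # illustrative port numbers
      targetPort: 8443
```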


@gongmax
Collaborator

gongmax commented Nov 5, 2024

Hi @osterante, thanks for the contribution. Just a few questions: can you explain the problem in more detail? What metrics did you observe, and why do you think it's related to the health check? Did you test the fix in #4022 and verify that it mitigates the issue?

@osterante
Contributor Author

osterante commented Nov 8, 2024

@gongmax
While reducing nodes, allocation requests to the agones-allocator (via gRPC) sometimes fail with the following error:

rpc error: code = Unavailable desc = error reading from server: EOF

The agones-allocator is a Kubernetes Service of type LoadBalancer; in this case, GKE creates a passthrough load balancer.
When externalTrafficPolicy is set to Cluster, allocation requests can be routed to any node, even one that runs only GameServer Pods and no agones-allocator Pods. That node then forwards the packets to another node that runs an agones-allocator Pod. If the forwarding node terminates while routing the packets, the allocation request fails with the error above.
If externalTrafficPolicy is set to Local, allocation requests are delivered only to nodes that run an agones-allocator Pod.

ref: https://cloud.google.com/kubernetes-engine/docs/concepts/service-load-balancer?hl=ja#node-packet-processing
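Until the Service policy is changed, callers can also paper over these transient EOFs by retrying allocation calls that fail with Unavailable. Below is a minimal client-side sketch; it is a hypothetical helper, not part of Agones, and it assumes the caller wraps its gRPC call in a zero-argument function that raises an exception whose message contains the status code:

```python
import time


def allocate_with_retry(call, retries=3, backoff=0.1):
    """Retry a gRPC-style call when it fails with Unavailable.

    `call` is a zero-argument callable; any exception whose message
    contains "Unavailable" is treated as transient and retried with
    exponential backoff. Other errors propagate immediately.
    """
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception as exc:
            if "Unavailable" not in str(exc) or attempt == retries:
                raise
            time.sleep(backoff * (2 ** attempt))
```

This only masks the symptom for idempotent allocation calls; switching the Service to Local removes the failure mode at the source.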

@osterante
Contributor Author

Hi @gongmax, do you have time to review the PR? If there are any concerns, I can keep the default value unchanged for backward compatibility; what I want is the ability to change that value.

@gongmax
Collaborator

gongmax commented Nov 19, 2024

"In most situations, the node routes the packet to a serving Pod running on the node which received the packet from the load balancer."

Is it possible that the node that received the packet from the load balancer does not have an agones-allocator Pod? How does the load balancer choose which node to send the request to in the first place? I'm not very familiar with the load balancer behavior, so apologies if these are basic questions.

@osterante
Contributor Author

Is it possible that the node that received the packet from the load balancer does not have an agones-allocator Pod?

I'm not sure, but I don't think that's possible, because the following is stated in the section below:

Use externalTrafficPolicy: Local to ensure that packets are only delivered to a node with at least one serving, ready, non-terminating Pod, preserving the original client source IP address. 

https://cloud.google.com/kubernetes-engine/docs/concepts/service-load-balancer#effect_of_externaltrafficpolicy

How does the load balancer choose which node to send the request at the first place?

The load balancer chooses one of the healthy nodes (I'm not sure whether by random selection or round robin). What counts as a healthy node depends on the value of externalTrafficPolicy: with Cluster, it is any node that passes the health check; with Local, it is any node that runs at least one ready serving Pod.
https://cloud.google.com/kubernetes-engine/docs/concepts/service-load-balancer#health_check
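Concretely, the difference shows up on the Service itself: with externalTrafficPolicy: Local, Kubernetes allocates a dedicated health-check node port, and kube-proxy answers the load balancer's health check as healthy only on nodes that have at least one local ready serving Pod. A sketch of the relevant fields (the port number is illustrative):

```yaml
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  # Allocated automatically when the policy is Local; the load balancer
  # health-checks this port, and only nodes with a local ready
  # allocator Pod respond as healthy.
  healthCheckNodePort: 32123  # illustrative value
```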
