failed to wait for object to sync in-cache after patching context deadline exceeded #1017
This means the controller stopped receiving data from the Kubernetes API; I suspect your Kubernetes control plane is having issues.
We are having the same problem. But also, the liveness probe should probably still work even if there are problems contacting the control plane.
Not if you build your controller with Kubernetes controller-runtime. Having the controller keep running and DDoSing the API endpoint would do you no good; kubelet will restart the controller with an exponential backoff, which prevents the API server from being overloaded once it comes back up.
We downgraded the control plane (GKE rapid channel) and now everything seems to be fine again. I still haven't really found the root cause, but my point was that if the controller is behaving properly, and the k8s API is overloaded or unresponsive for some reason other than the controller, the liveness probe on the controller should still pass the checks, right?
Not if the CNI is failing: kubelet can't reach the port. There is nothing special about the liveness probe; it's the standard controller-runtime ping handler: https://github.com/fluxcd/pkg/blob/ac1007b57e37838e73b8bc95365dab9a0e856e8e/runtime/probes/probes.go#L45
That's not the case: there are several other applications running in the same cluster (and on the same node as the Flux controllers), and none of them have any problems, either communicating with the internet or among each other. Also, the liveness port of the Flux controllers is reachable, but it just doesn't respond. What I think is happening is that the problematic version of the control plane changed something related to rate limiting of API queries, and that only affects Flux because in our case it's the app that queries the k8s API the most. I'm pretty sure we can reproduce the issue easily by switching the control plane back to the problematic version, if you are willing to debug this together.
@fcuello-fudo if Flux runs into rate limits there must be error logs; if you can post those, that would be helpful. We use the Kubernetes
wut?
Really, what does it mean?
Why are there no other logs that describe what's going on?