Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On Rocky9 kubernetes worker node, the haproxy external check script times out and doesn't start #1815

Open
aki-kamada opened this issue Sep 10, 2024 · 1 comment
Labels

Comments

@aki-kamada
Copy link

aki-kamada commented Sep 10, 2024

Report

I am unable to set up PerconaXtraDBCluster because the haproxy external check script times out.
I confirmed that this problem occurs on Kubernetes worker nodes of Rocky9. This problem doesn't occur on Kubernetes worker nodes of Ubuntu 20.04.
I seem to haproxy happen in busy loop. Because cpu resource is very increase.
Of course, when I run check_pxc.sh within the haproxy pod, I can confirm that it connects to the mysql cluster normally.

I found that this problem happen in Amazon EKS(amazon linux 2023 with kubernetes 1.29).
amazonlinux/amazon-linux-2023#693

Please let me know if we need more informations.

More about the problem

I attached haproxy log, cr.yaml, ps command log in haproxy container and manually run external-check-script log.
Explore-logs-2024-09-10 16_44_39.txt
ps-command-on-haproxy-pod.txt
test-cluster00-haproxy00.txt
manually-run-external-check-script.txt

Steps to reproduce

  1. setup pxc operator (I confirmed 3 version "1.13.0", "1.14.0" and "1.15.0")
  2. kubectl apply -f test-cluster00-haproxy00.yaml. (running pod on Rocky9 kubernetes worker node)

Versions

  1. Kubernetes 1.29.6 (problem occurs in Rocky9 worker node. doesn't problem occurs in Ubuntu 20.04 worker node)
  2. Operator 1.130, 1.14.0 and 1.15.0 (I confirmed 3 versions)
  3. Database 1.130, 1.14.0 and 1.15.0 (I confirmed 3 versions)

Anything else?

No response

@aki-kamada aki-kamada added the bug label Sep 10, 2024
@aki-kamada
Copy link
Author

I found that this issue occurs when the worker node is using a large amount of memory (over 300GB in my environment).
This issue has been confirmed to occur with haproxy and slapd.

Resolution:
If you change the value of LimitNOFILE in the containerd.service on the Kubernetes worker node from infinity to 65535, haproxy will start.

Thanks,
Aki

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant