OpenCost UI nginx timeouts tuning #40

Open
vova3379 opened this issue Oct 11, 2024 · 3 comments

@vova3379

vova3379 commented Oct 11, 2024

Describe the bug

OpenCost does not allow tuning the values below:
https://github.com/opencost/opencost-ui/blob/f243545ed8b04113d088f0e62e064acf5d950714/default.nginx.conf.template#L69-L71

        proxy_connect_timeout       180;
        proxy_send_timeout          180;
        proxy_read_timeout          180;

We have clusters with more than 100 nodes, and from time to time the cluster cost calculation for a 7-day window in the UI returns:

Failed to load report data
Request failed with status code 404

and in the logs we see:

opencost-bc6d766b9-fw2sg opencost-ui 2024/10/11 12:38:53 [error] 28#28: *296999 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 10.34.146.18, server: _, request: "GET /model/allocation/compute?window=7d&aggregate=namespace&includeIdle=true&step=1d&accumulate=false HTTP/1.1", upstream: "http://0.0.0.0:9003/allocation/compute?window=7d&aggregate=namespace&includeIdle=true&step=1d&accumulate=false", host: "prod-spok-opencost.domain.net", referrer: "https://prod-spok-opencost.domain.net/allocation?window=7d"
opencost-bc6d766b9-fw2sg opencost-ui 2024/10/11 12:38:53 [error] 28#28: *296999 open() "/var/www/custom_504.html" failed (2: No such file or directory), client: 10.34.146.18, server: _, request: "GET /model/allocation/compute?window=7d&aggregate=namespace&includeIdle=true&step=1d&accumulate=false HTTP/1.1", upstream: "http://0.0.0.0:9003/allocation/compute?window=7d&aggregate=namespace&includeIdle=true&step=1d&accumulate=false", host: "prod-spok-opencost.domain.net", referrer: "https://prod-spok-opencost.domain.net/allocation?window=7d"
opencost-bc6d766b9-fw2sg opencost-ui 10.34.146.18 - - [11/Oct/2024:12:38:53 +0000] "GET /model/allocation/compute?window=7d&aggregate=namespace&includeIdle=true&step=1d&accumulate=false HTTP/1.1" 404 153 "https://prod-spok-opencost.domain.net/allocation?window=7d" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:131.0) Gecko/20100101 Firefox/131.0" "10.32.104.179"

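So in this case we hit the 180 s proxy timeouts in the OpenCost UI container. The 404 shown in the UI is a side effect of the timeout: nginx tries to serve /var/www/custom_504.html for the 504, and that file does not exist.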
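To see how often the UI proxy hits this timeout and for which queries, a small sketch like the one below can scan the opencost-ui container log for nginx "upstream timed out" errors and count the affected request URIs. It assumes log lines in the nginx error-log format shown above (any pod/container prefix is ignored); the function name `timed_out_requests` is hypothetical and not part of OpenCost.

```python
import re
from collections import Counter
from typing import Iterable

# Matches an nginx "upstream timed out" error entry and captures the
# request line that was being proxied, e.g.
#   request: "GET /model/allocation/compute?window=7d... HTTP/1.1"
TIMEOUT_RE = re.compile(
    r'upstream timed out.*?request: "(?P<method>[A-Z]+) (?P<uri>\S+) HTTP/[\d.]+"'
)


def timed_out_requests(lines: Iterable[str]) -> Counter:
    """Count upstream timeouts per full request URI (query string included).

    Lines that are not timeout errors (access-log entries, the follow-up
    custom_504.html error, etc.) are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("uri")] += 1
    return counts
```

The output of `kubectl logs <pod> -c opencost-ui` could be fed in line by line. Keeping the query string in the key makes it visible which `window` and `step` values trigger the timeout.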

The OpenCost and Prometheus pods are not OOM-killed, and their CPU requests (no CPU limit is set) are higher than the usage the metrics show for them during the 7-day calculation requests.

So it seems we need the ability to tune these parameters at the Docker image level and then at the Helm chart level. Alternatively, we could build our own OpenCost UI Docker image with updated timeout values, which, as I see it, is a good solution.
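A minimal sketch of the Docker-level option, assuming the three hard-coded lines in default.nginx.conf.template were replaced with placeholders such as `${PROXY_READ_TIMEOUT}` and filled in at container start, with the current 180 s values as defaults. The variable names and `render_timeouts` are hypothetical, not existing OpenCost settings; in the real image the substitution would more likely be done with `envsubst` in the entrypoint, so the Python below only illustrates the logic.

```python
import os
import re
from string import Template
from typing import Mapping

# Hypothetical templated form of the three directives that are currently
# hard-coded to 180 in default.nginx.conf.template.
TIMEOUT_TEMPLATE = Template(
    "proxy_connect_timeout       ${PROXY_CONNECT_TIMEOUT};\n"
    "proxy_send_timeout          ${PROXY_SEND_TIMEOUT};\n"
    "proxy_read_timeout          ${PROXY_READ_TIMEOUT};\n"
)

# Today's values become the defaults, so an unconfigured container renders
# exactly the same directives as before.
DEFAULTS = {
    "PROXY_CONNECT_TIMEOUT": "180",
    "PROXY_SEND_TIMEOUT": "180",
    "PROXY_READ_TIMEOUT": "180",
}

# nginx time values are a number with an optional unit; this sketch only
# accepts whole numbers with no unit or an s, m or h suffix.
TIME_RE = re.compile(r"\d+[smh]?")


def render_timeouts(env: Mapping[str, str] = os.environ) -> str:
    """Fill the timeout template from env, falling back to DEFAULTS."""
    values = {}
    for name, default in DEFAULTS.items():
        value = env.get(name, default).strip()
        if not TIME_RE.fullmatch(value):
            raise ValueError(f"{name}={value!r} is not a valid nginx time value")
        values[name] = value
    return TIMEOUT_TEMPLATE.substitute(values)
```

At the chart level, the same variables could then be exposed as chart values and passed to the UI container as environment variables; the exact value names would be up to the PR.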

It could be that I missed something, so suggestions are appreciated.

[Screenshots: Grafana "Kubernetes Nodes (SPOK)" dashboard, two Grafana "Kubernetes Pod Stats" views, and the error in Firefox]

@AjayTripathy
Contributor

Would support a PR here! Should be a quick handful of Helm changes!

@vova3379
Author

I can provide a PR for this and, after approval, also a PR for the OpenCost chart.

Is this level of performance expected for OpenCost on clusters of this size? Kubecost doesn't have such performance issues.

@AjayTripathy
Contributor

Kubecost handles caching and data layers on the user's behalf between Prometheus and the UI; that is beyond the scope of opencost today.
