
chore(flamegraph): Lower number of concurrent reads #537

Merged
3 commits merged into main from txiao/chore/lower-number-of-concurrent-reads on Dec 2, 2024

Conversation

Zylphrex
Member

The new flamegraph endpoint is very likely the culprit behind the OOMs we've been seeing. Examining the old flamegraph endpoint, we found that we used to spawn ~25 goroutines per request, but at a lower throughput. So having 100 concurrent workers may be the cause of the OOMs, as we try to read too many profiles at once.

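For illustration, a minimal sketch of the kind of bounded shared pool this change moves to. The names here (`readProfile`, the ID source, the worker count constant) are hypothetical, not the endpoint's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// maxConcurrentReads caps how many profiles are fetched at once.
// The PR lowers the pool from 100 workers to 25 to bound peak memory.
const maxConcurrentReads = 25

// readProfile is a hypothetical stand-in for the storage read each
// worker performs (the real endpoint reads profile payloads).
func readProfile(id int) string {
	return fmt.Sprintf("profile-%d", id)
}

func main() {
	ids := make(chan int)
	results := make(chan string)

	// A fixed pool of workers shared by all in-flight requests: no
	// matter how many flamegraph requests arrive, at most
	// maxConcurrentReads profiles are being read (and held in memory)
	// at any one time.
	var wg sync.WaitGroup
	for w := 0; w < maxConcurrentReads; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range ids {
				results <- readProfile(id)
			}
		}()
	}

	go func() {
		for id := 0; id < 300; id++ {
			ids <- id
		}
		close(ids)
		wg.Wait()
		close(results)
	}()

	count := 0
	for range results {
		count++
	}
	fmt.Println("read", count, "profiles")
}
```

The key property is that the cap is global rather than per request: all requests draw from the same 25 workers.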
@Zylphrex Zylphrex requested a review from a team as a code owner November 29, 2024 20:58
@viglia
Contributor

viglia commented Dec 2, 2024

> The new flamegraph endpoint is very likely the culprit behind the OOMs we've been seeing. Examining the old flamegraph endpoint, we found that we used to spawn ~25 goroutines per request, but at a lower throughput. So having 100 concurrent workers may be the cause of the OOMs, as we try to read too many profiles at once.

a few clarifications:

  • previously we spawned a dedicated set of goroutines for each request, meaning that each request was guaranteed those routines to download its profiles. With the new pool approach that's not the case: we have a shared pool, and different requests compete for the routines to download chunks
  • we were not limited to ~25 goroutines per request. We had a dynamic allocation based on the number of profiles, roughly 5 goroutines for every 100 profiles (see the sketch after this comment). Given that we capped the number of profiles used in aggregation at 600, we happened to land around that number, but it could have been more or less depending on the profile count
  • that limit (600) was later lowered to 300 in this PR. Quite frankly, I don't remember why it was lowered, as there is no comment/description on the PR.

With that said, let's try lowering the pool to 25 and see how it behaves. It's hard to tell in advance, but keep in mind that this pool is shared among different requests.
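As a rough illustration of the old per-request sizing described above, a hypothetical reconstruction (the function name and the minimum-of-one floor are assumptions, not the actual code):

```go
package main

import "fmt"

// numGoroutines reconstructs the old sizing described above: roughly
// 5 goroutines for every 100 profiles, with the profile count capped
// (originally 600, later 300). Hypothetical, not the actual code.
func numGoroutines(numProfiles, maxProfiles int) int {
	if numProfiles > maxProfiles {
		numProfiles = maxProfiles
	}
	n := numProfiles / 100 * 5
	if n < 1 {
		n = 1 // assumed floor so small requests still get a worker
	}
	return n
}

func main() {
	fmt.Println(numGoroutines(1000, 600)) // 30, near the ~25 cited above
	fmt.Println(numGoroutines(1000, 300)) // 15 after the cap was lowered
	fmt.Println(numGoroutines(40, 600))   // 1
}
```

Under the original 600-profile cap this tops out around 30 goroutines per request, in the same ballpark as the ~25 cited in the description; under the later 300-profile cap, around 15.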

@Zylphrex
Member Author

Zylphrex commented Dec 2, 2024

The other difference is that the old flamegraph endpoint was used much less frequently than this one, so we need to restrict this one more carefully.

@Zylphrex Zylphrex merged commit bca3dba into main Dec 2, 2024
12 checks passed
@Zylphrex Zylphrex deleted the txiao/chore/lower-number-of-concurrent-reads branch December 2, 2024 15:27