Unbounded memory growth when using obtain and replacing an item in the cache #802
Hi, @sprutton1. Thanks for reporting. I've checked your …
BTW, have you set up the admission picker for the disk cache? It would be helpful if you could share your foyer configuration. 🙏
Apologies for the delay. All of our configuration happens in the same file. We do the work here. The defaults are set here. Let me know if this gives you any insight. To be clearer, it seems like the memory is growing continuously, not necessarily that we're bursting into an OOM situation. Here's an example screenshot showing growth over a few days. I suppose we could introduce locking around the get calls, blocking when we retrieve a still-serialized value so we only pay the deserialization cost once.
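To illustrate that locking idea, here's a minimal sketch of single-flight deserialization using per-key locks. `CachedValue`, the plain `HashMap` cache, and the string payload are hypothetical stand-ins for our real types, not code from our repo:

```rust
use std::collections::HashMap;
use std::sync::Arc;

use tokio::sync::Mutex;

// Hypothetical stand-ins for our real cached types.
#[derive(Clone)]
enum CachedValue {
    Serialized(Arc<Vec<u8>>),
    Deserialized(Arc<String>),
}

type Cache = Mutex<HashMap<String, CachedValue>>;

// One lock per key, so only one task pays the deserialization cost;
// concurrent readers of the same key wait instead of duplicating work.
#[derive(Default)]
struct SingleFlight {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl SingleFlight {
    async fn lock_for(&self, key: &str) -> Arc<Mutex<()>> {
        self.locks
            .lock()
            .await
            .entry(key.to_owned())
            .or_default()
            .clone()
    }
}

async fn get_deserialized(cache: &Cache, sf: &SingleFlight, key: &str) -> Option<Arc<String>> {
    // Fast path: another task already upgraded this entry.
    if let Some(CachedValue::Deserialized(v)) = cache.lock().await.get(key).cloned() {
        return Some(v);
    }
    let key_lock = sf.lock_for(key).await;
    let _held = key_lock.lock().await;
    // Re-check under the per-key lock: the entry may have been upgraded
    // while we were waiting.
    let bytes = match cache.lock().await.get(key).cloned() {
        Some(CachedValue::Deserialized(v)) => return Some(v),
        Some(CachedValue::Serialized(bytes)) => bytes,
        None => return None,
    };
    // Deserialize once (stand-in for our real wire format), then replace
    // the entry behind the same key.
    let value = Arc::new(String::from_utf8_lossy(&bytes).into_owned());
    cache
        .lock()
        .await
        .insert(key.to_owned(), CachedValue::Deserialized(value.clone()));
    Some(value)
}
```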
Hi, @sprutton1. I found the admission rate limit is set to 1 GiB/s here. May I ask if that value is intentional? It is a little large for disks without PCIe 4.0 or NVMe support. For debugging, if you are using jemalloc in your project, you can use jeprof to generate a heap flamegraph. Related issues and PRs: #747 #748 (Not much information in the links, sorry about that.) And, is there any way to reproduce it locally? I can help debug. 🙌
One more thing: would you like to integrate the foyer metrics in your environment? That would help with debugging. UPDATE: I sent a PR to upgrade the foyer version, with which you can use the new metrics framework. FYI systeminit/si#5062
I landed on this number while tinkering on my dev machine, which likely has faster disks than the machines we run in production. I didn't put a lot of thought into it, to be honest. I can tune that setting to match that environment more closely; I believe we are using AWS EBS gp2 volumes (250 MB/s max) for the cache disks. What's the risk of tuning this too low? I assume items just won't get written to the disk portion of the cache if the write rate is higher than the admission limit setting?
I was using heaptrack to the same end, but maybe we can get more detail using jemalloc/jeprof. I'll try that out today.
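For posterity, here's roughly the setup I plan to try, assuming the `tikv-jemallocator` crate with its `profiling` feature (crate, feature, and env-var names are from its docs and may differ by version):

```rust
// Cargo.toml (assumed):
//   tikv-jemallocator = { version = "0.6", features = ["profiling"] }

use tikv_jemallocator::Jemalloc;

// Route every allocation through jemalloc so its profiler can see them.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // Enable profiling with periodic heap dumps (every ~1 GiB allocated).
    // tikv's jemalloc build prefixes the env var by default:
    //   _RJEM_MALLOC_CONF=prof:true,prof_prefix:jeprof.out,lg_prof_interval:30 ./your-binary
    // Then render the accumulated dumps, e.g.:
    //   jeprof --svg ./your-binary jeprof.out.*.heap > heap.svg
}
```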
Thanks for the offer! The project readme has local setup instructions. The most reliable way I found to recreate the issue was to create a consistent amount of traffic on the site over the course of 30-60 minutes and look at flamegraphs to see where we were leaking. Our API tests can be adapted to this purpose.
In this case, the admission rate limiter should be set to a value close to but below 250 MB/s, e.g. 240 MB/s. BTW, may I ask why you are using gp2? In my experience, gp3 is always better than gp2 in both performance and pricing.
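For reference, a sketch of where that knob lives, assuming a recent foyer release; builder method names have shifted between versions, so treat the exact calls, path, and capacities as illustrative rather than exact:

```rust
use std::sync::Arc;

use foyer::{DirectFsDeviceOptions, Engine, HybridCache, HybridCacheBuilder, RateLimitPicker};

async fn build() -> anyhow::Result<HybridCache<String, Vec<u8>>> {
    let cache = HybridCacheBuilder::new()
        .memory(1024 * 1024 * 1024) // in-memory tier capacity in bytes
        .storage(Engine::Large)
        .with_device_options(
            DirectFsDeviceOptions::new("/var/cache/foyer") // illustrative path
                .with_capacity(64 * 1024 * 1024 * 1024),
        )
        // Keep admission just under the gp2 ceiling (~250 MB/s). Writes that
        // would exceed this rate are simply not admitted to disk; the entries
        // still live in the memory tier until they are evicted.
        .with_admission_picker(Arc::new(RateLimitPicker::new(240 * 1024 * 1024)))
        .build()
        .await?;
    Ok(cache)
}
```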
Just a legacy decision we haven't rectified yet. It's something I can probably fix up here. Once the other PR merges, I'll see what kind of telemetry we can pull out and come back with some numbers.
I have some nice graphs set up and the rate limit set to ~240 MB/s. We're continuing to investigate on our side, but I will keep you posted on whether this helps alleviate our issues.
We're attempting to use this project as a replacement for our homegrown memory/disk cache built around Moka and Cacache. We're seeing an issue with memory growing unbounded over time, eventually leading to the service going OOM. We've added measures to ensure we always leave a percentage of the host OS memory reserved. As far as I can tell, Foyer always reports that the memory used by the cache is within the limits.
Our current suspicion is around how we are using the `obtain` method here. Heaptrack implies that there is a memory leak in this function call. We have complicated types that we cache that get serialized and gossiped around across services. To avoid repeated deserialization costs, when something is retrieved from the cache that is still serialized, we deserialize and insert the new value behind the same key before returning (sketched below). It should be noted that the deserialized value will always be wrapped in an `Arc`. So, my questions are:

- Is it a bad pattern to store `Arc`s in Foyer, or do you think relying on the pointers you already manage is sufficient?

CC @fnichol
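Here's a compressed sketch of the read path in question. `Payload`, the enum, and the UTF-8 "deserialization" are illustrative stand-ins for our real types, and I'm eliding the serde derives foyer's disk tier would require on the value type:

```rust
use std::sync::Arc;

use foyer::HybridCache;

// Illustrative stand-in; our real payloads are large, serialized structures
// that get gossiped across services.
type Payload = String;

#[derive(Clone)]
enum CacheValue {
    Serialized(Vec<u8>),
    Deserialized(Arc<Payload>),
}

// Roughly our read path: if the cached entry is still serialized, deserialize
// it and insert the upgraded value behind the same key before returning.
async fn read(
    cache: &HybridCache<String, CacheValue>,
    key: &str,
) -> anyhow::Result<Option<Arc<Payload>>> {
    let Some(entry) = cache.obtain(key.to_owned()).await? else {
        return Ok(None);
    };
    match entry.value() {
        CacheValue::Deserialized(payload) => Ok(Some(payload.clone())),
        CacheValue::Serialized(bytes) => {
            // Stand-in for our real deserialization.
            let payload = Arc::new(String::from_utf8(bytes.clone())?);
            // Replace the entry under the same key; our assumption is that
            // the old entry is freed once outstanding references drop.
            cache.insert(key.to_owned(), CacheValue::Deserialized(payload.clone()));
            Ok(Some(payload))
        }
    }
}
```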