
Unbounded memory growth when using obtain and replacing an item in the cache #802

Open
sprutton1 opened this issue Nov 27, 2024 · 9 comments
Labels
Q & A Question and Answer

Comments

@sprutton1

We're attempting to use this project as a replacement for our homegrown memory/disk cache built around Moka and Cacache. We're seeing an issue with memory growing unbounded over time, eventually leading to the service going OOM. We've added measures to ensure we always leave a percentage of the host OS memory reserved. As far as I can tell, Foyer always reports that the memory used by the cache is within the limits.

Our current suspicion centers on how we are using the obtain method here. Heaptrack implies that there is a memory leak in this function call.

We have complicated types that we cache, which get serialized and gossiped across services. To avoid repeated deserialization costs, when something retrieved from the cache is still serialized, we deserialize it and insert the new value behind the same key before returning. It should be noted that the deserialized value is always wrapped in an Arc.
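
For illustration, the access pattern looks roughly like this (a minimal sketch with stand-in types and a toy cache; the real code uses foyer's hybrid cache and obtain, and all names below are ours, not the project's):

```rust
// Sketch only: `CacheValue`, `ToyCache`, and `get_deserialized` are
// illustrative stand-ins, not the project's code or foyer's API.
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Clone)]
enum CacheValue {
    // Still-serialized bytes as received from other services.
    Serialized(Vec<u8>),
    // The deserialized value, always behind an Arc.
    Deserialized(Arc<String>),
}

#[derive(Default)]
struct ToyCache {
    inner: Mutex<HashMap<String, CacheValue>>,
}

impl ToyCache {
    fn get(&self, key: &str) -> Option<CacheValue> {
        self.inner.lock().unwrap().get(key).cloned()
    }

    fn insert(&self, key: &str, value: CacheValue) {
        self.inner.lock().unwrap().insert(key.to_owned(), value);
    }
}

// The pattern described above: on a hit that is still serialized,
// deserialize and write the deserialized value back under the same key.
fn get_deserialized(cache: &ToyCache, key: &str) -> Option<Arc<String>> {
    match cache.get(key)? {
        CacheValue::Deserialized(v) => Some(v),
        CacheValue::Serialized(bytes) => {
            let value = Arc::new(String::from_utf8(bytes).ok()?);
            // Every caller that races into this branch repeats the
            // deserialization and the re-insertion.
            cache.insert(key, CacheValue::Deserialized(Arc::clone(&value)));
            Some(value)
        }
    }
}
```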

So, my questions are:

  1. Could our re-insertion technique be causing issues with how obtain does its deduplication, possibly causing a leak?
  2. Is it appropriate to store Arcs in Foyer or do you think relying on the pointers you already manage is sufficient?

CC @fnichol

@MrCroxx
Collaborator

MrCroxx commented Nov 27, 2024

Hi, @sprutton1 . Thanks for reporting.

I've checked your obtain() method usage. IIUC, if there are multiple concurrent obtain() calls and a serialized value is returned, all of the callers will deserialize the value and reinsert the deserialized value into the cache. The memory usage from the concurrent deserialization could cause OOM. Besides, each reinsertion leads to a disk cache write, which consumes more memory than expected. (Currently, foyer writes to the disk cache on insertion, not on memory cache eviction.)

@MrCroxx MrCroxx assigned MrCroxx and unassigned MrCroxx Nov 27, 2024
@MrCroxx MrCroxx added the Q & A Question and Answer label Nov 27, 2024
@MrCroxx
Collaborator

MrCroxx commented Nov 27, 2024

BTW, have you set up the admission picker for the disk cache? It would be helpful if you could share your foyer configuration. 🙏

@sprutton1
Author

Apologies for the delay.

All of our configuration happens in the same file. We do the work here. The defaults are set here. Let me know if this gives you any insight.

To be clearer, it seems like the memory is growing continuously, not that we're suddenly bursting into an OOM situation. Here's an example screenshot showing growth over a few days.

[screenshot: memory usage growing steadily over a few days]

I suppose we could introduce locking around the get calls so that, when we get a serialized value, we only do the deserialize-and-reinsert work a single time.
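
For example (a rough sketch with tokio primitives; the struct name and fetch_and_upgrade are hypothetical, not our actual code), a per-key async lock would let only one task pay the deserialize-and-reinsert cost while concurrent callers for the same key wait and then see the already-deserialized entry:

```rust
// Sketch only: per-key async locks to single-flight the
// deserialize-and-reinsert step. Names are illustrative.
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::Mutex;

#[derive(Default)]
struct KeyLocks {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl KeyLocks {
    // Get (or create) the lock guarding a single cache key.
    async fn lock_for(&self, key: &str) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().await;
        map.entry(key.to_owned())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}

// Usage sketch (`fetch_and_upgrade` is a hypothetical stand-in for the
// get / deserialize / reinsert path described above):
//
// let key_lock = key_locks.lock_for(&key).await;
// let _guard = key_lock.lock().await;
// // Re-check the cache under the lock; only deserialize and reinsert
// // if the entry is still serialized.
// let value = fetch_and_upgrade(&cache, &key).await;
```

Note that a map like this grows with the key space, so real code would also want to drop lock entries once they are no longer in use.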

@MrCroxx
Collaborator

MrCroxx commented Dec 4, 2024

Hi, @sprutton1 . I found the admission rate limit is set to 1 GiB/s here:

https://github.com/systeminit/si/blob/1963e27a26adeb4f15877dda15458f92a6ab8e1e/lib/si-layer-cache/src/hybrid_cache.rs#L22

May I ask if that value is intentional? It is a little large for disks without PCIe 4.0 or NVMe support.

For debugging, if you are using jemalloc in your project, you can use jeprof to generate a heap flamegraph. Related issues and PRs: #747 #748 (not much information in those links, sorry about that).

And, is there any way to reproduce it locally? I can help debug. 🙌

@MrCroxx
Collaborator

MrCroxx commented Dec 4, 2024

One more thing, would you like to integrate the foyer metrics in your env? That would help debug.

UPDATES: I sent a PR to upgrade the foyer version, with which you can use the new metrics framework. FYI systeminit/si#5062

@sprutton1
Author

May I ask if that value is intentional? It is a little large for disks without PCIe 4.0 or NVMe support.

I landed on this number tinkering on my dev machine, which likely has faster disks than the machines we run in production. I didn't put a lot of thought into it, to be honest. I can tune that setting to match that environment more closely, where I believe we are using AWS EBS gp2 volumes (250 MB/s max) for the cache disks.

What's the risk of tuning this too low? I assume items just won't get written to the disk portion of the cache when the write rate exceeds the admission limit?

For debugging, if you are using jemalloc in your project, you can use jeprof to generate a heap flamegraph.

I was using heaptrack to the same end, but maybe we can get more detail using jemalloc/jeprof. I'll try that out today.

And, is there any way to reproduce it locally? I can help debug. 🙌

Thanks for the offer! The project README has local setup instructions. The most reliable way I found to reproduce the issue was to generate a consistent amount of traffic against the site over the course of 30–60 minutes and look at flamegraphs to see where we were leaking. Our API tests can be adapted for this purpose.

@MrCroxx
Collaborator

MrCroxx commented Dec 5, 2024

where I believe we are using AWS EBS gp2 volumes (250 MB/s max) for the cache disks.

In this case, the admission rate limiter should be set to a value close to but below 250 MB/s, e.g. 240 MB/s.
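
For example (a sketch only; the constant name is ours and the builder call is shown as a comment because it is an assumption, so check the foyer docs for the version you are on):

```rust
// gp2 tops out around 250 MB/s, so keep disk admission just below that.
const DISK_ADMISSION_RATE_LIMIT_BYTES_PER_SEC: usize = 240 * 1000 * 1000; // 240 MB/s

// Assumed wiring for a rate-limit admission picker (verify against your
// foyer version; method and type names may differ):
// .with_admission_picker(Arc::new(foyer::RateLimitPicker::new(
//     DISK_ADMISSION_RATE_LIMIT_BYTES_PER_SEC,
// )))
```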

BTW, may I ask why you are using gp2? In my experience, gp3 is always better than gp2 in both performance and pricing.

@sprutton1
Author

May I ask why you are using gp2? In my experience, gp3 is always better than gp2 in both performance and pricing.

Just a legacy decision we haven't rectified yet. It's something I can probably fix up here. Once the other PR merges, I'll see what kind of telemetry we can pull out and come back with some numbers.

@sprutton1
Author

I have some nice graphs set up and the rate limit set to ~240 MB/s. We're continuing to investigate on our side, but I will keep you posted on whether this helps alleviate our issues.
