Memory limits for computers #1580
Comments
Marking as "feedback wanted", as thoughts/ideas/glaring-holes-in-the-plan are appreciated!
It's definitely not as quick as I'd like, which is probably a good hint as to how little I understand low-level performance :D. A newly booted computer uses about 700KB of memory, which we can scan in 1-2ms. However, if we create a lot of objects (testing with […]) […] This means we definitely don't want to compute retained memory after every computer resume, but only after every N bytes allocated.
VisualVM is not especially useful here, as most things seem to get inlined into the […]
Have implemented the initial allocation tracking in 76968f2. This should at least give us some basic metrics on average memory growth.
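For readers who want to see what step 1 of the OP amounts to, here is a minimal sketch of measuring per-thread allocations with `ThreadMXBean.getThreadAllocatedBytes`. It is not the code from 76968f2. The class name `AllocationMeter` and its helper methods are hypothetical, and it assumes a HotSpot-style JVM where the platform bean implements `com.sun.management.ThreadMXBean`.

```java
import java.lang.management.ManagementFactory;

final class AllocationMeter {
    // Keeps the demo arrays reachable so the JIT cannot optimise the allocations away.
    static volatile Object sink;

    /** Demo workload: allocates {@code count} byte arrays of {@code size} bytes each. */
    static void allocate(int count, int size) {
        byte[][] chunks = new byte[count][];
        for (int i = 0; i < count; i++) chunks[i] = new byte[size];
        sink = chunks;
    }

    /** Bytes allocated by the current thread while running {@code task}, or -1 if unsupported. */
    static long measure(Runnable task) {
        // The allocation counters live on the com.sun.management subtype, not the standard bean.
        if (!(ManagementFactory.getThreadMXBean() instanceof com.sun.management.ThreadMXBean)) return -1;
        com.sun.management.ThreadMXBean threads =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        if (!threads.isThreadAllocatedMemorySupported() || !threads.isThreadAllocatedMemoryEnabled()) return -1;

        long id = Thread.currentThread().getId();
        long before = threads.getThreadAllocatedBytes(id);
        task.run();
        return threads.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        long bytes = measure(() -> allocate(64, 16 * 1024));
        System.out.println("allocated roughly " + bytes + " bytes");
    }
}
```

In a real computer thread, the delta would presumably be taken around each task and added to a per-computer counter; the sketch only shows the measurement itself.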
Thanks! I'm concerned about malicious users myself, given that I, as the server owner, accidentally crashed my own server while messing around with web sockets. It used up all of the Java heap and dropped the tick rate to 1 tick per 60 seconds.
We've been running the above commit on SwitchCraft for a few weeks now. Here is a plot of the median, 90% quantile, 99% quantile and max allocation rate over the last 3 hours:
[Figure: allocation rate quantiles over the last 3 hours]
As we can see, the majority of computers don't allocate very much at all (the median hovers around 250KiB/s). 90% are sitting at less than 2MiB/s, but the remaining 10% can go all the way up to 100+MiB/s. Here's also a histogram breakdown, which gives a better idea of what the long tail looks like.
[Figure: histogram of allocation rates]
This is definitely a longer tail than I expected. I was assuming things would cap out at around 8MB/s or so, except in rare cases (reading files). Combined with the performance issues of memory tracing, this makes me less sure that the method outlined in the OP is viable. Maybe we do need to go with the sampling approach after all :/. It should be fairly easy to prototype at least.
One of ComputerCraft's longest-standing issues is that computers can use an unbounded amount of memory. This means that malicious users can allocate large amounts of memory, slowing down or outright crashing servers, in a way that is hard to detect.
I was originally hoping to solve this as part of #769. However, several years later, I think it's looking unlikely that issue is going anywhere - I've done a lot of research and experimental work, and am not satisfied with any of the options.
There is some discussion in that issue on how we might impose memory limits in Cobalt, and after some thought, I think it offers a way forward. My initial plan was to track a small number of allocations in the Cobalt VM, and use that to approximate the computer's memory usage (cc-tweaked/Cobalt#66). However, as that issue details, I don't think this approach is as robust as I originally hoped.
Instead, my plan is to go with something closer to what Graal does: periodically compute retained memory, and error if it's over the limit.
There's quite a lot of risk here, and I definitely don't have a concrete plan [1], so instead I want to do this incrementally, in a way that each step adds some value.
1. Measure allocations of computers: Use `ThreadMXBean.getThreadAllocatedBytes` to track how much memory a computer allocates [2], and expose this as a per-computer metric.
While this probably isn't very useful for finding actively malicious computers, I think this is probably useful for passive monitoring of computers (i.e. cc-prometheus). Mostly, I think it's useful to gather some data for what the allocation pattern of "normal" computers looks like.
2. Measure retained memory of computers: Add a system to compute the reachable/retained memory of a `LuaState`, compute that after each task (or after every $N$ bytes allocated), and expose this as a metric.
My biggest concern here is performance, and whether we can do a scan of the whole VM in a reasonable time. I'm not aware of any sensible GC-like strategies we can apply here (i.e. incremental or three-colour GCs [3]), as most of those will work against Java's existing GC. Computers tend not to use much memory, so I'm hoping this should be pretty fast, but I honestly don't know!
I'm not 100% sure how this should interface with CC yet. We need to include the main-thread queue and event queue in this retained memory, but ideally we'd also track all Lua-related objects (especially HTTP handles).
3. Managed allocations in Cobalt: All (non-trivial) allocations in Cobalt should now be done through some `Allocator` class, which (for now) just counts how many bytes are allocated. This effectively gives us the same as 1., but will allow us to hook into it in the future.
We might want to expose this as a separate metric, so we can see how this compares to the allocation rate given by JMX.
I think Lua -> Java calls also need to go through this interface. The `LuaString` `toString` conversion allocates quite a bit, and we want a scheme which avoids MightyPirates/OpenComputers#1774 ("Computers can cause Minecraft to use all available memory when converting objects from Lua to Java") without having to do string interning.
4. Hard limits on large allocations: If some code attempts to allocate more than $N$ bytes (probably where $N$ is half the computer's soft limit), throw an "out of memory" error immediately.
5. Interrupt the VM after lots of allocations: Using the monitoring in 1. and 3., interrupt the VM after allocating $N$ bytes. In this case, we suspend the VM, and compute retained memory.
There are then two cases here:
Like with timeouts, we probably want some mechanism to force-terminate the computer if it continues to run after a pause has been requested.
I'm not thrilled about resuming the computer with an error - it does mean the error will occur after the allocation, so source positions won't be entirely accurate.
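To make steps 3 to 5 a little more concrete, here is a rough sketch of what a counting allocator with a hard limit and a scan trigger could look like. Everything in it is an assumption for illustration: `CountingAllocator`, `OutOfMemory`, `pollScanRequest` and the two constructor parameters are hypothetical names, not Cobalt's API, and the real thing would need to handle threading and the interaction with the VM's pause mechanism.

```java
final class CountingAllocator {
    /** Hypothetical error type for a single oversized request (step 4). */
    static final class OutOfMemory extends RuntimeException {
        OutOfMemory(String message) { super(message); }
    }

    private final long softLimit;    // assumed per-computer soft limit, in bytes
    private final long scanInterval; // N: bytes allocated between retained-memory scans
    private long totalAllocated;
    private long sinceLastScan;
    private boolean scanRequested;

    CountingAllocator(long softLimit, long scanInterval) {
        this.softLimit = softLimit;
        this.scanInterval = scanInterval;
    }

    /** Accounts for an allocation of {@code bytes}; rejects very large requests immediately. */
    void allocate(long bytes) {
        // Step 4: anything above half the soft limit fails before it is counted.
        if (bytes > softLimit / 2) throw new OutOfMemory("out of memory");
        totalAllocated += bytes;
        sinceLastScan += bytes;
        // Step 5: after N bytes, ask the VM to pause so retained memory can be computed.
        if (sinceLastScan >= scanInterval) scanRequested = true;
    }

    /** Called by the (hypothetical) VM loop at a safe point; true means "suspend and scan now". */
    boolean pollScanRequest() {
        if (!scanRequested) return false;
        scanRequested = false;
        sinceLastScan = 0;
        return true;
    }

    /** Running total, usable as the metric from step 3. */
    long totalAllocated() { return totalAllocated; }
}
```

Note that the scan request is only observed at the next poll, which matches the caveat above: any resulting error surfaces after the allocation that crossed the threshold, not at it.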
Like I said, definitely not a concrete plan, but hopefully this helps us move towards something that works!
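As an illustration of the retained-memory scan in step 2, here is a small sketch of a reachability walk that stores its "visited" mark on the objects themselves, so no large `HashSet` is needed. It deliberately swaps in an integer epoch stamp instead of a literal two-colour bit, because that keeps freshly created objects and previously unreachable objects correct without extra bookkeeping. `Node`, `RetainedScan` and the `size` field are hypothetical stand-ins for Lua values; shared strings are not modelled at all.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

final class RetainedScan {
    /** Hypothetical heap object: a size estimate plus outgoing references. */
    static final class Node {
        final long size;
        final List<Node> children = new ArrayList<>();
        int seenInEpoch; // epoch of the last scan that reached this node (0 = never)
        Node(long size) { this.size = size; }
    }

    private int epoch; // incremented per scan, so marks from older scans are stale

    /** Sums the sizes of all nodes reachable from {@code root}, counting each node once. */
    long retained(Node root) {
        epoch++;
        long total = 0;
        ArrayDeque<Node> stack = new ArrayDeque<>();
        mark(root, stack);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            total += node.size;
            for (Node child : node.children) mark(child, stack);
        }
        return total;
    }

    /** Stamps an unvisited node with the current epoch and queues it for traversal. */
    private void mark(Node node, ArrayDeque<Node> stack) {
        if (node.seenInEpoch == epoch) return; // already counted in this scan
        node.seenInEpoch = epoch;
        stack.push(node);
    }
}
```

The explicit stack avoids deep recursion on long reference chains. A single scanner per VM is assumed, since the stamp on each node is only meaningful relative to one scanner's epoch.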
Footnotes
[1] Though if you've been paying attention to the news in the UK, you'll know concrete isn't all it's cracked up to be.
[2] Computed both after each task, and in the computer thread monitor.
[3] We can use a two-colour marking system for most objects (aside from strings, as those are shared across VMs), so at least we don't need to generate massive `HashSet`s.