Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InflateState::new_boxed() uses large stack allocation #155

Open
weirddan455 opened this issue Aug 21, 2024 · 2 comments
Open

InflateState::new_boxed() uses large stack allocation #155

weirddan455 opened this issue Aug 21, 2024 · 2 comments

Comments

@weirddan455
Copy link

weirddan455 commented Aug 21, 2024

I noticed a comment in the InflateState::new() function:

Note that this struct is quite large due to internal buffers, and as such storing it on the stack is not recommended.

The size of the struct is about 43KB. Probably not a huge issue for Windows or Linux desktops where you have 1MB+ of stack space but maybe an issue for embedded platforms.

The solution seems to be to use the new_boxed() function. However, this results in a temporary stack allocation of the entire size of the InflateState struct regardless. The function calls Box::default() to do the heap allocation which has a definition of

fn default() -> Self {
    Box::new(T::default())
}

This first constructs InflateState on the stack and then moves it to the heap. I was able to confirm this by using rust-gdb to examine the assembly code. I will note that a release build on x86_64 Linux appears to optimize this to first allocate the memory and then initialize directly on the heap but a debug build does not. Other platforms also may not have this optimization.

The only solutions I can think of off the top of my head is unsafe code doing manual allocation or moving some of the larger fixed size arrays to Vec. Neither is really ideal since it seems to be a goal of this project to not use unsafe and Vec would result in some additional overhead. Still figured I would raise this issue in case someone has a better idea.

@oyvindln
Copy link
Collaborator

Yeah as far as I know rust still lacks a way to directly heap allocate a struct that is guaranteed to avoid the stack even in debug mode without resorting to unsafe (but feel free to correct me if I'm wrong.) I think it may be possible to allocate the array buffers separately and keep the static length information now that const generics are in the language though I don't know if that's optimal performance wise.

The reason for using default in that function was that in the past it used to used the former box keyword internally which did a direct heap allocation but that was removed a long time ago. Back then the compiler also struggled to optimize out allocation in release mode so using Box::default() was a workaround to avoid that but it's a bit of a historical artifact that no longer makes sense that should probably just be changed to Box::new in the code. (Will have to check what rust version it was changed in but it might have been years now.)

If the compiler can optimize out the stack allocation on one platform it very likely can on other platforms too though, as LLVM does much of the optimization on LLVM intermediate representation that will be mostly the same between platforms (at least with portable code like this that doesn't interface with any system stuff), before it's finally converted down to machine code.

It should also be doable to reduce the size of the compressor and decompressor structs a tiny bit as right the huffman tables in the structs used for the distances and huffman length codes are way larger than they need to be for code simplicity, something inherited from the original miniz.

@terrorfisch
Copy link

One could add bytemuck as a (feature gated?) dependency and derive the Zeroable trait to use zeroed_box. Is there a policy to not use external dependencies at all if they use unsafe or is it decided on a case by case basis?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants