
extremely inefficient usage of memcpy host function #102

Open
spoonincode opened this issue Jan 29, 2023 · 3 comments

@spoonincode
Member

cdt generates extremely inefficient calls to the memcpy host function. It will create wasm such as

    local.get $l5
    local.get $l3
    i32.const 8
    call $env.memcpy

In this case the compiler wants to copy a fixed 8 bytes, but generates a call to the memcpy host function to perform that copy. Compared to a simple WASM load & store, this bloats what should effectively be (at most) 2 machine instructions (with EOS VM OC anyways) to probably at least a hundred machine instructions including a bunch of branches.
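For context, a hedged sketch of the kind of C++ source that typically lowers to such a fixed-size copy (the function and names here are hypothetical, not taken from eosio.token):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical example: copying an 8-byte field, as contract code often
// does when (de)serializing a uint64_t such as an account name.
void copy_amount(unsigned char* dst, const unsigned char* src) {
    // Clang lowers this to an llvm.memcpy intrinsic of constant size 8;
    // when that intrinsic is not expanded inline by the wasm backend,
    // the result is a call to the memcpy host function, as above.
    std::memcpy(dst, src, 8);
}
```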

A look at the eosio.token contract shows that 37 of the 39 memcpy host function calls in eosio.token are similar to the above: a fixed 8-byte memcpy host function call. For a more complex contract, say the EOS EVM contract, 526 of the 585 memcpy calls are for a fixed byte size. Though, for EOS EVM, there is considerably more variety in the fixed byte size.

Breakdown of EOS EVM constant-size `memcpy` host function calls:

| bytes | occurrences |
|------:|------------:|
| 1 | 4 |
| 2 | 27 |
| 4 | 28 |
| 8 | 241 |
| 16 | 1 |
| 20 | 46 |
| 32 | 99 |
| 64 | 1 |
| 77 | 1 |
| 79 | 6 |
| 80 | 22 |
| 88 | 6 |
| 96 | 2 |
| 176 | 2 |
| 192 | 39 |
| 504 | 1 |

The current performance impact of this inefficient code generation is likely over 5% on a chain like WAX with EOS VM OC enabled. That is, on WAX with EOS VM OC enabled, over 5% of the main thread time today is spent handling memcpy host function calls (many of which I suspect are these fixed-size variants, given how pervasive they are, but I don't have proof of that).

cdt should, minimally, have an optimization pass that squashes these small fixed-size memcpy and memset calls into simple wasm loads & stores.
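As an illustration only (not CDT's actual implementation), the semantic effect of squashing a fixed 8-byte memcpy is sketched below in C++; in wasm terms the replacement would be a single `i64.load` followed by a single `i64.store` instead of a host function call:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative sketch: what a squashed fixed-size-8 copy amounts to.
// The compiler folds each constant-size memcpy of a register-sized
// value into a single 8-byte load and a single 8-byte store, i.e.
// i64.load + i64.store in wasm, with no call overhead.
inline void copy8_inline(void* dst, const void* src) {
    std::uint64_t tmp;
    std::memcpy(&tmp, src, sizeof tmp);  // folds to one 8-byte load
    std::memcpy(dst, &tmp, sizeof tmp);  // folds to one 8-byte store
}
```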

@stephenpdeos
Member

Further breakdown is required before this is actionable.

@dimas1185
Contributor

dimas1185 commented Aug 10, 2023

I don't see anywhere that we influence clang's choice between memcpy and load/store. So memcpy is presumably emitted when an actual memcpy is encountered in the C++ code, and load/store is used for stack operations. I can't confirm this 100% for now; these are clang internals. To address this we should first analyze the EVM code and find out what C++ code leads to the generation of those small fixed-size copies. The next step would be to think about a solution.

@ericpassmore ericpassmore modified the milestones: CDT 5.0.0-rc1, Future Milestone Aug 11, 2023
@ericpassmore ericpassmore modified the milestones: Future Milestone, CDT 5.0.0-rc1 Aug 15, 2023
@ericpassmore
Contributor

Working on identifying the C++ code that creates these memcpy calls. Once we identify these sections of code, we can identify some options and solutions. This work will require additional investigation.

@BenjaminGormanPMP BenjaminGormanPMP moved this from Todo to In Progress in Team Backlog Aug 16, 2023
@ericpassmore ericpassmore modified the milestones: CDT 4.1.0-rc1, CDT 5.0 Sep 9, 2023
@ericpassmore ericpassmore modified the milestones: CDT 5.0, CDT 4.1.0-rc1 Sep 19, 2023
@BenjaminGormanPMP BenjaminGormanPMP moved this from In Progress to Todo in Team Backlog Oct 26, 2023
@arhag arhag removed this from the Unknown milestone Dec 20, 2023