-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String vs Buffer memory usage + performance #4506
Comments
As is so often the case, 10 mins after I ask the question I figure it out :( (at least partially). The problem is the test setup - it looks like If I change |
@nodejs/performance PTAL |
Take a read at the nodejs/performance#173 to understand more about the cost of creating a new buffer. You can use the strategy implemented at mongodb/js-bson#703 to help reduce the memory usage if you really need to use buffer, to summarize, create a new buffer and store the offset, and instead of creating a new buffer everytime, you just take slices of that main buffer until you fill everything, then you discard the old buffer and create a new one. This is more efficient when the length of the text you are storing is well-known but works for different lengths too. |
@H4ad's answer is really excellent, and I want to stress this part of it...
It is a fundamental performance pattern. |
Thanks for the answers - I think things have gotten a bit muddled (or possibly I'm not understanding something) - my original question was why buffer performance was substantially better than string performance (in both setup time and memory) - the increased time of converting the stored buffer to a string on each access was expected. (Outside of this example I would of course not be converting buffers to strings on every map access!) I discovered the answer to my original question shortly after posting (as I noted in my followup) - the issue was that memory associated with building the string was not freed - when the function ended, and was retained with the string when it was stored in the map - as opposed to the buffer case where converting the string to a buffer released all the memory used to build the string. This memory behaviour (and frankly the execution time) still seems strange to me - but at least has an explanation (even if odd) and means that in most real world situations the memory increase would not be observed. I suspect if I dig into the string implementation I'd see that each string concatenation retains a reference to both strings as opposed to allocating, copying and releasing the originals. |
You can flatten the string. The following usage pattern might work: |
My bad, I totally misread your question and your data, the buffer consumes less memory because node.js implements by default the strategy I explained earlier, you can read the implementation at: https://github.com/nodejs/node/blob/c4aa34aa4dc5accb100be07b4aceded83eeb3956/lib/buffer.js#L447-L472
Edit: You can understand more the behavior of strings on v8 by reading https://iliazeus.lol/articles/js-string-optimizations-en/ |
Node.js Version
v18-v22
NPM Version
v10.8.2
Operating System
Linux zacknewsham-xps 6.8.0-48-generic #48~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct 7 11:24:13 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
buffer, string_decoder, v8
Description
I'm trying to understand why my application has high baseline memory usage - in doing this I discovered something I can't explain - strings seem to cost >10x more memory per character than the equivalent buffer. Some amount of this is expected (~2x given UTF-16 nature of JS strings) - but not on this scale.
A secondary question is why the setup time of a String->String map is so much slower (8x) than a map that takes that string and converts it to a buffer before storing.
Below is a minimal preproduction - the commented out lines in
test
allow you to toggle between the string->string map and the string->buffer mapI run it with
--expose-gc
just to get a valid heap snapshot at the end. The total string size stored is (17 + 1000) * 100,000 - so the absolute minimal memory usage of this would be around 100mb (a trivial C++ implementation of the same takes 114mb).When running with the string->string map, the memory cost is around 3.2GB and the setup time (to populate the map) is ~11s, when running as a string->buffer map the memory cost is 280MB and the setup time is ~1.3s. The "time" difference reported is completely explicable (the cost of parsing the buffer each time)
Minimal Reproduction
Output
Before You Submit
The text was updated successfully, but these errors were encountered: