AWS SDK/S3 RangeError: Out of memory failure randomly if streamed chunks are very large (> 500KB) #7820

Open
asilvas opened this issue Dec 25, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@asilvas
Contributor

asilvas commented Dec 25, 2023

What version of Bun is running?

1.0.20+09d51486e

What platform is your computer?

Linux 6.2.0-39-generic x86_64 x86_64

What steps can reproduce the bug?

I've spent a lot of time trying to create a simple repro, with no luck yet. But on a gigabit network I can reproduce this failure maybe 20% of the time when downloading a 59 MB file from S3 using the latest @aws-sdk/client-s3 S3 client. It seems to happen only on high-bandwidth interfaces, such as in AWS.

Edit: Not the best repro example, but I was able to narrow it down a bit. It relies on a hacky custom fetch handler because of another bug in Bun, but here it is:

import { S3 } from '@aws-sdk/client-s3';
// Workaround for https://github.com/aws/aws-sdk-js-v3/issues/4619
import { NodeFetchHttpHandler } from './nodeFetchHttpHandler';

const s3 = new S3({
  region: 'us-east-1',
  requestHandler: new NodeFetchHttpHandler({ requestTimeout: 15_000 }),
  maxAttempts: 1, // disable retries so failures surface immediately
});

const { Body } = await s3.getObject({
  Bucket: 'BUCKET_NAME',
  Key: 'S3_KEY',
});

// Intermittently throws "RangeError: Out of memory"
const byteArr = await Body.transformToByteArray();

What is the expected behavior?

Doesn't crash.

What do you see instead?

RangeError: Out of memory
      at anonymous (native:1:1)
      at readableByteStreamControllerPull (:1:11)
      at readMany (:1:11)
      at processTicksAndRejections (:61:39)

Additional information

Most of the time chunks arrive consistently at around 16 KB. But roughly 20% of the time chunk sizes spike wildly, ranging up to 540 KB, and every time these large chunks appear the failure follows. To be clear, the process is using only a small amount of memory, and it is running on a machine with 128 GB of RAM, so this is not a system-level resource issue. This is a production issue I'm troubleshooting locally.

@asilvas asilvas added the bug Something isn't working label Dec 25, 2023
@asilvas
Contributor Author

asilvas commented Dec 25, 2023

Related (but not the same): #7428

@monkfromearth

monkfromearth commented Dec 27, 2023

This is crashing as well for aws-sdk (v2), and was working fine until v1.0.15.
Interestingly, https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-lib-storage/ works.

@asilvas
Contributor Author

asilvas commented Dec 27, 2023

> This is crashing as well, for aws-sdk (v2) and was working fine till v1.0.15. Interestingly, https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-lib-storage/ this works.

Interesting, considering AWS CDK also broke in 1.0.15. I thought this was a stream bug, but maybe they are connected. That other library only handles uploads, though, so it won't fix the download issue reported above.

@asilvas
Contributor Author

asilvas commented Jan 6, 2024

Ran the test using bun-profile to provide more details (using v1.0.21):

* thread #1, name = 'bun', stop reason = signal SIGSEGV
  * frame #0: 0x000055fe468d2012 bun-profile`::jsBufferConstructorFunction_concat() [inlined] unwrap at RawPtrTraits.h:44:69
    frame #1: 0x000055fe468d2012 bun-profile`::jsBufferConstructorFunction_concat() [inlined] getMayBeNull at CagedPtr.h:71:18
    frame #2: 0x000055fe468d2012 bun-profile`::jsBufferConstructorFunction_concat() [inlined] getMayBeNull at CagedBarrierPtr.h:62:65
    frame #3: 0x000055fe468d2012 bun-profile`::jsBufferConstructorFunction_concat() [inlined] vector at JSArrayBufferView.h:284:44
    frame #4: 0x000055fe468d2012 bun-profile`::jsBufferConstructorFunction_concat() [inlined] typedVector at JSGenericTypedArrayViewInlines.h:776:50
    frame #5: 0x000055fe468d2012 bun-profile`::jsBufferConstructorFunction_concat() [inlined] jsBufferConstructorFunction_concatBody at JSBuffer.cpp:780:29
    frame #6: 0x000055fe468d1eb4 bun-profile`::jsBufferConstructorFunction_concat() at JSBuffer.cpp:1751:12
    frame #7: 0x00007f9598a08038
    frame #8: 0x000055fe46721b28 bun-profile`js_trampoline_op_call + 23
    frame #9: 0x000055fe467043bd bun-profile`vmEntryToJavaScript + 188
    frame #10: 0x000055fe475e50a9 bun-profile`JSC::Interpreter::executeCall(JSC::JSObject*, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&) + 1961
    frame #11: 0x000055fe478cb437 bun-profile`JSC::call(JSC::JSGlobalObject*, JSC::JSValue, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&, WTF::NakedPtr<JSC::Exception>&) + 23
    frame #12: 0x000055fe46958589 bun-profile`::fireEventListeners() [inlined] innerInvokeEventListeners at EventEmitter.cpp:237:9
    frame #13: 0x000055fe46958418 bun-profile`::fireEventListeners() at EventEmitter.cpp:192:5
    frame #14: 0x000055fe4698e7bc bun-profile`::jsEventEmitterPrototypeFunction_emit() [inlined] emit at EventEmitter.cpp:125:5
    frame #15: 0x000055fe4698e7b4 bun-profile`::jsEventEmitterPrototypeFunction_emit() [inlined] emitForBindings at EventEmitter.cpp:119:5
    frame #16: 0x000055fe4698e7a2 bun-profile`::jsEventEmitterPrototypeFunction_emit() [inlined] jsEventEmitterPrototypeFunction_emitBody at JSEventEmitter.cpp:404:5
    frame #17: 0x000055fe4698e36e bun-profile`::jsEventEmitterPrototypeFunction_emit() [inlined] call<&WebCore::jsEventEmitterPrototypeFunction_emitBody, (WebCore::CastedThisErrorBehavior)0> at JSEventEmitterCustom.h:48:9
    frame #18: 0x000055fe4698e315 bun-profile`::jsEventEmitterPrototypeFunction_emit() at JSEventEmitter.cpp:409:12
    frame #19: 0x00007f9598a08038
    frame #20: 0x000055fe46721b28 bun-profile`js_trampoline_op_call + 23
    frame #21: 0x000055fe467228a8 bun-profile`js_trampoline_op_call_ignore_result + 23
    frame #22: 0x00007f9598b1c735
    frame #23: 0x000055fe467043bd bun-profile`vmEntryToJavaScript + 188
    frame #24: 0x000055fe475e50a9 bun-profile`JSC::Interpreter::executeCall(JSC::JSObject*, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&) + 1961
    frame #25: 0x000055fe478cb2eb bun-profile`JSC::call(JSC::JSGlobalObject*, JSC::JSValue, JSC::ArgList const&, WTF::ASCIILiteral) + 235
    frame #26: 0x000055fe46992ce3 bun-profile`::drain() at JSNextTickQueue.cpp:93:9
    frame #27: 0x000055fe45b99c52 bun-profile`src.bun.js.event_loop.EventLoop.tickQueueWithCount__anon_353679 [inlined] src.bun.js.event_loop.EventLoop.drainMicrotasksWithGlobal(this=0x000002c2a00f0180, globalObject=0x00007f95974c4068) at event_loop.zig:650:45
    frame #28: 0x000055fe45b99c4a bun-profile`src.bun.js.event_loop.EventLoop.tickQueueWithCount__anon_353679(this=0x000002c2a00f0180) at event_loop.zig:947:43
    frame #29: 0x000055fe457d3407 bun-profile`src.bun.js.event_loop.EventLoop.tick [inlined] src.bun.js.event_loop.EventLoop.tickWithCount(this=0x000002c2a00f0180) at event_loop.zig:955:39
    frame #30: 0x000055fe457d33ff bun-profile`src.bun.js.event_loop.EventLoop.tick(this=0x000002c2a00f0180) at event_loop.zig:1171:38
    frame #31: 0x000055fe45c1c491 bun-profile`src.bun.js.event_loop.EventLoop.waitForPromise(this=0x000002c2a00f0180, promise=src.bun.js.bindings.bindings.AnyPromise @ 0x00007fff88e75a00) at event_loop.zig:1193:30
    frame #32: 0x000055fe4573e673 bun-profile`src.bun.js.javascript.OpaqueWrap__anon_46994__struct_262828.callback [inlined] src.bun.js.javascript.VirtualMachine.waitForPromise(this=<unavailable>, promise=src.bun.js.bindings.bindings.AnyPromise @ 0x00007fff88e75a00) at javascript.zig:1085:40
    frame #33: 0x000055fe4573e66e bun-profile`src.bun.js.javascript.OpaqueWrap__anon_46994__struct_262828.callback at javascript.zig:2300:32
    frame #34: 0x000055fe4573e602 bun-profile`src.bun.js.javascript.OpaqueWrap__anon_46994__struct_262828.callback at bun_js.zig:284:30
    frame #35: 0x000055fe4573d8f1 bun-profile`src.bun.js.javascript.OpaqueWrap__anon_46994__struct_262828.callback(ctx=0x000055fe487771d8) at javascript.zig:104:13
    frame #36: 0x000055fe469778a4 bun-profile`JSC__VM__holdAPILock at bindings.cpp:4544:5
    frame #37: 0x000055fe457035d3 bun-profile`src.bun_js.Run.boot at shimmer.zig:186:41
    frame #38: 0x000055fe457035bd bun-profile`src.bun_js.Run.boot [inlined] src.bun.js.bindings.bindings.VM.holdAPILock(this=<unavailable>, ctx=<unavailable>, callback=<unavailable>) at bindings.zig:5219:14
    frame #39: 0x000055fe457035bd bun-profile`src.bun_js.Run.boot(ctx_=<unavailable>, entry_path=<unavailable>) at bun_js.zig:255:35
    frame #40: 0x000055fe4571ba91 bun-profile`src.cli.Command.maybeOpenWithBunJS(ctx=0x00007fff88ea4258) at cli.zig:1813:23
    frame #41: 0x000055fe454b2b5b bun-profile`src.cli.Cli.start__anon_5020 at cli.zig:1710:47
    frame #42: 0x000055fe454b1ad5 bun-profile`src.cli.Cli.start__anon_5020(allocator=<unavailable>, (null)=<unavailable>, (null)=<unavailable>) at cli.zig:57:22
    frame #43: 0x000055fe454ade3d bun-profile`main at main.zig:49:22
    frame #44: 0x000055fe454adbf0 bun-profile`main [inlined] start.callMain at start.zig:575:22
    frame #45: 0x000055fe454adbf0 bun-profile`main [inlined] start.initEventLoopAndCallMain at start.zig:519:5
    frame #46: 0x000055fe454adbf0 bun-profile`main at start.zig:469:36
    frame #47: 0x000055fe454adbac bun-profile`main(c_argc=<unavailable>, c_argv=<unavailable>, c_envp=<unavailable>) at start.zig:484:101
    frame #48: 0x00007f95e3223a90 libc.so.6`__libc_start_call_main(main=(bun-profile`main at start.zig:472), argc=2, argv=0x00007fff88ee0558) at libc_start_call_main.h:58:16
    frame #49: 0x00007f95e3223b49 libc.so.6`__libc_start_main_impl(main=(bun-profile`main at start.zig:472), argc=2, argv=0x00007fff88ee0558, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fff88ee0548) at libc-start.c:360:3
    frame #50: 0x000055fe454ada2a bun-profile`_start + 42

@asilvas
Contributor Author

asilvas commented Jan 7, 2024

Using a locally built bun-profile. In this example RSS was only 110MB prior to allocation, and sum of all chunks was only ~15MB.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  unwrap () at bun-webkit/include/wtf/RawPtrTraits.h:44
44          static ALWAYS_INLINE T* unwrap(const StorageType& ptr) { return ptr; }
[Current thread is 1 (Thread 0x7f35ef1fe340 (LWP 180577))]
(gdb) bt
#0  unwrap () at bun-webkit/include/wtf/RawPtrTraits.h:44
#1  getMayBeNull () at bun-webkit/include/wtf/CagedPtr.h:71
#2  getMayBeNull () at bun-webkit/include/JavaScriptCore/CagedBarrierPtr.h:62
#3  vector () at bun-webkit/include/JavaScriptCore/JSArrayBufferView.h:284
#4  typedVector () at bun-webkit/include/JavaScriptCore/JSGenericTypedArrayViewInlines.h:776
#5  jsBufferConstructorFunction_concatBody () at /home/asilvas/bun/src/bun.js/bindings/JSBuffer.cpp:784
#6  jsBufferConstructorFunction_concat () at /home/asilvas/bun/src/bun.js/bindings/JSBuffer.cpp:1757
#7  0x00007f35a3e08038 in ?? ()
#8  0x00007ffd7a4fdb60 in ?? ()
#9  0x0000564154318828 in js_trampoline_op_call ()
#10 0x0000000000000000 in ?? ()

@asilvas
Contributor Author

asilvas commented Jan 8, 2024

Verified that PR #8039 does not resolve this, but segfaults no longer appear possible; every failure now results in RangeError: Out of memory instead. In the latest failure, 526 chunks with the largest at 173 KB and a combined size of only 17 MB (across all chunks) still threw this error. The only consistent pattern is that when chunks stay at or below 16 KB it never fails, but once chunks start exceeding 16 KB we see these random failures. Yet in isolated tests with Buffer.concat it's MUCH harder to reproduce; for example, I can easily concat gigabytes of buffers without issue. I made a considerable effort over the weekend to try to submit a PR, but I don't yet understand what is happening. It's almost as if there is some sort of streaming-related race condition.
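For reference, an isolated Buffer.concat stress test along the lines described above might look like the following sketch. The sizes are illustrative (varying up to 160 KB, near the 173 KB maximum observed), not the exact test that was run.

```javascript
// Sketch: concat many variably-sized buffers in isolation, mimicking the
// chunk profile from the failing run (hundreds of chunks, sizes well past
// 16 KB). In isolation this succeeds even at far larger totals, which is
// what points toward a streaming-related race rather than Buffer.concat.
function concatStress(chunkCount) {
  const chunks = [];
  let expected = 0;
  for (let i = 0; i < chunkCount; i++) {
    // Cycle sizes from 16 KB up to 160 KB.
    const size = 16 * 1024 * (1 + (i % 10));
    chunks.push(Buffer.alloc(size, 0x42));
    expected += size;
  }
  const combined = Buffer.concat(chunks);
  return { expected, actual: combined.length };
}

const { expected, actual } = concatStress(526);
console.log(expected, actual);
```

Run standalone, the concatenated length always matches the sum of the inputs; the failure only shows up when the buffers come off a live stream.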

@ShadowLeiPolestar

ShadowLeiPolestar commented Apr 19, 2024

Met the same issue today — "RangeError: Out of memory\n at anonymous (native)\n at readableByteStreamControllerPull (:1:11)\n at readMany (:1:11)" — while trying to upload a large file to AWS S3 using Bun.serve(... fetch(req, server) ...) directly, using a binary upload instead of a multipart file.

I must say this is really weird:

  1. Testing from my local machine (macOS, M1), I never had any issues with single files up to 500 MB.
  2. Pushed to the server (docker: --platform=linux/amd64 oven/bun:1.1.4-alpine), the issue happens when uploading a file of 5-10 MB or more.
  3. The previous version (docker: --platform=linux/amd64 oven/bun:1.1.0-alpine, with an earlier code style) could upload a 500 MB+ file "sometimes".

My code example:

Bun.serve({
  async fetch(req, server) {
    // 1. Init the S3 multipart upload (AWS s3-client)
    await CreateMultipartUploadCommand(...);
    // 2. Upload parts as the binary body streams in
    const promises = [];
    for await (const chunk of req.body) {
      promises.push(/* S3 - async UploadPartCommand(...) */);
    }
    // 3. Wait for all parts from step 2, then complete the upload
    await Promise.all(promises);
    await CompleteMultipartUploadCommand(...);
  },
  port: 8000,
  maxRequestBodySize: Number.MAX_SAFE_INTEGER, // 9007199254740991 (same issue with 999916545+10000)
});

Note: for performance reasons I push promises in step 2 and await them all in step 3.

This ultimately results in the OOM error, even though my cloud metrics (AWS) show memory usage isn't high — no more than 200 MB used out of 2 GB total.
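The part-upload loop above can be sketched in isolation with the S3 call stubbed out. This is a sketch, not the original code: `uploadPart` is a hypothetical stand-in for the real UploadPartCommand call, and the coalescing logic is added because S3 multipart uploads require every part except the last to be at least 5 MB, so small stream chunks need to be batched before upload.

```javascript
// Sketch: coalesce arbitrary-sized stream chunks into >= 5 MB parts before
// handing them to an uploader. `uploadPart(partNumber, body)` is a
// hypothetical stand-in for sending an UploadPartCommand.
const MIN_PART_SIZE = 5 * 1024 * 1024; // S3 minimum for all but the last part

async function uploadInParts(stream, uploadPart) {
  let pending = [];
  let pendingBytes = 0;
  let partNumber = 0;
  const promises = [];

  for await (const chunk of stream) {
    pending.push(chunk);
    pendingBytes += chunk.length;
    // Flush a part once enough bytes have accumulated.
    if (pendingBytes >= MIN_PART_SIZE) {
      promises.push(uploadPart(++partNumber, Buffer.concat(pending)));
      pending = [];
      pendingBytes = 0;
    }
  }
  // Final (possibly short) part.
  if (pendingBytes > 0) {
    promises.push(uploadPart(++partNumber, Buffer.concat(pending)));
  }
  return Promise.all(promises);
}
```

Coalescing also keeps the number of in-flight part uploads far smaller than one promise per raw chunk, which matters when the whole body is pushed through an unbounded `promises` array as in the snippet above.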


Update: Another memory leak but not sure any related: Bun.serve: Memory leak HTTP POST with Body

@zackify

zackify commented Jun 19, 2024

Ah yes, I am running into this as well. I'm trying to make a reproduction repo, but it's tough. Will try downgrading Bun, @ShadowLeiPolestar, since you say 1.1.0 worked sometimes.

I also see 800 MB of RAM free on my server, and never see this locally on macOS.

Edit: can confirm, downgrading works.

I am sending large JSON bodies to Bun.serve, and after the downgrade there's no issue with that at least. However, even on the downgraded version I see memory usage continue to increase :( — I may have to switch to Node for the deadline I have to hit.
