-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
busboy seems to read ahead too fast for many small (< 16KiB) files #346
Comments
This is due to how node.js streams work. They buffer up to a |
That makes sense, but why would more than 16KiB worth of files be emitted before a more previous file's stream has ended? |
If you're suggesting that busboy stops parsing immediately (and stores the current chunk somewhere temporarily) when it sees the end of a part until the file stream has been completely read, that would not only complicate things (especially because writable streams do not have an My suggestion would be to override one or both of the limits I previously mentioned. |
That almost sounds like what I was thinking, but the description of when to pause sounds different and I don't understand why that implies a chunk needs to be stored somewhere temporarily. Although highWatermark defaults to 16KiB, I get a lot more than 16KiB worth of files after a file who's stream is still unread. I imagine that busboy would emit all the Instead I get file events for all the files in my test case (8000 files) when I've only read the streams for the first < 100 files. With this result I assume busboy would read ahead indefinitely and consume all server memory if the request had enough really small files (as it is, my test case consumes 8MB of server memory immediately). |
I think if you really want to minimize this sort of thing, the best you're going to be able to do is just have something that stops writing to the busboy instance until you've sufficiently determined that it's ok to continue parsing. This could be accomplished by creating a custom Transform stream or similar that sits in between your upstream and the busboy instance. It would check how many file streams have not been completely read and then either continue passing data on or stop until the file streams have all been read or whatever logic you want. |
short version
When uploading many files under 16KiB, busboy will read ahead as fast as possible, emitting to all the
.on('file', ...)
listeners, even if the streams for previous files haven't been read yet.long version
For large files I have a mechanism in place to apply backpressure through a consistent sleep time after each chunk, and the sleep time is based on data about uploads of "parts" of the file to the destination (ex: S3). The plan is to support multiple destinations, so that's why this rate is per-file (so each file's stream as potentially a different upload rate).
For very small files, this mechanism is effectively useless, but that's what I expect. However, I also expected that after busboy gives me some number of small files (let's say 64 of 1KiB files, because 64KiB was the mode chunk size I measured while uploading large files), and I haven't read the streams for any of these files yet (the streams which would only have one 1KiB chunk if read and then end immediately), that busboy would pause the stream to apply backpressure. This does not happen.
However, for files in the range
[~16KiB, 64KiB]
backpressure appears to work, despite that I'm only reading one chunk from each file. I'm not sure what makes the difference between[1B, ~16KiB)
and[~16KiB, 64KiB]
, so I suspect that 16KiB might be a significant value in busboy somewhere.The text was updated successfully, but these errors were encountered: