# Add "smart" upload function #90
I'm definitely in favor of this change. I personally use …
I had to take a hiatus from this development but will be back working on it this month. I am considering introducing another dependency maintained by me, as it will simplify much of the logic: it is a function that accepts an iterator or async iterator and returns a promise for an array collecting the results of invoking a mapper function over each element produced by the iterator, in the order the iterator produced the elements. The amount of concurrency can be specified in the same way as Bluebird's `Promise.map`.

This would be used in the piecewise upload routine. Each upload source (buffer, file, stream) would have an iterator that produces pieces to upload.
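For illustration, a minimal sketch of such a concurrency-limited mapper (hypothetical code, not the actual module being described):

```js
// Hypothetical sketch: map an (async) iterable with at most `concurrency`
// mapper calls in flight, collecting results in the order the iterator
// produced the elements.
async function asyncMap(iterable, mapper, { concurrency = Infinity } = {}) {
  const results = [];
  const inFlight = new Set();
  let index = 0;

  for await (const item of iterable) {
    const i = index++;
    const p = Promise.resolve(mapper(item, i)).then(result => {
      results[i] = result;
      inFlight.delete(p);
    });
    inFlight.add(p);
    if (inFlight.size >= concurrency) {
      await Promise.race(inFlight); // wait for a slot to open
    }
  }

  await Promise.all(inFlight); // drain the remaining in-flight calls
  return results;
}
```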
Sounds fine if it's something that would be useful on its own as a module, or is large enough to warrant being separate. But if it's just a small helper function, it'd be better to throw it in this module.
I decided not to bring in this other dependency. I might later, but what I have today works (I've tested it against large files/buffers). I'm also not sure what environments are supported by this module (backblaze-b2) and whether they all support ECMAScript's asynchronous iteration feature.

I did bring in one more dependency (p-retry) to handle exponential back-off following a temporary upload failure.

Sometime this week I hope to publish this function as a module on npm, since I will be using it in multiple projects and this will simplify distribution. My goal is to get the functionality merged into backblaze-b2 itself. Once I publish this new module, I'd like to request a code review of it. After working through any concerns there, I can submit a PR to add the function to backblaze-b2.
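For illustration, a sketch of wrapping a single part upload in p-retry to get exponential back-off (the option values and `uploadPart` are hypothetical, not taken from the module):

```js
const pRetry = require('p-retry'); // v4-style CommonJS; newer majors are ESM-only

// Retry a single part upload with exponential back-off.
// `uploadPart` is a hypothetical function performing one b2_upload_part call.
function uploadPartWithRetry(uploadPart, partNumber) {
  return pRetry(uploadPart, {
    retries: 5,       // attempts allowed after the first failure
    factor: 2,        // double the delay each time
    minTimeout: 1000, // start with a 1-second delay
    onFailedAttempt: error => {
      console.warn(
        `Part ${partNumber} attempt ${error.attemptNumber} failed; ` +
        `${error.retriesLeft} retries left.`
      );
    },
  });
}
```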
The module is published. Feel free to leave code review comments here, or as issues on the module's repository. https://www.npmjs.com/package/@gideo-llc/backblaze-b2-upload-any |
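For reference, attaching and invoking the published module might look like the following (the `uploadAny` option names here are assumptions based on this thread; the package README is authoritative):

```js
const B2 = require('backblaze-b2');

// Attach the published function to the B2 prototype, as described in the issue text.
B2.prototype.uploadAny = require('@gideo-llc/backblaze-b2-upload-any');

async function main() {
  const b2 = new B2({ applicationKeyId: '...', applicationKey: '...' });
  await b2.authorize();

  // Option names are illustrative; `data` may be a Buffer, a stream,
  // or a reference to a local file depending on the module's API.
  await b2.uploadAny({
    bucketId: '<bucket id>',
    fileName: 'path/in/bucket.bin',
    data: Buffer.from('example contents'),
  });
}

main().catch(console.error);
```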
We have been using the module in production for the last few days and have migrated several TB of data successfully, streaming it from an HTTP origin to B2. I have not found any instances of corruption yet. There has been a pretty even mixture of standard and large-file uploads. |
@cdhowie Sorry for the delay. The API looks good; hopefully I can take a look at the code over the weekend.
@odensc There are a few things that I may want to rewrite, but I haven't had time, and what's there works pretty well. The main large-file-upload routine is a bit complex; replacing it with async iterables (as discussed above) could simplify it.
I just added …
The AWS S3 SDK for JavaScript has an `upload` function that does not correspond to any particular API request. You can give it a buffer or a stream, and it will automatically perform either a single PutObject call or a multi-part upload.

It would be a great benefit to this library to provide something similar. Right now, large file uploads are unnecessarily cumbersome, especially when the input is a stream. Authorization token management is a giant pain.
I am working on such a function right now for our own internal use. I'm writing it as a module that exposes a single function that can be attached to the prototype of the `B2` class provided by this library (`B2.prototype.uploadAny = require('backblaze-b2-upload-any');`).

This issue is intended to convey my intent to integrate this function into this library and submit a PR. Therefore, I would very much appreciate any feedback on my proposal so that I can accommodate any necessary design changes as early as possible.
The current planned features of this function (many of which are already done) are:
There is a difference between the local file and stream cases. When uploading a local file, no content is buffered in memory. Rather, multiple read streams are created (and re-created as necessary if a part upload must be retried).
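A minimal sketch of that ranged-read-stream approach (illustrative, not the module's actual code):

```js
const fs = require('fs');

// One read stream per part, covering only that part's byte range. If a part
// upload fails, a fresh stream over the same range can be created instead of
// buffering the part in memory. Note that `end` is an inclusive offset.
function createPartStream(path, partIndex, partSize) {
  const start = partIndex * partSize;
  return fs.createReadStream(path, { start, end: start + partSize - 1 });
}
```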
Stream support necessarily requires some buffering in memory to facilitate retries, since Node streams cannot be rewound (and not all stream types would be seekable, anyway).
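A sketch of that buffering, chunking the source stream into retryable in-memory Buffers (illustrative only):

```js
// Accumulate stream chunks and yield fixed-size Buffers; because each part
// lives in memory, a failed part upload can be retried without re-reading
// the (non-seekable) source stream.
async function* bufferParts(stream, partSize) {
  let pending = [];
  let pendingLength = 0;

  for await (const chunk of stream) {
    pending.push(chunk);
    pendingLength += chunk.length;

    while (pendingLength >= partSize) {
      const whole = Buffer.concat(pending);
      yield whole.subarray(0, partSize);
      pending = [whole.subarray(partSize)];
      pendingLength = pending[0].length;
    }
  }

  if (pendingLength > 0) {
    yield Buffer.concat(pending); // final, possibly short, part
  }
}
```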
Note that I currently introduce two new dependencies:

- `@hapi/joi`, which is used to validate the options object.
- `memoizee`, which is used during re-authorization. If multiple part uploads are trying to re-authorize at the same time, this prevents multiple authorize calls to B2.
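An illustrative use of memoizee for that de-duplication (the `maxAge` eviction policy is an assumption):

```js
const memoize = require('memoizee');

// With { promise: true }, concurrent callers share a single in-flight
// authorize() promise instead of each issuing its own call to B2.
function makeReauthorizer(b2) {
  return memoize(() => b2.authorize(), {
    promise: true,
    maxAge: 60 * 1000, // drop the cached result after a minute (assumed policy)
  });
}
```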