This repository has been archived by the owner on May 4, 2018. It is now read-only.

large writes get truncated by uv_fs_write #1215

Open
tfogal opened this issue Mar 25, 2014 · 7 comments

Comments

@tfogal

tfogal commented Mar 25, 2014

It appears that large writes (larger than a signed 32-bit integer can represent) are truncated to a value near, but not at, the largest value representable in 32 bits. Specifically, for both @saghul and me, the output ends up being 2147479552 bytes for a 1000*1000*1000*4 (4000000000) byte write. For reference, INT_MAX on my system is 2147483647; thus the write creates a file exactly 4095 bytes shy of INT_MAX.

The gist at:

https://gist.github.com/tfogal/9765258

adds a new test that fails for both @saghul and me.

@saghul suspects that the write is getting interrupted and libuv isn't properly restarting it.

@saghul
Contributor

saghul commented Mar 29, 2014

I'll check whether the write is being interrupted; if so, we'll probably document this behavior and leave it up to the user to retry with an offset.

@txdv
Contributor

txdv commented Jul 7, 2014

What OS are you using, @tfogal?

@saghul
Contributor

saghul commented Jul 7, 2014


I can confirm it happens on Linux. The operation gets interrupted (EINTR) but we don't retry.

@tfogal
Author

tfogal commented Jul 7, 2014

Yep, me too; I hit this on a 64-bit Ubuntu install.

@bnoordhuis
Contributor

FWIW, this isn't EINTR-related, it's a kernel limitation. From strace:

write(9, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4000000000) = 2147479552

The write succeeds; it just doesn't write out everything. writev() gives the same result, as do read() and readv().

For some historical background, this particular Linux behavior goes back to at least the ext2 days, i.e. the mid-90s, that mythical era when Euro house was all the rage and track suits were still fashionable. (Yeah, I don't miss the '90s either.)

To the best of my knowledge it's never been addressed because there was (and maybe still is) significant doubt that the kernel's (v)fs code is 32-bit overflow clean.

I'm a teeny bit ashamed of myself because this factoid has been in my head for years but I didn't stop to consider it - at all - when I wrote fs.c. It should be fairly easy to fix however: there are already fallback loops for the pread() and pwrite() code paths and those can probably be generalized to all read and write operations.
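A generalized fallback loop of the kind described here might look like this for the read side. It is only a sketch with plain `read()`, not the actual code in libuv's fs.c; `read_full` is a hypothetical name, and the sketch assumes `len` fits in `ssize_t`.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not libuv's fs.c code: keep calling read() until
 * `len` bytes have arrived, end-of-file is reached, or an error occurs.
 * Returns the number of bytes read (less than `len` only at EOF),
 * or -1 on error with errno set by read(). */
static ssize_t read_full(int fd, void *buf, size_t len) {
  char *p = buf;
  size_t done = 0;

  while (done < len) {
    ssize_t n = read(fd, p + done, len - done);
    if (n < 0) {
      if (errno == EINTR)
        continue;        /* interrupted before reading anything: retry */
      return -1;
    }
    if (n == 0)
      break;             /* end of file */
    done += (size_t) n;  /* short read: loop again for the remainder */
  }
  return (ssize_t) done;
}
```

The write side would follow the same pattern, minus the end-of-file case.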

@saghul
Contributor

saghul commented Jul 9, 2014

Thanks for the insight Ben!

@bnoordhuis
Contributor

Oh, and the reason the system call returns 2147479552 instead of INT_MAX is that the top 4096 numbers are reserved for an in-band mechanism for returning error codes. When the kernel returns ENOENT, it actually returns -ENOENT, i.e. -2 or 0xFFFFFFFE.
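The numbers in this comment can be checked with a few lines of C. This is a small sketch that assumes a 32-bit `int` (INT_MAX of 2147483647, as reported earlier in the thread) and ENOENT equal to 2; the names `observed`, `negated_errno32` and `print_limits` are made up for illustration. It also prints the observed count in hex: 0x7FFFF000, a multiple of 4096, which is the per-call transfer cap documented in the Linux write(2) man page.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Byte count returned by the truncated write() in the strace output. */
static const long long observed = 2147479552LL;

/* An errno value negated and viewed as an unsigned 32-bit number,
 * e.g. 2 (ENOENT) becomes 0xFFFFFFFE. */
static uint32_t negated_errno32(int err) {
  return (uint32_t) -err;
}

/* Prints how the observed count relates to INT_MAX and to 4096. */
static void print_limits(void) {
  printf("INT_MAX - observed = %lld\n", (long long) INT_MAX - observed);
  printf("observed in hex    = 0x%llX\n", (unsigned long long) observed);
  printf("observed %% 4096    = %lld\n", observed % 4096);
  printf("-ENOENT as uint32  = 0x%08X\n", (unsigned) negated_errno32(ENOENT));
}
```

Calling `print_limits()` shows a gap of 4095 below INT_MAX and a remainder of 0 when dividing by 4096.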

I'll get back to my game of nethack now. :-)
