Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Large File Support #1068

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

thesamesam
Copy link
Contributor

This enables 64-bit off_t where it's opt-in (e.g. glibc) on 32-bit platforms.

Bug: https://bugs.gentoo.org/911176

@guyharris
Copy link
Member

So is this because, even if a program uses another library, already configured with large file support, to do some of its file reading and writing, the program itself must also be configured with large file support, or is it for reading or writing other files?

I.e., this shouldn't be necessary for reading and writing large capture files, if libpcap is compiled correctly, but it may be necessary if you're running tcpdump without -w and you print more than the OS's default file size/offset limit worth of dissected packet information.

@guyharris
Copy link
Member

This won't fix the problem for people using CMake; see cmake/Modules/FindLFS.cmake, and the "Large file support on UN*X, a/k/a LFS." section of CMakeLists.txt, in the libpcap source for the sort of changes that should be made to the tcpdump source.

@thesamesam
Copy link
Contributor Author

thesamesam commented Jul 27, 2023

So is this because, even if a program uses another library, already configured with large file support, to do some of its file reading and writing, the program itself must also be configured with large file support, or is it for reading or writing other files?

Right, it's not infectious (unless libpcap stores it in one of its installed headers, and AFAICT it doesn't) , so you definitely need it for the latter case - I haven't tried to induce the former case though.

It came up in the past for Windows at faf8fb7, although it wouldn't be helped by this PR for MSVC.

This won't fix the problem for people using CMake; see cmake/Modules/FindLFS.cmake, and the "Large file support on UN*X, a/k/a LFS." section of CMakeLists.txt, in the libpcap source for the sort of changes that should be made to the tcpdump source.

Yeah, I wish CMake had a built-in macro for this. IIRC we asked in the past and it didn't go anywhere. I'll take a look, thanks for the pointer.

@thesamesam thesamesam marked this pull request as draft July 27, 2023 07:48
This enables 64-bit off_t where it's opt-in (e.g. glibc) on 32-bit platforms.

Crib the CMake module from libpcap as well to handle CMake, while we use
AC_SYS_LARGEFILE for autotools.

Bug: https://bugs.gentoo.org/911176
Signed-off-by: Sam James <[email protected]>
@guyharris
Copy link
Member

Right, it's not infectious (unless libpcap stores it in one of its installed headers, and AFAICT it doesn't)

If it did store stuff therer, it would presumably have infected tcpdump, so that tcpdump wouldn't require large file support to be configured. And. no, it doesn't store configuration options; storing configuration options in installed headers is something that should be done Very Carefully, so that nothing that depends on, for example, the compiler is stored there, because somebody might compile a program using libpcap with a compiler other than the one used to compile libpcap, and that should be allowed to work if at all possible. (That's why we do compiler tests in the header file with #ifdef/#if, for example.)

so you definitely need it for the latter case - I haven't tried to induce the former case though.

I might have tested it on a 32-bit Linux VM when large file support was added to libpcap, but it was long ago, so I don't remember for sure.

(The platform on which I do most of my development is a 4.4-Lite-based BSD that I call "CupertinoBSD" - from the name, you can probably guess what I mean, but if you need a hint, the underlying BSD-and-Mach-based platform atop which the OS is built is named after the author of "On the Origin of Species" - and, being 4.4-Lite-based, it's a you-don't-get-to-opt platform on which off_t is 64-bit regardless of the size of int or long and you don't get to change it, so even when my development machine was 32-bit libpcap and everything else had large file support.)

It came up in the past for Windows at faf8fb7, although it wouldn't be helped by this PR for MSVC.

The issues are different on Windows - and -f wouldn't work on 32-bit UN*X, either, with a large file, because you couldn't read the entire filter file in with read() as the length argument to read(), being a size_t, would be 32 bits on an ILP32 platform, and it wouldn't even work with file sizes > 2^31-1, because the return value of read(), being an ssize_t, would be a signed 32-bit value on an ILP32 platform.

So faf8fb7 was intended to fix the problem on all platforms. It still doesn't prevent somebody from handing a file to -f that's not a "large file" but is large enough that it can just barely allocate a buffer big enough for the file but then fails to allocate something else after that when, for example, compiling the filter, because the buffer is so big there's no much room left in {the address space,memory+swap space,etc.}.

I don't think Windows would have an issue of writing more than 2^32 bytes to the standard output, even on a 32-bit platform, or would even have it on a 64-bit platform (64-bit Windows is LLP64, so long is 32 bits) - i.e., they didn't do the "protect non-LFS-aware programs" stuff that caused LFS support to be opt-in on some platforms - so I don't think anything additional is needed unless there are stat() calls in tcpdump that would need to be made to call _fstati64() rather than _fstat(), or lseek() calls that would similarly need to be changed, etc.. I'll check into that.

@guyharris
Copy link
Member

unless there are stat() calls in tcpdump that would need to be made to call _fstati64() rather than _fstat(),

We already handle those similarly to how libpcap handles them - but that means that we need large file support on UN*X in order to determine, in the -f code, how big a large file is, so that requires this fix.

or lseek() calls that would similarly need to be changed, etc.. I'll check into that.

Not a problem - unless you're on a 32-bit UN*X with an older libpcap that doesn't have pcap_dump_ftell64(). That's a potential problem, as large file support was added to libpcap back in early 2008 with a9b98caa0d13c692f4ff08df7116276542471be5 but pcap_dump_ftell64() wasn't added until late 2017 with 8bac534951f44d0a3e3e7951c79c95d587fc234b.

Fixing that won't help with older libpcaps without large file support, but you can't fix that in tcpdump. It should, however, work with libpcaps that have large file support but don't have pcap_dump_ftell64().

I'll look at that.

@guyharris
Copy link
Member

And the fix won't mess up tcpdump with one of those no-large-file-support libpcaps, as you won't get a file offset > 2^31-1, even though the call will return a 64-bit file offset, so the change breaks nothing but, as noted, doesn't help anything either in that case.

@thesamesam
Copy link
Contributor Author

Right, it's not infectious (unless libpcap stores it in one of its installed headers, and AFAICT it doesn't)

If it did store stuff therer, it would presumably have infected tcpdump, so that tcpdump wouldn't require large file support to be configured. And. no, it doesn't store configuration options; storing configuration options in installed headers is something that should be done Very Carefully, so that nothing that depends on, for example, the compiler is stored there, because somebody might compile a program using libpcap with a compiler other than the one used to compile libpcap, and that should be allowed to work if at all possible. (That's why we do compiler tests in the header file with #ifdef/#if, for example.)

Yeah, we're agreeing (I was just pointing out how it's not infectious). This is one of those things where it almost always causes more headaches than it solves it to persist like that. In fact, just the other day, I was dealing with something like that from Perl itself.

so you definitely need it for the latter case - I haven't tried to induce the former case though.

I might have tested it on a 32-bit Linux VM when large file support was added to libpcap, but it was long ago, so I don't remember for sure.

(The platform on which I do most of my development is a 4.4-Lite-based BSD that I call "CupertinoBSD" - from the name, you can probably guess what I mean, but if you need a hint, the underlying BSD-and-Mach-based platform atop which the OS is built is named after the author of "On the Origin of Species" - and, being 4.4-Lite-based, it's a you-don't-get-to-opt platform on which off_t is 64-bit regardless of the size of int or long and you don't get to change it, so even when my development machine was 32-bit libpcap and everything else had large file support.)

😄

I've got an 32-bit x86 chroot around just for this nowadays.

It came up in the past for Windows at faf8fb7, although it wouldn't be helped by this PR for MSVC.

The issues are different on Windows - and -f wouldn't work on 32-bit UN*X, either, with a large file, because you couldn't read the entire filter file in with read() as the length argument to read(), being a size_t, would be 32 bits on an ILP32 platform, and it wouldn't even work with file sizes > 2^31-1, because the return value of read(), being an ssize_t, would be a signed 32-bit value on an ILP32 platform.

Fair point! I'd forgot about that.

I don't think Windows would have an issue of writing more than 2^32 bytes to the standard output, even on a 32-bit platform, or would even have it on a 64-bit platform (64-bit Windows is LLP64, so long is 32 bits) - i.e., they didn't do the "protect non-LFS-aware programs" stuff that caused LFS support to be opt-in on some platforms - so I don't think anything additional is needed unless there are stat() calls in tcpdump that would need to be made to call _fstati64() rather than _fstat(), or lseek() calls that would similarly need to be changed, etc.. I'll check into that.

I'm afraid I've got no idea for Windows there, but I'd be very interested in the answer.

@thesamesam thesamesam changed the title configure.ac: use AC_SYS_LARGEFILE Enable Large File Support Jul 27, 2023
@@ -0,0 +1,153 @@
# CMake support for large files
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we want a note here that it's from libpcap and hence changes should be made there first, etc?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but then the equivalent code in libpcap should be marked as "this was picked up by tcpdump and changes should be made there, too".

Or:

  • in tcpdump, say "all changes to this should be propagated to the same file in libpcap";
  • in libpcap, say "all changes to this should be propagated to the same file in tcpdump".

(And then we should look at tcpslice; that does a bunch of seeking, so it may involve even more work there.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants