Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switching to using memfd for input data #990

Merged
merged 9 commits into from
Dec 30, 2024
Merged

Switching to using memfd for input data #990

merged 9 commits into from
Dec 30, 2024

Conversation

quantum5
Copy link
Member

@quantum5 quantum5 commented Jan 30, 2022

Breaking changes:

  • All interactors using _interact_with_process

This closes #835.

@codecov-commenter
Copy link

codecov-commenter commented Jan 30, 2022

Codecov Report

Attention: Patch coverage is 87.79528% with 31 lines in your changes missing coverage. Please review.

Project coverage is 82.59%. Comparing base (b199763) to head (709c6db).
Report is 10 commits behind head on master.

Files with missing lines Patch % Lines
dmoj/cptbox/lazy_bytes.py 69.23% 16 Missing ⚠️
dmoj/cptbox/utils.py 87.34% 10 Missing ⚠️
dmoj/cptbox/isolate.py 0.00% 2 Missing ⚠️
dmoj/executors/shell_executor.py 0.00% 2 Missing ⚠️
dmoj/problem.py 97.36% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #990      +/-   ##
==========================================
+ Coverage   82.37%   82.59%   +0.21%     
==========================================
  Files         143      146       +3     
  Lines        5442     5675     +233     
==========================================
+ Hits         4483     4687     +204     
- Misses        959      988      +29     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@quantum5 quantum5 force-pushed the memfd-judge branch 11 times, most recently from e30f7b1 to 66437a7 Compare January 30, 2022 21:11
@quantum5 quantum5 marked this pull request as ready for review January 30, 2022 21:20
super().__init__(memory_fd_create(), 'r+')
_name: Optional[str] = None

def __init__(self, prefill: Optional[bytes] = None, seal=False) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one or both of these should be required kwargs. I'm thinking the second should. What is the difference between prefilling with nothing, and passing None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made keyword arguments required.

dmoj/cptbox/utils.py Outdated Show resolved Hide resolved
if e.errno == errno.ENOSYS:
# FreeBSD
self.seek(0, os.SEEK_SET)
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what this does. Does it deserve more of a comment?

dmoj/problem.py Outdated Show resolved Hide resolved
@@ -306,7 +306,7 @@ int memory_fd_create(void) {
#ifdef __FreeBSD__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, is this function called on FreeBSD anymore? Are you creating the tempfile in Python instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this around for now, since I'd rather this function work on all platforms, as the detection logic for the FreeBSD case is now different. If FreeBSD implements /proc/[pid]/fd some day, this will magically work.

try:
os.dup2(new_fd, fd)
finally:
os.close(new_fd)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could use a comment about why this dup is needed. Also, why isn't it implemented in the C function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C code is just that much more painful to maintain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the permissions of the backing fd from RW to RO. A comment seems fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Factored it to a function with an obvious name.

@@ -21,10 +21,10 @@ def get_fs(self):
def get_allowed_syscalls(self):
return super().get_allowed_syscalls() + ['fork', 'waitpid', 'wait4']

def get_security(self, launch_kwargs=None):
def get_security(self, launch_kwargs=None, extra_fs=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that this is a direct result of this PR, but maybe this should be converted to **kwargs instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do that later.

@quantum5 quantum5 force-pushed the memfd-judge branch 2 times, most recently from 8aa122f to a903955 Compare December 28, 2024 04:23
_name: Optional[str] = None

def __init__(self, prefill: Optional[bytes] = None, seal=False) -> None:
if FREEBSD:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we either want to branch on some sort of "expected size" here, or figure out something to make the memory backing this file be more likely to be swapped out.

If the input is 5 GiB large, we don't want to buffer all of it in memory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving this up to the caller for now. If the caller desires, it can use NamedFileIO.

self._name = f.name
super().__init__(os.dup(f.fileno()), 'r+')
else:
super().__init__(memory_fd_create(), 'r+')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these branches should be a try for memory_fd_create, and a fallback for using a disk-based file. Feels more straightforward than special-casing FreeBSD here (and only here; seal is implemented in an OS-agnostic way).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to have both NamedFileIO and MemoryIO, with the latter being an alias of the former on FreeBSD.

try:
os.dup2(new_fd, fd)
finally:
os.close(new_fd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the permissions of the backing fd from RW to RO. A comment seems fine.

def close(self) -> None:
super().close()
if self._name:
os.unlink(self._name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think that we could do this up-front (don't pass delete=False). It's still possible to write to an orphaned file. We can drop self._name and have to_path generate the procfs path unconditionally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resulting file can't be reopened from another process on FreeBSD. Made this more explicit in the new version.


from dmoj.cptbox._cptbox import memory_fd_create, memory_fd_seal
from dmoj.cptbox.tracer import FREEBSD


class MemoryIO(io.FileIO):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where/how do we end up deleting the memory backing these objects once they're no longer necessary to grade a case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's closed when MemoryIO.close is called or it's garbage collected, since FileIO constructor takes ownership of the file descriptor unless closefd=False is passed.

@quantum5 quantum5 force-pushed the memfd-judge branch 3 times, most recently from 023b22b to ea93c96 Compare December 30, 2024 03:27
dmoj/cptbox/utils.py Show resolved Hide resolved

def to_bytes(self) -> bytes:
try:
with mmap.mmap(self.fileno(), 0, access=mmap.ACCESS_READ) as f:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often do we expect this will be called? Should we madvise(..., MADV_SEQUENTIAL) here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very often, it's mostly for compatibility with old checkers etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like "very often" to me, but happy to punt on this. I worry we'll hit issues with gigabyte-sized generator inputs that also have checkers, since this doubles the memory requirement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's less of a problem than it looks. In the standard grader, we pass this magic to checkers: judge_input=LazyBytes(case.input_data). We only pay for this if the checker actually reads judge_input.

@quantum5 quantum5 merged commit 315ede5 into master Dec 30, 2024
19 checks passed
@quantum5 quantum5 deleted the memfd-judge branch December 30, 2024 05:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use memfd for file data
4 participants