Switching to using memfd for input data #990
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #990      +/-   ##
==========================================
+ Coverage   82.37%   82.59%   +0.21%
==========================================
  Files         143      146       +3
  Lines        5442     5675     +233
==========================================
+ Hits         4483     4687     +204
- Misses        959      988      +29
e30f7b1 to 66437a7
dmoj/cptbox/utils.py
Outdated
    super().__init__(memory_fd_create(), 'r+')
_name: Optional[str] = None

def __init__(self, prefill: Optional[bytes] = None, seal=False) -> None:
Maybe one or both of these should be required kwargs. I'm thinking the second should. What is the difference between prefilling with nothing and passing `None`?
Made keyword arguments required.
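For readers following along, here is a minimal sketch of what "required keyword arguments" can look like in Python. The class name `Buffer` and its body are hypothetical and only illustrate the bare `*` in the signature; they are not the code from this PR.

```python
from typing import Optional


class Buffer:
    """Hypothetical stand-in used only to illustrate keyword-only parameters."""

    def __init__(self, *, prefill: Optional[bytes] = None, seal: bool = False) -> None:
        # Everything after the bare `*` must be passed by name.
        self.data = b'' if prefill is None else prefill
        self.sealed = seal


buf = Buffer(prefill=b'1 2 3\n', seal=True)  # OK: arguments are named

try:
    Buffer(b'1 2 3\n', True)  # positional use is rejected
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

With this shape, a call site always spells out `seal=True`, which addresses the readability concern raised above.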
dmoj/cptbox/utils.py
Outdated
if e.errno == errno.ENOSYS:
    # FreeBSD
    self.seek(0, os.SEEK_SET)
    return
I'm not quite sure what this does. Does it deserve more of a comment?
@@ -306,7 +306,7 @@ int memory_fd_create(void) {
#ifdef __FreeBSD__
Actually, is this function called on FreeBSD anymore? Are you creating the tempfile in Python instead?
Let's keep this around for now, since I'd rather this function work on all platforms, as the detection logic for the FreeBSD case is now different. If FreeBSD implements `/proc/[pid]/fd` some day, this will magically work.
dmoj/cptbox/utils.py
Outdated
try:
    os.dup2(new_fd, fd)
finally:
    os.close(new_fd)
Maybe we could use a comment about why this dup is needed. Also, why isn't it implemented in the C function?
C code is just that much more painful to maintain.
This changes the permissions of the backing fd from RW to RO. A comment seems fine.
Factored it to a function with an obvious name.
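As a rough illustration of the dup trick discussed in this thread, the sketch below reopens a descriptor read-only through procfs and then uses `os.dup2` so the original descriptor number now refers to the read-only open. The function name `reopen_read_only` is hypothetical (the thread only says the logic was factored into "a function with an obvious name"), and the sketch assumes a Linux-style `/proc/self/fd`.

```python
import os


def reopen_read_only(fd: int) -> None:
    """Make `fd` refer to a read-only open of the same file (assumes Linux procfs)."""
    # Opening the procfs link creates a new open file description with its own
    # access mode and its own offset (which starts at 0).
    new_fd = os.open(f'/proc/self/fd/{fd}', os.O_RDONLY)
    try:
        # Atomically replace the read-write descriptor with the read-only one,
        # keeping the same descriptor number for callers that already hold it.
        os.dup2(new_fd, fd)
    finally:
        # The temporary descriptor is no longer needed either way.
        os.close(new_fd)
```

After the call, writes through `fd` fail with `OSError`, while reads start from the beginning of the file because the new open has a fresh offset.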
@@ -21,10 +21,10 @@ def get_fs(self):
     def get_allowed_syscalls(self):
         return super().get_allowed_syscalls() + ['fork', 'waitpid', 'wait4']
 
-    def get_security(self, launch_kwargs=None):
+    def get_security(self, launch_kwargs=None, extra_fs=None):
Not that this is a direct result of this PR, but maybe this should be converted to `**kwargs` instead?
We can do that later.
8aa122f to a903955
a903955 to 3111bc5
3111bc5 to 7a9da0e
dmoj/cptbox/utils.py
Outdated
_name: Optional[str] = None

def __init__(self, prefill: Optional[bytes] = None, seal=False) -> None:
    if FREEBSD:
I think we either want to branch on some sort of "expected size" here, or figure out something to make the memory backing this file be more likely to be swapped out.
If the input is 5 GiB large, we don't want to buffer all of it in memory.
Leaving this up to the caller for now. If the caller desires, it can use `NamedFileIO`.
dmoj/cptbox/utils.py
Outdated
        self._name = f.name
        super().__init__(os.dup(f.fileno()), 'r+')
else:
    super().__init__(memory_fd_create(), 'r+')
I think these branches should be a `try` for `memory_fd_create`, and a fallback for using a disk-based file. Feels more straightforward than special-casing FreeBSD here (and only here; `seal` is implemented in an OS-agnostic way).
Changed to have both `NamedFileIO` and `MemoryIO`, with the latter being an alias of the former on FreeBSD.
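A rough sketch of how such a split might look is below. It is not the PR's code: the PR uses `memory_fd_create` from the `_cptbox` C extension, whereas this sketch substitutes the standard library's `os.memfd_create` as a stand-in, and the class bodies and the platform check are assumptions made for illustration.

```python
import io
import os
import sys
import tempfile
from typing import Optional


class NamedFileIO(io.FileIO):
    """Disk-backed file that keeps its path until it is closed."""

    _name: Optional[str] = None

    def __init__(self) -> None:
        with tempfile.NamedTemporaryFile(delete=False) as f:
            self._name = f.name
            # Duplicate the descriptor so it survives the `with` block.
            super().__init__(os.dup(f.fileno()), 'r+')

    def close(self) -> None:
        super().close()
        if self._name:
            os.unlink(self._name)
            self._name = None


if sys.platform.startswith('freebsd') or not hasattr(os, 'memfd_create'):
    # No memfd available: fall back to the disk-backed implementation.
    MemoryIO = NamedFileIO
else:

    class MemoryIO(io.FileIO):
        """Anonymous in-memory file (stand-in for the extension's memfd helper)."""

        def __init__(self) -> None:
            super().__init__(os.memfd_create('input'), 'r+')
```

Callers can then construct `MemoryIO()` everywhere and get the disk-backed behaviour automatically on platforms without memfd, or pick `NamedFileIO` explicitly for very large inputs.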
dmoj/cptbox/utils.py
Outdated
def close(self) -> None:
    super().close()
    if self._name:
        os.unlink(self._name)
I would think that we could do this up-front (don't pass `delete=False`). It's still possible to write to an orphaned file. We can drop `self._name` and have `to_path` generate the procfs path unconditionally.
The resulting file can't be reopened from another process on FreeBSD. Made this more explicit in the new version.
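To make the "orphaned file" point concrete, the sketch below unlinks a temporary file up-front and keeps using it through its descriptor. The helper name `create_unlinked_file` is hypothetical. It shows only the same-process case; as noted above, another process needs a path (or a procfs link) to reopen the file, which is why the name is kept on FreeBSD.

```python
import os
import tempfile
from typing import Tuple


def create_unlinked_file() -> Tuple[int, str]:
    """Create a temp file, unlink it immediately, and return (fd, former path)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        fd = os.dup(f.fileno())  # keep our own descriptor past the `with` block
        path = f.name
    os.unlink(path)  # the directory entry is gone; the data lives while fd is open
    return fd, path


fd, old_path = create_unlinked_file()
os.write(fd, b'still writable')
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 100)
path_exists = os.path.exists(old_path)
os.close(fd)  # storage is released once the last descriptor is closed
```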
dmoj/cptbox/utils.py
Outdated
from dmoj.cptbox._cptbox import memory_fd_create, memory_fd_seal
from dmoj.cptbox.tracer import FREEBSD


class MemoryIO(io.FileIO):
Where/how do we end up deleting the memory backing these objects once they're no longer necessary to grade a case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's closed when MemoryIO.close
is called or it's garbage collected, since FileIO
constructor takes ownership of the file descriptor unless closefd=False
is passed.
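The ownership rule described here can be checked with a small standalone sketch. It uses a pipe rather than a memfd so it runs anywhere, and the helper `fd_is_open` is a name introduced only for this illustration.

```python
import io
import os


def fd_is_open(fd: int) -> bool:
    """Return True if `fd` is currently a valid open descriptor."""
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True


r, w = os.pipe()

owned = io.FileIO(w, 'w')  # closefd defaults to True: FileIO owns the descriptor
owned.close()
owned_open_after_close = fd_is_open(w)  # descriptor was closed along with the object

borrowed = io.FileIO(r, 'r', closefd=False)  # FileIO does not own this one
borrowed.close()
borrowed_open_after_close = fd_is_open(r)  # descriptor is still open
os.close(r)
```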
023b22b to ea93c96
def to_bytes(self) -> bytes:
    try:
        with mmap.mmap(self.fileno(), 0, access=mmap.ACCESS_READ) as f:
How often do we expect this will be called? Should we `madvise(..., MADV_SEQUENTIAL)` here?
Not very often, it's mostly for compatibility with old checkers etc.
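For reference, a sketch of what the `madvise` suggestion could look like in Python is below. The function name `fd_to_bytes` is hypothetical, the empty-file check is an assumption added because `mmap` refuses to map zero-length files, and `mmap.madvise` is only available on Python 3.8+ on platforms that provide it, hence the guards.

```python
import mmap
import os


def fd_to_bytes(fd: int) -> bytes:
    """Copy the whole file behind `fd` into a bytes object via a read-only mapping."""
    if os.fstat(fd).st_size == 0:
        # mmap raises ValueError for empty files, so handle that case directly.
        return b''
    with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
        if hasattr(m, 'madvise') and hasattr(mmap, 'MADV_SEQUENTIAL'):
            # Hint that pages will be read front to back exactly once.
            m.madvise(mmap.MADV_SEQUENTIAL)
        return m[:]  # slicing copies the mapped contents into a new bytes object
```

Note that the copy still doubles the memory footprint for the duration of the call, which is the concern raised in the next comment.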
Sounds like "very often" to me, but happy to punt on this. I worry we'll hit issues with gigabyte-sized generator inputs that also have checkers, since this doubles the memory requirement.
It's less of a problem than it looks. In the standard grader, we pass this magic to checkers: `judge_input=LazyBytes(case.input_data)`. We only pay for this if the checker actually reads `judge_input`.
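The "only pay if it is read" idea can be sketched as a deferred loader. The class below, `LazyLoader`, is a hypothetical illustration and is not how DMOJ's `LazyBytes` is implemented; it only shows that the expensive conversion runs on first access and is then cached.

```python
from typing import Callable, Optional


class LazyLoader:
    """Hypothetical wrapper that defers an expensive bytes-producing call."""

    def __init__(self, loader: Callable[[], bytes]) -> None:
        self._loader = loader
        self._value: Optional[bytes] = None
        self.load_count = 0  # exposed only so the laziness is observable

    def get(self) -> bytes:
        if self._value is None:
            # First access: run the loader once and cache the result.
            self._value = self._loader()
            self.load_count += 1
        return self._value


judge_input = LazyLoader(lambda: b'1 2 3\n')
count_before_read = judge_input.load_count  # nothing has been loaded yet
first = judge_input.get()
second = judge_input.get()  # served from the cache
```

A checker that never touches the input therefore never triggers the copy out of the memory-backed file.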
ea93c96 to 17a551c
17a551c to 52948c7
0ad1b83 to 77bf381
Breaking changes:
_interact_with_process
This closes #835.