-
Notifications
You must be signed in to change notification settings - Fork 588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redesign syscallbuf to always unwind on interruption #3322
base: master
Are you sure you want to change the base?
Conversation
Note that this is just the x86_64/i686 version of this. Aarch64 merged while I was working on this, so that's not included yet, but should be reasonably straightforward. There's also a little bit of cleanup still to do, but this should work 100% otherwise, so if there's any concerns about the details, we can discuss them now. |
562ca4c
to
7eb33ab
Compare
Alright, I think this is working now everywhere. @rocallahan @yuyichao, please take a look. As I said, there's a fair bit of additional code deletion that could be done as a result of this, but I think we can do that in a follow up PR. |
|
||
// If the function requested the bail path, rewrite the return address | ||
ldr x0, [sp, 688] | ||
add x0, x0, 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we’d better not be using return address signing here…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this is overwriting the parent frame right? Might be good to add a comment below around the stack frame layout to document this assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we’d better not be using return address signing here…
Yeah, that'd probably need some other scheme
And this is overwriting the parent frame right? Might be good to add a comment below around the stack frame layout to document this assumption.
Yes, it's overwriting the return address in the parent frame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that'd probably need some other scheme
I think just unsign and resign the printer would suffice. It was just funny that this is basically exactly a ROP attack on the caller...
* b .Lreturn | ||
.Lbail: | ||
* ldp x15, x30, [x15] | ||
** // Safe suffix starts here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe update the comment below since it’s not just used for the test anymore. It might make real syscall.
src/Monkeypatcher.cc
Outdated
return remote_code_ptr(bp.as_int()); | ||
patched_syscall *ps = &syscall_stub_list[it->second]; | ||
auto bp = it->first + ps->size - ps->safe_suffix; | ||
if (pp == bp - 4 || pp == bp - 8) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I assume we won’t be expecting the exit breakpoint when we are in the bail path? (Otherwise this misses the bail path return)
- Is the breakpoint address returned here correct? It seems that it is pointing to the data part? (Safe suffix is larger than 8. I assume you meant to use the last instruction which should be - 12 due to the return address I stuffed at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(The test failure that required this happens in cont_race about 1-3% of the time iirc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - I wasn't entirely sure this breakpoint is actually still required, so I'll need to look into that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breakpoint is needed if we got a signal when we are in the return path of the stub and decided to defer it until the syscallbuf business is done. This would not be needed if we are grantee to not get a signal (comes to think about it I don't think we can grantee that for external signals) or if we just (manually or actually) single step the instructions out. It seems that we may need this on x86 as well now? Or at least a safe suffix so that we can deliver signal when we are back in the stub.
I think probably we should make a new rr release before we land this ... I expect this will destabilize things for a while. |
That's probably a good idea. |
This is a major redesign of the syscallbuf code with the goal of establishing the invariant that we never switch away from a tracee while it's in the syscallbuf code. Instead, we unwind the syscallbuf code completely and execute the syscall at a special syscall instruction now placed in the extended jump patch. The primary motivation for this that this fixes #3285, but I think the change is overall very beneficial. We have significant complexity in the recorder to deal with the possibility of interrupting the tracee during the syscallbuf. This commit does not yet remove this complexity (the change is already very big), but that should be easy to do as a follow up. Additionally, we used to be unable to perform syscall buffering for syscalls performed inside a signal handler that interrupted a syscall. This had performance implications on use cases like stack walkers, which often perform multiple memory-probing system calls for every frame to deal with the possibility of invalid unwind info. There are many details here, but here's a high level overview. The layout of the new extended jump patch is: ``` <stack setup> call <syscallbuf_hook> // Bail path returns here <stack restore> syscall <code from the original patch site> // Non-bail path returns here. jmp return_addr ``` One detail worth mentioning is what happens if a signal gets delivered once the tracee is out of the syscallbuf, but still in the extended jump patch (i.e. after the stack restore). In this case, rr will rewrite the ip of the signal frame to point to the equivalent ip in the original, now patched code section. Of course the instructions in question are no longer there, but the CFI will nevertheless be generally accurate for the current register state (excluding weird CFI that explicitly references the ip of course). This allows unwinders in the end-user-application to never have to unwind through any frame in the rr syscallbuf, which seems like a desirable property. Of course, `sigreturn` must perform the opposite transformation to avoid actually returning into a patched-out location. The main drawback of this scheme is that while the application will never see a location without CFI, GDB does still lack unwind information in the extended jump stub. This is not a new problem, but syscall events are now in the extended jump stub, so they come up quite frequently. I don't think this is a huge problem - it's basically the same situation we used to have before the vdso changes. I believe the best way to fix this would be to establish some way of having rr inform gdb of its jump patches (in fact gdb already has this kind of mechanism for tracepoints, it's just not exposed for tracepoints initiated by the gdb server), but I don't intend to do this myself anytime in the near future. That said, I should note that doing this would not require any changes on the record side, so could be done anytime and start working retroactively for already recorded traces.
This adds a test case to model #3285, where the test case pokes the sigframe to force sigreturn to switch to a different function than that which incurred the signal.
7eb33ab
to
d332b52
Compare
Rebased. Shall we go ahead with the release on master then? FWIW, I've used this a bit so far and have so far not found any new issues that came down to this change, but still an intermediate release is probably prudent. |
d332b52
to
d6dc859
Compare
Let's move forward with this! Can you rebase it? I promise I'll review it :-) |
I'm still interested in this change. I ran into "Expected syscall_bp_vm to be clear" today while debugging a python process. |
This is a major redesign of the syscallbuf code with the goal of
establishing the invariant that we never switch away from a tracee
while it's in the syscallbuf code. Instead, we unwind the syscallbuf
code completely and execute the syscall at a special syscall instruction
now placed in the extended jump patch.
The primary motivation for this that this fixes #3285, but I think
the change is overall very beneficial. We have significant complexity
in the recorder to deal with the possibility of interrupting the tracee
during the syscallbuf. This commit does not yet remove this complexity
(the change is already very big), but that should be easy to do as a
follow up. Additionally, we used to be unable to perform syscall buffering
for syscalls performed inside a signal handler that interrupted a
syscall. This had performance implications on use cases like stack walkers,
which often perform multiple memory-probing system calls for every frame to
deal with the possibility of invalid unwind info.
There are many details here, but here's a high level overview. The layout of
the new extended jump patch is:
One detail worth mentioning is what happens if a signal gets delivered once
the tracee is out of the syscallbuf, but still in the extended jump patch
(i.e. after the stack restore). In this case, rr will rewrite the ip of the
signal frame to point to the equivalent ip in the original, now patched code
section. Of course the instructions in question are no longer there, but
the CFI will nevertheless be generally accurate for the current register state
(excluding weird CFI that explicitly references the ip of course).
This allows unwinders in the end-user-application to never have to unwind
through any frame in the rr syscallbuf, which seems like a desirable
property. Of course,
sigreturn
must perform the opposite transformationto avoid actually returning into a patched-out location.
The main drawback of this scheme is that while the application will never
see a location without CFI, GDB does still lack unwind information in
the extended jump stub. This is not a new problem, but syscall events
are now in the extended jump stub, so they come up quite frequently.
I don't think this is a huge problem - it's basically the same situation
we used to have before the vdso changes. I believe the best way to fix
this would be to establish some way of having rr inform gdb of its
jump patches (in fact gdb already has this kind of mechanism for tracepoints,
it's just not exposed for tracepoints initiated by the gdb server),
but I don't intend to do this myself anytime in the near future.
That said, I should note that doing this would not require any changes
on the record side, so could be done anytime and start working retroactively
for already recorded traces.