Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wait4 syscall should return pid (tgid) and not the tid to userspace #3886

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sidkshatriya
Copy link
Contributor

@sidkshatriya sidkshatriya commented Nov 25, 2024

wait4 syscall should return pid (tgid) and not the tid to userspace

This is what the kernel does. Fix emulation to do the same.

In glibc waitpid() i.e. __waitpid() is implemented in terms of wait4() i.e. __wait4() which is a thin wrapper over the wait4 syscall. wait4 syscall returns the pid (tgid in kernel parlance) and not the thread id (tid). tid and pid will, of course, often be different in multithreaded situations. See also man wait4 and man waitpid.

This should fix the unexpected behavior of waitpid() in programs in which the return pid_t is checked against an expected value or used for further bookkeeping/comparisons. In such programs, the failure can be rare because due to thread sequencing, sometimes the tid happens to be numerically the pid that is expected and sometimes it is not.

This is what the kernel does. Fix emulation to do the same.

In glibc waitpid() i.e. __waitpid() is implemented in terms of wait4()
i.e. __wait4() which is a thin wrapper over the wait4 syscall. wait4
syscall returns the pid (tgid in kernel parlance) and not the thread id
(tid). tid and pid will, of course, often be different in multithreaded
situations.  See also `man wait4` and `man waitpid`.

This should fix the unexpected behavior of waitpid() in programs
in which the return pid_t is checked against an expected value or used
for further bookkeeping/comparisons. In such programs, the failure
can be rare because due to thread sequencing, sometimes the tid
happens to be numerically the pid that is expected and sometimes it is
not.
@sidkshatriya
Copy link
Contributor Author

BTW the trigger for my investigation and this PR was that I was getting an assertion failure sometimes at line 39 in the sigcont_threaded test.

// Wait for the child process to have sent itself SIGSTOP
wpid = waitpid(pid, &wait_status, WUNTRACED);
test_assert(wpid == pid);

Now the child_process function in sigcont_threaded.c creates a thread. The reason the assertion was failing was that the waitpid in line 38 was sometimes returning the tid of the thread that was created rather than the tid of the leader. If the tid of the leader was returned then it was equal to pid and the test would pass, otherwise it would fail. Of course, this is wrong because we need to return the process id and not the tid from waitpid().

@sidkshatriya
Copy link
Contributor Author

I am reasonably sure my fix is correct. However, the assumptions behind the code currently on master branch have led to probably some bugs in the emulated ptrace implementation. This has resulted in the ptrace test related failures listed below.

I have not investigated further as ptrace emulation is the most hairy code in the code base ! Any possible fixes could be considered by @rocallahan as necessary.

If the PR looks good then it could be merged and the ptrace emulation code fixed later also...

	591 - ptrace_trace_clone (Failed)
	592 - ptrace_trace_clone-no-syscallbuf (Failed)
	601 - ptracer_death_multithread (Failed)
	602 - ptracer_death_multithread-no-syscallbuf (Failed)
	603 - ptracer_death_multithread_peer (Failed)
	604 - ptracer_death_multithread_peer-no-syscallbuf (Failed)
	2137 - ptrace_trace_clone-32 (Failed)
	2138 - ptrace_trace_clone-32-no-syscallbuf (Failed)
	2147 - ptracer_death_multithread-32 (Failed)
	2148 - ptracer_death_multithread-32-no-syscallbuf (Failed)
	2149 - ptracer_death_multithread_peer-32 (Failed)
	2150 - ptracer_death_multithread_peer-32-no-syscallbuf (Failed)

@sidkshatriya
Copy link
Contributor Author

(I'm checking to see if I may be wrong about this PR -- I'll report back if I revise my conclusions or agree with them)

@sidkshatriya
Copy link
Contributor Author

sidkshatriya commented Nov 26, 2024

Something is wrong currently with rr, however, I'm now not sure that the PR I posted is the correct fix.

I am trying to debug things again. Perhaps @KJTsanaktsidis who made sigcont_threaded could shed some light on the behavior of this test -- I would really appreciate that :-) !

Setup:
In the sigcont_threaded test Process A forks into Process B. Process B creates a thread C. Thread C sends SIGSTOP to itself. When the SIGSTOP happens the following debug messages are seen:

[RecordTask] setting 4059197 to GROUP_STOP due to signal 19
[RecordTask] Sending synthetic SIGCHLD to waiting tid 4058931
[RecordTask] setting 4059195 to GROUP_STOP due to signal 19
[RecordTask] Sending synthetic SIGCHLD to waiting tid 4058931

Here Process B is 4059195 and Process C is 4059197 and Process A is 4058931. Two things surprised me here:

(1) Process C is a thread, so it should not be sending signals on state changes. In fact when the clone syscall is issued to create any thread, the signal to send on termination is usually set to 0. No SIGCHLD is ever issued on termination of Process C so it seems strange that on a SIGSTOP, a SIGCHLD is being sent....

(2) The "real_parent" and "parent" (in kernel parlance) of Process C is Process B. Any synthetic signal if issued by rr due to state changes in C should go to B ! Here both the SIGCHLDs are being sent to A. The SIGCHLD being sent to A due SIGSTOP in B is correct but I don't agree with the synthetic SIGCHLD being sent to A due to a state change in C !

@KJTsanaktsidis
Copy link
Contributor

Hmmm. I added a bit more logging into that test and pushed it up here: master...KJTsanaktsidis:rr:ktsanaktsidis/more_logs. Some observations:

If you run this test just by itself (i.e. not under rr at all)...

kjtsanaktsidis@kjtsanaktsidis-laptop build % ./bin/sigcont_threaded                 
Parent process is 193619
Child process is 193620
Child thread is 193621
Child thread stopping
process 193619 got SIGCHLD for pid 193620
parent process first waitpid result 193620
parent process continuing
Child thread resumed
process 193619 got SIGCHLD for pid 193620
Child thread joined
process 193619 got SIGCHLD for pid 193620
parent process second waitpid result 193620
EXIT-SUCCESS

The parent process receives one SIGCHLD for each state change in the child (once when it's stopped, once when it's resumed, and once when it's exited). Both the waitpid result and the si->si_pid in the signal are set to the tgid of the child, with the child thread's tid not visible to the parent. I think this is what we expect to happen?

If you run it under rr, something strange happens:

kjtsanaktsidis@kjtsanaktsidis-laptop build % ./bin/rr record -- ./bin/sigcont_threaded
rr: Saving execution to trace directory `/home/kjtsanaktsidis/.local/share/rr/sigcont_threaded-53'.
Parent process is 193737
Child process is 193738
Child thread is 193739
Child thread stopping
process 193737 got SIGCHLD for pid 193738
process 193737 got SIGCHLD for pid 193738
parent process first waitpid result 193738
parent process continuing
Child thread resumed
Child thread joined
process 193737 got SIGCHLD for pid 193738
parent process second waitpid result 193738
EXIT-SUCCESS

Like before, the parent sees only the tgid, not the tids; BUT, somehow it got two SIGCHLDs when the child was stopped, and zero when it was resumed! Actually, the stop signal is sent to the parent once per thread (you can see this if you add a second pthread_create to child_process; you get three SIGCHLDs on stop).

I think your patch is not needed, because syscall_state.emulate_wait_for_child above will only be set to the thread-group leader of the child, because of the conditions in is_waiting_for (

bool RecordTask::is_waiting_for(RecordTask* t) {
) as called by maybe_emulate_wait (
if (!rchild->emulated_stop_pending || !t->is_waiting_for(rchild)) {
).

But nonetheless you did observe waitpid returning the tid and not the tgid in some circumstances, so the logic in maybe_emulate_wait/is_waiting_for must be faulty in some circumstance.


So there are at least three bugs in rr I think:

  1. It's sending the GROUP_STOP notification once per thread, not once per tgid
  2. It's not sending the GROUP_START notification at all
  3. wait4 sometimes returns the tid of a thread that caused a SIGSTOP notification, and not the tgid.

I have a feeling fixing no. 1 would also fix 3, but I don't really know. I won't be able to have a deeper look at this until the weekend, probably, but if you investigate further please keep me updated!

@sidkshatriya
Copy link
Contributor Author

Thank you for this most useful investigation ! It has given me some ideas for future debugging and root cause analysis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants