Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TSan] futex syscalls are not marked as blocking, causing hangs on TSan builds when combined with async signals #123138

Open
canova opened this issue Mar 27, 2024 · 8 comments
Labels
A-sanitizers Area: Sanitizers for correctness and code quality C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@canova
Copy link
Contributor

canova commented Mar 27, 2024

I've been investigating the Firefox Profiler hangs that were happening in TSan builds, and discovered that LLVM doesn't correctly mark some blocking functions calls (llvm/llvm-project#83561 and llvm/llvm-project#83844). And it was failing to send async signals like SIGPROF properly.

There is also a Rust test case in the second issue if you want to try it yourself: https://github.com/canova/rustc-tsan-testcase

They've been resolved now, but Rust needs to get the LLVM patches as well and on top of it it needs to do a manual handling for FUTEX_WAIT syscalls.

It looks like for calling syscalls directly, there isn't a way to intercept the call (it's possible for glibc calls like pthread_mutex_lock etc.). That's why when a syscall is being called, the program itself has to inject pre/post syscall hooks itself. My PR added these new hooks for futex syscalls so they can be marked as blocking properly. Rust std library also calls FUTEX_WAIT syscalls directly, which means that we need to add __sanitizer_syscall_pre_futex and __sanitizer_syscall_post_futex to here (beware that these are macros, so you'll want to cal the functions that the macros call instead):

// Use FUTEX_WAIT_BITSET rather than FUTEX_WAIT to be able to give an
// absolute time rather than a relative time.
libc::syscall(
libc::SYS_futex,
futex as *const AtomicU32,
libc::FUTEX_WAIT_BITSET | libc::FUTEX_PRIVATE_FLAG,
expected,
timespec.as_ref().map_or(null(), |t| t as *const libc::timespec),
null::<u32>(), // This argument is unused for FUTEX_WAIT_BITSET.
!0u32, // A full bitmask, to make it behave like a regular FUTEX_WAIT.
)

I tried implementing it, but I couldn't make the linker happy yet. It can't find the hooks, they should be inside the libclang_rt but I can't seem to make it work. Any help would be appreciated!

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Mar 27, 2024
@Noratrieb Noratrieb added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-sanitizers Area: Sanitizers for correctness and code quality C-bug Category: This is a bug. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Mar 27, 2024
@Noratrieb
Copy link
Member

Are you correctly linking against your newly patched runtime?

@canova
Copy link
Contributor Author

canova commented Mar 27, 2024

Are you correctly linking against your newly patched runtime?

I think so. I tried calling a preexisting function too just to check but that didn't work either.

I'm not 100% sure if I'm not making an obvious mistake while trying to link though. I tried a combination of various things but not luck so far. This code only compiles when I build with tsan enabled (), which makes me think that the compiler-rt is not linked for the normal builds?

@Noratrieb
Copy link
Member

You need to #[cfg] the call such that it is only enabled in TSan builds. There should be an (unstable but usable in std) cfg for sanitizers.

@canova
Copy link
Contributor Author

canova commented Mar 27, 2024

Oh, right. Actually that seems to match the behavior of clang: https://godbolt.org/z/K9E347ecn It also fails to link when tsan is not enabled. Let me try the sanitize cfg. Thanks!

@canova
Copy link
Contributor Author

canova commented Mar 28, 2024

@Nilstrieb I can verify that the linker successfully finds them when they are behind a sanitize cfg. My test case is also fixed with the patched LLVM + calling these pre/post hooks.

Would be happy to send a PR, but to be able to do that we need to update the LLVM fork first. Do you know if there is a process to do that? (either as a cherry-pick or some other way like updating the whole fork?) Or if that's something I can do?

@Noratrieb
Copy link
Member

We support being compiled with LLVM that's not our fork and like 1 version old, so you'd either need to wait for like a year or find a way to only apply this change when possible (weak linkage maybe?).

@the8472
Copy link
Member

the8472 commented Apr 10, 2024

It looks like for calling syscalls directly, there isn't a way to intercept the call (it's possible for glibc calls like pthread_mutex_lock etc.).

They're not direct syscalls via assembly, they still go through the libc helper. Can't that be intercepted too?

@briansmith
Copy link
Contributor

It looks like for calling syscalls directly, there isn't a way to intercept the call (it's possible for glibc calls like pthread_mutex_lock etc.).

They're not direct syscalls via assembly, they still go through the libc helper. Can't that be intercepted too?

Based on my experience and investigation in rust-random/getrandom#463, libc::syscall does not integrate with memory sanitizers. Instead, each of the syscall wrappers, the functions open(), getrandom() does the integration. In theory it would be possible for libc::syscall to do it, but it doesn't, probably because it would make libc::syscall huge due to a giant switch statement of every possible syscall.

Although the getrandom PR I link to above is specifically about Memory Sanitizer, an analogous investigation will reveal that similar work would be needed for TSan. Luckily it is pretty simple to replicate what libc is doing by using the sanitizer APIs, as I demonstrated in the Memory Sanitizer case for getrandom in the PR above.

(libstd doesn't need to call __msan_unpoison on the result of its own libc::syscall(SYS_getrandom, ...) usage because it passes a &mut [u8]) to it, which means it's already been initialized before the syscall.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sanitizers Area: Sanitizers for correctness and code quality C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants