Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SEGFAULT sometimes occurs when testing libloading code #41

Open
binkoni opened this issue Mar 27, 2018 · 16 comments
Open

SEGFAULT sometimes occurs when testing libloading code #41

binkoni opened this issue Mar 27, 2018 · 16 comments

Comments

@binkoni
Copy link

binkoni commented Mar 27, 2018

Operating system is Fedora 27.

uname -r
4.15.4-300.fc27.x86_64

When running cargo test multiple times, sometimes test fails raising a segmentation fault.

#[test]
fn libloading() {
    let lib = libloading::Library::new("libdltest.so").unwrap();
    unsafe {
        let test_fn: libloading::Symbol<unsafe extern fn(i32) -> i32> = lib.get(b"test").unwrap();
        assert!(test_fn(100) == 200);
    }
}
test test::libloading ... error: process didn't exit successfully: `/home/User/test_libloading/target/debug/deps/test_libloading-b3311c80df7834fd libloading` (signal: 11, SIGSEGV: invalid memory reference)

But when this code runs in an executable crate, It seems to run without the segfault even when I execute it multiple times.

fn main() {
    let lib = libloading::Library::new("libdltest.so").unwrap();
    unsafe {
        let test_fn: libloading::Symbol<unsafe extern fn(i32) -> i32> = lib.get(b"test").unwrap();
        assert!(test_fn(100) == 200);
    }
}
@nagisa
Copy link
Owner

nagisa commented Mar 27, 2018 via email

@binkoni
Copy link
Author

binkoni commented Mar 28, 2018

I tried debugging using gdb but gdb was unable to get symbols from the test executable for unknown reason.
Instead I used valgrind and It looks like that the segfault was caused by some kind of thread safety violation.
Here is the output :

==7268== Memcheck, a memory error detector
==7268== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==7268== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==7268== Command: ./test_libloading-8b92f602ec17e25e
==7268== 
==7268== Thread 2 libloading:
==7268== Jump to the invalid address stated on the next line
==7268==    at 0x6ED5260: ???
==7268==    by 0x524E367: __nptl_deallocate_tsd.part.5 (in /usr/lib64/libpthread-2.26.so)
==7268==    by 0x524F75A: start_thread (in /usr/lib64/libpthread-2.26.so)
==7268==    by 0x579598E: clone (in /usr/lib64/libc-2.26.so)
==7268==  Address 0x6ed5260 is not stack'd, malloc'd or (recently) free'd
==7268== 
==7268== Can't extend stack to 0x402c138 during signal delivery for thread 2:
==7268==   no stack segment
==7268== 
==7268== Process terminating with default action of signal 11 (SIGSEGV)
==7268==  Access not within mapped region at address 0x402C138
==7268==    at 0x6ED5260: ???
==7268==    by 0x524E367: __nptl_deallocate_tsd.part.5 (in /usr/lib64/libpthread-2.26.so)
==7268==    by 0x524F75A: start_thread (in /usr/lib64/libpthread-2.26.so)
==7268==    by 0x579598E: clone (in /usr/lib64/libc-2.26.so)
==7268==  If you believe this happened as a result of a stack
==7268==  overflow in your program's main thread (unlikely but
==7268==  possible), you can try to increase the size of the
==7268==  main thread stack using the --main-stacksize= flag.
==7268==  The main thread stack size used in this run was 8388608.
==7268== Invalid write of size 8
==7268==    at 0x4A2A64A: _vgnU_freeres (vg_preloaded.c:59)
==7268==  Address 0x402cff8 is on thread 2's stack
==7268== 
==7268== 
==7268== Process terminating with default action of signal 11 (SIGSEGV)
==7268==  Access not within mapped region at address 0x402CFF8
==7268==    at 0x4A2A64A: _vgnU_freeres (vg_preloaded.c:59)
==7268==  If you believe this happened as a result of a stack
==7268==  overflow in your program's main thread (unlikely but
==7268==  possible), you can try to increase the size of the
==7268==  main thread stack using the --main-stacksize= flag.
==7268== 
==7268== HEAP SUMMARY:
==7268==     in use at exit: 352 bytes in 3 blocks
==7268==   total heap usage: 23 allocs, 20 frees, 5,984 bytes allocated
==7268== 
==7268== Thread 1:
==7268== 32 bytes in 1 blocks are still reachable in loss record 1 of 3
==7268==    at 0x4C31A1E: calloc (vg_replace_malloc.c:711)
==7268==    by 0x56BA081: __cxa_thread_atexit_impl (in /usr/lib64/libc-2.26.so)
==7268==    by 0x1611C4: register_dtor (fast_thread_local.rs:39)
==7268==    by 0x1611C4: register_dtor<core::cell::RefCell<core::option::Option<std::sys_common::thread_info::ThreadInfo>>> (local.rs:431)
==7268==    by 0x1611C4: get<core::cell::RefCell<core::option::Option<std::sys_common::thread_info::ThreadInfo>>> (local.rs:422)
==7268==    by 0x1611C4: std::sys_common::thread_info::THREAD_INFO::__getit (local.rs:184)
==7268==    by 0x159BE4: try_with<core::cell::RefCell<core::option::Option<std::sys_common::thread_info::ThreadInfo>>,closure,()> (local.rs:374)
==7268==    by 0x159BE4: <std::thread::local::LocalKey<T>>::with (local.rs:288)
==7268==    by 0x160F9B: set (thread_info.rs:46)
==7268==    by 0x160F9B: std::rt::lang_start_internal (rt.rs:51)
==7268==    by 0x11A4A1: std::rt::lang_start (rt.rs:74)
==7268==    by 0x11AC6D: main (lib.rs:0)
==7268== 
==7268== 32 bytes in 1 blocks are still reachable in loss record 2 of 3
==7268==    at 0x4C31A1E: calloc (vg_replace_malloc.c:711)
==7268==    by 0x4E3D7C4: _dlerror_run (in /usr/lib64/libdl-2.26.so)
==7268==    by 0x4E3D140: dlsym (in /usr/lib64/libdl-2.26.so)
==7268==    by 0x16BFAB: _ZN3std3sys4unix4weak5fetch17h0bb2acfe88a8cfddE.llvm.E9BB0996 (weak.rs:78)
==7268==    by 0x15C3D7: get<unsafe extern "C" fn(*const libc::unix::notbsd::linux::other::b64::x86_64::pthread_attr_t) -> usize> (weak.rs:62)
==7268==    by 0x15C3D7: min_stack_size (thread.rs:378)
==7268==    by 0x15C3D7: std::sys::unix::thread::Thread::new (thread.rs:59)
==7268==    by 0x131FD3: std::thread::Builder::spawn (mod.rs:416)
==7268==    by 0x128056: test::run_test::run_test_inner (lib.rs:1424)
==7268==    by 0x12774E: test::run_test (lib.rs:1448)
==7268==    by 0x122936: run_tests<closure> (lib.rs:1108)
==7268==    by 0x122936: test::run_tests_console (lib.rs:929)
==7268==    by 0x11D9D6: test::test_main (lib.rs:267)
==7268==    by 0x11DCA4: test::test_main_static (lib.rs:303)
==7268==    by 0x11AC39: test_libloading::__test::main (lib.rs:1)
==7268== 
==7268== 288 bytes in 1 blocks are possibly lost in loss record 3 of 3
==7268==    at 0x4C31A1E: calloc (vg_replace_malloc.c:711)
==7268==    by 0x4013D66: _dl_allocate_tls (in /usr/lib64/ld-2.26.so)
==7268==    by 0x5250177: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.26.so)
==7268==    by 0x15C491: std::sys::unix::thread::Thread::new (thread.rs:78)
==7268==    by 0x131FD3: std::thread::Builder::spawn (mod.rs:416)
==7268==    by 0x128056: test::run_test::run_test_inner (lib.rs:1424)
==7268==    by 0x12774E: test::run_test (lib.rs:1448)
==7268==    by 0x122936: run_tests<closure> (lib.rs:1108)
==7268==    by 0x122936: test::run_tests_console (lib.rs:929)
==7268==    by 0x11D9D6: test::test_main (lib.rs:267)
==7268==    by 0x11DCA4: test::test_main_static (lib.rs:303)
==7268==    by 0x11AC39: test_libloading::__test::main (lib.rs:1)
==7268==    by 0x11A4C1: std::rt::lang_start::{{closure}} (rt.rs:74)
==7268== 
==7268== LEAK SUMMARY:
==7268==    definitely lost: 0 bytes in 0 blocks
==7268==    indirectly lost: 0 bytes in 0 blocks
==7268==      possibly lost: 288 bytes in 1 blocks
==7268==    still reachable: 64 bytes in 2 blocks
==7268==         suppressed: 0 bytes in 0 blocks
==7268== 
==7268== For counts of detected and suppressed errors, rerun with: -v
==7268== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)

@nagisa
Copy link
Owner

nagisa commented Mar 28, 2018

Okay, so it seems to be the same (or a very similar) problem as #5, except this time in Linux.

It seems to me that in your case you have a scenario where the libdltest.so in question has some thread local data, which gets allocated when you run the test_fn (is it using println!? That uses thread-local data), but the test thread terminates after the library is unloaded -- and destructor code is gone.

Not sure why this does not happen with executables, but it would be nice if you tried the workaround referenced in this comment.

@binkoni
Copy link
Author

binkoni commented Mar 28, 2018

The workaround you mentioned worked and segfault doesn't seem to occur anymore. But I don't think test_fn uses thread local storage, because the function is very simple.
The source code of libdltest.so

@jackcmay
Copy link

I'm seeing the same issue in both tests and executables when loading/unloading modules in threads on linux (~16.04.1-Ubuntu SMP Thu Jul 19 09:46:30 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux) and does not reproduce on recent OS-X).

rust-lang/rust#28794 captures the issue well but I think on linux the problems still exists.

The workarounds mentioned so far won't work when the module must be loaded/unloaded many times from a thread. One alternative workaround is to load the module, spawn the thread, then unload the module after the thread has been cleaned up. This workaround is also not always viable and really it should be fixed on linux.

Anyone familiar with movement on this issue from the linux side?

@nagisa
Copy link
Owner

nagisa commented Sep 22, 2018

Can either of you produce a minimal reproducer?

@jackcmay
Copy link

@nagisa
Copy link
Owner

nagisa commented Sep 24, 2018

@jackcmay This is literally the TLS issue, similar one to the issue which OS X fixed by explicitly not unloading the dynamic libraries if they have any thread locals, implicitly applying RTLD_NODELETE to them. This is the first time I see it being reproduced on Linux though.

I’m fairly confident that you can construct a similarly problematic sample when the loader is C/C++ code (i.e. taking libloading out of the equation), as well as without a dynamic library written in Rust (by, for example, using the C++’s thread variables, taking Rust’s libstd out of equation as well). I think you’re best off reporting this against glibc itself.

Just like the OS X counterpart #5, I’m inclined to call this issue not-a-bug in libloading. It is most likely not a bug in rust’s libstd as well.

In your case a viable workaround, if RTLD_NODELETE is not appropriate, could be ensuring that you do not use anything that creates thread-local variables (such as stdio from libstd).

@KizzyCode
Copy link

Ok, the workaround with RTLD_NODELETE works (the values are from the libc-crate):

// Load and initialize library
#[cfg(target_os = "linux")]
let library: Library = {
    // Load library with `RTLD_NOW | RTLD_NODELETE` to fix a SIGSEGV
    ::libloading::os::unix::Library::open(Some(path.as_ref()), 0x2 | 0x1000)?.into()
};
#[cfg(not(target_os = "linux"))]
let library = Library::new(path.as_ref())?;

@Aaron1011
Copy link

The glibc documentation mentions this kind of issue, and how the implementation (supposedly) avoids it: https://sourceware.org/glibc/wiki/Destructor%20support%20for%20thread_local%20variables

I suspect that this may be a problem with the particular shared library being unloaded. In particular, the glibc implementation seems to rely on the function __cxa_thread_atexit_impl being called by the programming language runtime (Rust calls it here)

@Aaron1011
Copy link

I've created a standalone C reproducer, using pthread_key_create: https://github.com/Aaron1011/pthread_dlopen

It does the following:

  1. Spawn a new thread from main(), and block on it using pthread_join
  2. From the new thread, load a simple shared library, and call a function in it.
  3. In the shared library, call pthread_key_create with a destrutor function, and call pthread_setspecific with a non-NULL value to force the destrutor to actually run on thread exit.
  4. Back in the main program (on the thread), call dlclose on the shared library.
  5. Return from the thread function

This causes the following segfault:

[Current thread is 1 (Thread 0x7facc3ff4640 (LWP 701108))]
gef➤  bt
#0  0x00007facc4229129 in ?? ()
#1  0x00007facc41cd411 in __nptl_deallocate_tsd.part.0 () from /usr/lib/libpthread.so.0
#2  0x00007facc41ce2ba in start_thread () from /usr/lib/libpthread.so.0
#3  0x00007facc40f7053 in clone () from /usr/lib/libc.so.6
gef➤  q

It appears that calling dlclose on a shared library does not unregister any pthreads destructors (and presumably, any destructors registered using the same underlying mechanism that pthreads uses).

@Aaron1011
Copy link

There's already a glibc bug report for this: https://sourceware.org/bugzilla/show_bug.cgi?id=21032

Unfortunately, it's been open since 2017 without any feedback.

@Aaron1011
Copy link

While this not a bug in libloading, I think it's a major footgun for anyone using this library. As https://github.com/Aaron1011/pthread_dlopen demonstrates, it's not possible to determine ahead of time if a shared library has registered TLS destructors (if it's even possible at all), since they may be registered long after the library is loaded. If the usage of libloading is encapsulated by a safe API, this can easily lead to confusing segfaults without any obvious cause.

In light of this, I think it might be a good idea of libloading to change its default behavior to not unload shared libraries on Linux when a Library is dropped. While this is obviously a hack, the only alternative is for every single consumer of the crate to add in its own special handling for Linux. Even if glibc released a fix tomorrow, older versions of glibc will still be in use for years. Since glibc is almost always dynamically linked, individual crate authors are generally stuck with whatever glibc is in use on the machine the crate ends up running on.

Concretely, I think that Library's Drop implementation should not unload libraries on Linux, unless the user explicitly requests it (e.g. via a new method library.unload_on_drop()). While this would have side effects (increased memory usage, and skipping of __attribute__((destructor)) functions), I think preventing crashes by default is more important.

@nagisa What are your thoughts?

@nagisa
Copy link
Owner

nagisa commented Feb 21, 2021

I… don't know. Not unloading the libraries will break a fairly common use-case of hot reloading, as glibc's loader will not attempt to (re-)load a library that shares the same name with an already loaded DSO, even if the file contents have changed.

With a recent update of libloading the Library::new and the OS-specific counterparts already became unsafe for fairly similar reasons, so I'm kind of inclined to just consider library loading (and unloading) extremely unsafe and try document all of the cases people need to consider when they load libraries. And leave it up to them to figure out what makes most sense for their situations.

@Aaron1011
Copy link

Aaron1011 commented Feb 21, 2021

Not unloading the libraries will break a fairly common use-case of hot reloading

I may be wrong, but I would think that hot reloading would be less common than other uses cases (e.g. opening a typical shared library and calling a function). If that's true, then I think it would make more sense for the (less-typical) hot-reloading code to opt into automatically unloading, instead of (more-typical) code needing to opt out of automatically unloading.

Looking at the issues that are linked here, it looks like severe; different projects have run into this. I think the fact that the issue occurs in the Drop impl makes this particularly subtle - while unsafe code authors should be aware of the need to audit surrounding safe code, I think it's particularly easy to forget to consider what happens when a (fully initialized) struct is dropped, sometimes behind multiple layers of indirection.

@nagisa
Copy link
Owner

nagisa commented Feb 21, 2021

I know that macOS does employ this technique – they don't unload the machine code for libraries that contain thread-local variables, but they are both in a position to selectively apply this, and they are able to make sure this approach of theirs does not end up having unfortunate side effects (unsuccessfully).

We're not in that kind of position I fear. If we end up doing this, I would be inclined to adjust the library so that we don't implicitly implement Drop at all – we do have close after all. This would help somewhat with issues like #46, which is effectively the same issue in a different context, too.

… says @nagisa while looking at "Safer bindings around system dynamic library loading primitives" and wondering if they should remove the first word out of that sentence...

bors bot added a commit to gfx-rs/wgpu that referenced this issue Feb 26, 2021
1236: Update gfx to 0a201d1c406b5119ec11068293a40e50ec0be4c8 r=kvark a=Aaron1011

**Connections**
wgpu issue: #246
GFX PR: gfx-rs/gfx#3653
Underlying libloading issue: nagisa/rust_libloading#41

**Description**

Pulls in gfx-rs/gfx#3653,
which fixes a segfault when using wgpu from a non-main thread.

**Testing**

The example in #246 should run successfully. I'm not certain how to add an integration test to the repository.

Co-authored-by: Aaron Hill <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants