TLS destructors are not run on Library::drop resulting in illegal instruction on OS X #5

benanders · 2015-12-14T17:09:25Z

Hi, thanks for making this library, it's really useful to me.

Unfortunately, when trying out a really simple use case, I get an Illegal Hardware Instruction error. The Rust code I'm using to load the dylib is:

extern crate libloading;

use libloading::{Library, Symbol};

fn main() {
    let lib = Library::new("../test/target/release/libtest.dylib").unwrap();
    let sym: Symbol<extern fn() -> ()> = unsafe {lib.get(b"testing")}.unwrap();
    sym();
}

The dylib I'm loading contains a single function:

#[no_mangle]
pub fn testing() {
    println!("YES!");
}

The dylib's Cargo.toml file contains the needed crate-type = ["dylib"] qualifier.

I wrote some equivalent C code (loading the exact same Rust library from above), which works perfectly fine (no errors):

#include <stdio.h>
#include <dlfcn.h>

int main(int argc, char *argv[]) {
    void *lib = dlopen("../test/target/release/libtest.dylib", RTLD_LAZY);
    void (*sym)(void) = dlsym(lib, "testing");
    sym();
}

Any ideas why this might be happening? The illegal hardware instruction occurs after the main Rust function exits (I can place a print at the end of the main function and it'll be run, then the error will occur). I'm on OSX 10.11.2, using the most recent stable rust (rustc 1.5.0 (3d7cd77e4 2015-12-04) ).

I narrowed it down to the Drop function on the Library struct. If I comment its contents out, then the error doesn't happen. I also replaced the Drop function with just a single call to dlclose like so:

    fn drop(&mut self) {
        println!("{}", unsafe { dlclose(self.handle) });
    }

Which prints 0 (meaning the close function didn't return an error), which is weird.


benanders · 2015-12-14T17:21:55Z

I also just tried this on the latest nightly rustc 1.7.0-nightly (110df043b 2015-12-13) and I have the same problem in both release and debug mode (with and without the --release flag for cargo).

nagisa · 2015-12-14T17:23:04Z

Interesting. I cannot reproduce the error on Linux and do not possess an OS X machine to test this on, so I can’t really help debug this other than with general tips for debugging this kind of problem.

An illegal instruction in Rust usually comes from the ud2 instruction, which the compiler emits in certain cases and wherever intrinsics::unreachable was used. An illegal instruction may also be caused by a lack of panic handling setup; unwinding through an FFI boundary is illegal in Rust (but since caller and callee are both Rust, I can’t imagine this being a problem).

If you’re interested in tracking down and fixing the issue, please do so! Otherwise I’ll just keep the issue open for a while so other people could find this if they hit it as well (on stable or not).

nagisa · 2015-12-14T17:30:23Z

Interesting places to start tracking down the issue would be a stack trace at the time of hardware fault and disassembly around the invalid insn, I guess.

benanders · 2015-12-14T17:30:37Z

Honestly I'd have no idea where to start in debugging something like this, I'm relatively inexperienced with particularly low level stuff, but I'd like to try getting the issue resolved. Can you make any sense of this backtrace from GDB?

Program received signal SIGSEGV, Segmentation fault.
0x0000000101416630 in ?? ()
(gdb) bt
#0  0x0000000101416630 in ?? ()
#1  0x00007fff82e18155 in tlv_finalize () from /usr/lib/system/libdyld.dylib
#2  0x00007fff818fe768 in exit () from /usr/lib/system/libsystem_c.dylib
#3  0x00007fff82e185b4 in start () from /usr/lib/system/libdyld.dylib
#4  0x00007fff82e185ad in start () from /usr/lib/system/libdyld.dylib
#5  0x0000000000000000 in ?? ()

The disassembly from the function above the ?? in the stack trace (not sure if this is useful):

(gdb) up
#1  0x00007fff82e18155 in tlv_finalize () from /usr/lib/system/libdyld.dylib
(gdb) disas
Dump of assembler code for function tlv_finalize:
   0x00007fff82e18124 <+0>: push   %rbp
   0x00007fff82e18125 <+1>: mov    %rsp,%rbp
   0x00007fff82e18128 <+4>: push   %r15
   0x00007fff82e1812a <+6>: push   %r14
   0x00007fff82e1812c <+8>: push   %rbx
   0x00007fff82e1812d <+9>: push   %rax
   0x00007fff82e1812e <+10>:    mov    %rdi,%r14
   0x00007fff82e18131 <+13>:    mov    0x4(%r14),%r15d
   0x00007fff82e18135 <+17>:    test   %r15d,%r15d
   0x00007fff82e18138 <+20>:    je     0x7fff82e1815e <tlv_finalize+58>
   0x00007fff82e1813a <+22>:    lea    -0x1(%r15),%eax
   0x00007fff82e1813e <+26>:    shl    $0x4,%rax
   0x00007fff82e18142 <+30>:    lea    0x10(%rax,%r14,1),%rbx
   0x00007fff82e18147 <+35>:    mov    -0x8(%rbx),%rax
   0x00007fff82e1814b <+39>:    test   %rax,%rax
   0x00007fff82e1814e <+42>:    je     0x7fff82e18155 <tlv_finalize+49>
   0x00007fff82e18150 <+44>:    mov    (%rbx),%rdi
   0x00007fff82e18153 <+47>:    callq  *%rax
=> 0x00007fff82e18155 <+49>:    add    $0xfffffffffffffff0,%rbx
   0x00007fff82e18159 <+53>:    dec    %r15d
   0x00007fff82e1815c <+56>:    jne    0x7fff82e18147 <tlv_finalize+35>
   0x00007fff82e1815e <+58>:    mov    %r14,%rdi
   0x00007fff82e18161 <+61>:    add    $0x8,%rsp
   0x00007fff82e18165 <+65>:    pop    %rbx
   0x00007fff82e18166 <+66>:    pop    %r14
   0x00007fff82e18168 <+68>:    pop    %r15
   0x00007fff82e1816a <+70>:    pop    %rbp
   0x00007fff82e1816b <+71>:    jmpq   0x7fff82e185bc

I take it since there's no ud2 instruction that that's not the problem? GDB won't let me get the disassembly for the function that's actually triggering the fault.

nagisa · 2015-12-14T17:41:57Z

Hmm, at first sight it probably has nothing to do with the implementation of this library. Rather, Rust programs (like programs in all other languages) have some thread-local storage set up. For Rust, things like printing have some TLS set-up, and it might be a case of TLS getting corrupted for the whole program (e.g. a case similar to a double-free, where the Rust runtime gets unloaded twice?). I’m not sure.

If you don’t mind leaking the loaded library (i.e. the library you load is used more than once, perhaps for the duration of the whole program), I can suggest forgetting the library (mem::forget) so it doesn’t execute these cleanups. That should at least avoid the issue.

benanders · 2015-12-14T17:44:01Z

Yeah that seems like the best option so far. I'm not rapidly opening and closing libraries where resource management is important, so leaking is the easiest way out. I hadn't seen mem::forget, thanks for that!

nagisa · 2015-12-14T17:55:31Z

According to @alexcrichton, it is very likely to be a case of the library registering some TLS destructors with pthreads, but they’re executed only when the thread itself finishes, rather than when the library is unloaded, thus resulting in us executing code that does not exist anymore. Apparently, there have been cases in the past where this has been encountered as well.

In this case, I’d say this is a bug in OS X itself (or its libdyld/pthreads), with the suggested fix being to “forget” the loaded library. Note that not using any TLS-related features (this includes anything related to stdio in Rust) would also avoid this bug.
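
To illustrate the timing described above, here is a minimal sketch using only the Rust standard library. It does not load any dylib; the `Guard` type, the `DESTRUCTOR_RUNS` counter and the `observe_tls_destructor` function are hypothetical names made up for the illustration. It shows that a thread-local destructor runs when the owning thread exits, not when the value was last used. In the scenario from this issue, the destructor code would live inside the already unloaded library.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many times the thread-local destructor has run.
static DESTRUCTOR_RUNS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type whose destructor stands in for code living in a dylib.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        // If this code had been unmapped by dlclose, calling it would crash.
        DESTRUCTOR_RUNS.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local! {
    static GUARD: Guard = Guard;
}

/// Touches the thread-local on a fresh thread and returns how many destructor
/// runs were observed (inside the thread after the last use, after the join).
fn observe_tls_destructor() -> (usize, usize) {
    let before = DESTRUCTOR_RUNS.load(Ordering::SeqCst);
    let inside = thread::spawn(|| {
        // First access initializes the value and registers its destructor.
        GUARD.with(|_| {});
        DESTRUCTOR_RUNS.load(Ordering::SeqCst)
    })
    .join()
    .unwrap();
    let after = DESTRUCTOR_RUNS.load(Ordering::SeqCst);
    (inside - before, after - before)
}
```

Calling `observe_tls_destructor()` should report no destructor run while the thread is still alive and one run once the thread has been joined, assuming a platform where joining waits for thread-local destructors (the usual pthreads behaviour).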

calebmer · 2016-02-09T23:37:19Z

What's the status on this? We would like to use this library and OS X support is required. This bug is a major blocker. A couple of specific questions:

If this is a bug in someone else's code, have the appropriate issues been filed? If so, are there any links to those issues?
If there are workarounds (as you mention) are there specific examples of code that doesn't work vs code that does?
Is there any progress on code being added to the library to workaround this bug?
Are there any libraries besides this one that serve the same function and don't have troubles with OS X?

emoon · 2016-02-22T11:54:07Z

I'm running into this issue as well, so I'm wondering the same thing: is this being tracked elsewhere?

benanders · 2016-02-22T13:08:02Z

As far as I know, no other issues have been filed. Last time I checked, there were no other libraries for Rust which serve the same purpose as this one. As for a workaround, I don't believe one is being worked on, and I unfortunately don't have the time, knowledge, or experience to try and fix this myself. I think we're out of luck at the moment :(

As far as who should be responsible for the bug, I'm not entirely sure. It might be a bug in Rust itself, because it doesn't seem to be specific to this library. But I'm not sure how willing the Rust maintainers would be to attempt to go about fixing it, since it involves the use of unsafe code and a native C library.

nagisa · 2016-02-22T13:20:12Z

I’m not aware of any issues reported in other projects, nor am I aware of a public OS X issue tracker of any sort where such an issue could be reported/searched for. That being said, I do admit I didn’t look very hard for either one.

nagisa · 2016-02-22T13:44:45Z

@calebmer sorry for the late response! Your comment completely fell through the cracks! Yours are all very good questions, so I’ll try to answer them extensively:

> If this is a bug in someone else's code, have the appropriate issues been filed? If so, are there any links to those issues?

No. No upstream bugs have been filed, primarily because I’m not very familiar with the OS X community or the issue reporting process. Last time I checked, one had to pay 100 USD upfront even to report an issue in Apple’s own OS.

> If there are workarounds (as you mention) are there specific examples of code that doesn't work vs code that does?

Two workarounds are:

Never closing the library which exposes the issue (e.g. mem::forget(library) after the necessary symbols are retrieved), as mentioned previously;
Ensuring the loaded library does not invoke thread-local functionality, but that might not be always feasible. Writing external libraries in languages which do not rely on TLS as extensively as Rust might help. Not using the Rust standard library (#[no_std]) would also make this easier.
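
As a rough sketch of the first workaround, the following uses a hypothetical `FakeLibrary` type as a stand-in for `libloading::Library`, so that it runs without any dylib on disk. Its `Drop` impl plays the role of the `dlclose` call; the `UNLOADS` counter and the `unloads_after` helper are likewise made up for the illustration.

```rust
use std::mem;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how often the stand-in library was "unloaded".
static UNLOADS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for libloading::Library; Drop plays the role of dlclose.
struct FakeLibrary;

impl Drop for FakeLibrary {
    fn drop(&mut self) {
        UNLOADS.fetch_add(1, Ordering::SeqCst);
    }
}

/// "Loads" a library, then either leaks it or drops it normally, and returns
/// how many unloads happened as a result.
fn unloads_after(leak: bool) -> usize {
    let before = UNLOADS.load(Ordering::SeqCst);
    let lib = FakeLibrary;
    if leak {
        // The destructor never runs, so the library would stay mapped.
        mem::forget(lib);
    } else {
        // Normal path: the destructor runs and the library would be closed.
        drop(lib);
    }
    UNLOADS.load(Ordering::SeqCst) - before
}
```

With a real `Library`, the same `mem::forget(lib)` call would go after the last use of the symbols obtained from it; the cost is that the library stays mapped until the process exits.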

> Is there any progress on code being added to the library to workaround this bug?

I’m not sure it is possible to properly resolve this issue from within this library. An option would be to leak all the opened libraries by default on OS X, but I wouldn’t consider that a viable option.

> Are there any libraries besides this one that serve the same function and don't have troubles with OS X?

You could certainly use barebones dlopen and dlsym and dlclose, but you would almost certainly hit the same issue as with this library. Avoiding dlopen would involve writing a whole dynamic linker for the platform of your choice by yourself.

@GravityScore you said

> since it involves the use of unsafe code and a native C library

What do you mean? Rust’s standard library on OS X is strongly tied to the standard libc and contains a large amount of unsafe code. If using some additional unsafe code in the standard library would avoid the issue, I think the fix would be gladly accepted; though I don’t think it would solve the issue in general: one could still produce a library which uses TLS in a way that exposes this issue regardless of what’s done in the Rust compiler or the standard library.

emoon · 2016-02-22T14:07:04Z

I have a question here (this is somewhat generic to Rust, but bear with me). I keep track of the Library within a struct here https://github.com/emoon/dynamic_reload/blob/master/src/lib.rs#L44 that is then later stuffed into a Vec<Rc<Lib>>. So I wonder how I should do the forget in this case? Should I implement Drop for the struct that holds this data and then do mem::forget on lib?

nagisa · 2016-02-22T14:32:04Z

@emoon I guess the least intrusive stable way currently would be to do something like this and then wrap your Library in the Leak.

struct Leak<T>(Option<T>);

impl<T> Drop for Leak<T> {
    fn drop(&mut self) {
        ::std::mem::forget(self.0.take());
    }
}
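
A sketch of how such a wrapper might be used with a `Vec<Rc<...>>` follows. The `new` constructor, the `Deref` impl, the `FakeLibrary` stand-in, the `UNLOADS` counter and the `load_use_and_release` function are assumptions added for illustration; they are not part of libloading or of the snippet above.

```rust
use std::mem;
use std::ops::Deref;
use std::rc::Rc;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how often the stand-in library was "unloaded".
static UNLOADS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for libloading::Library.
struct FakeLibrary {
    name: &'static str,
}

impl Drop for FakeLibrary {
    fn drop(&mut self) {
        UNLOADS.fetch_add(1, Ordering::SeqCst);
    }
}

// The wrapper from the comment above.
struct Leak<T>(Option<T>);

impl<T> Leak<T> {
    // Assumed convenience constructor.
    fn new(value: T) -> Self {
        Leak(Some(value))
    }
}

// Assumed Deref impl so the wrapped value can be used transparently.
impl<T> Deref for Leak<T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.0.as_ref().expect("value is only taken in Drop")
    }
}

impl<T> Drop for Leak<T> {
    fn drop(&mut self) {
        // Forget the inner value so its destructor never runs.
        mem::forget(self.0.take());
    }
}

/// Stores two wrapped libraries in a Vec<Rc<..>>, reads through the wrapper,
/// releases everything, and reports the names seen and the unload count.
fn load_use_and_release() -> (Vec<&'static str>, usize) {
    let before = UNLOADS.load(Ordering::SeqCst);
    let libs: Vec<Rc<Leak<FakeLibrary>>> = vec![
        Rc::new(Leak::new(FakeLibrary { name: "a" })),
        Rc::new(Leak::new(FakeLibrary { name: "b" })),
    ];
    // Field access auto-derefs through Rc and Leak to the inner value.
    let names = libs.iter().map(|lib| lib.name).collect();
    drop(libs);
    (names, UNLOADS.load(Ordering::SeqCst) - before)
}
```

Because the wrapper forgets its contents on drop, releasing the last `Rc` never triggers the inner destructor, which is the behaviour wanted here.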

emoon · 2016-02-22T14:35:11Z

@nagisa Alright. Thanks!

calebmer · 2016-02-22T20:33:24Z

@nagisa totally understand the delay, thanks for the great response! 😊

MasonRemaley · 2016-02-23T02:24:30Z

I believe this is caused by rust-lang/rust#28794. If I understand correctly, it's an issue with the way the Rust compiler generates dylibs.

(I think you'd get the same crash in C if you called dlclose, but that you wouldn't get the crash from either language if the library being loaded wasn't written in Rust.)

nagisa · 2016-02-23T11:00:33Z

> if I understand correctly it's an issue with the way the Rust compiler generates dylibs.

There’s nothing specific to the dylib generation; rather, it is about how the Rust standard library implements TLS on OS X.

> but that you wouldn't get the crash from either language if the library being loaded wasn't written in Rust.)

You could use/implement TLS destructors using that function in any other language and hit exactly the same issues too.

Either way, thanks for finding and cross-referencing the issue.

MasonRemaley · 2016-02-24T02:55:19Z

No problem, thanks for the explanation!

nagisa · 2016-09-01T20:02:55Z

Since 0.3 you can specify arbitrary flags when opening a library. The RTLD_NODELETE flag (thanks for the reminder @Np2x) essentially acts as an implicit mem::forget, so you can now do something along the lines of:

let os_lib = libloading::os::unix::Library::open("fname", RTLD_NODELETE | RTLD_NOW)?;
let lib = libloading::Library::from(os_lib);
/* do your stuff */

This should still work while liberating you from having to mem::forget your libraries :)

nagisa · 2018-03-28T07:14:33Z

As per this comment, Apple has fixed this issue by implementing dynamic library unloading, if said dynamic libraries use TLS, as a no-op.

nagisa added the not-our-bug label Dec 14, 2015

nagisa changed the title ~~Error on exit~~ TLS destructors are not run on Library::drop resulting in illegal instruction on OS X Dec 14, 2015

emoon mentioned this issue Feb 22, 2016

ProDBG crashes on exit on Mac emoon/ProDBG#102

Closed

alexcrichton mentioned this issue Dec 14, 2016

Dynamic library, OSX, Segmentation fault rust-lang/rust#38370

Closed

martinrlilja mentioned this issue Jan 16, 2017

Basic implementation of hot-loading/mods citybound/citybound#95

Closed

Mark-Simulacrum mentioned this issue May 18, 2017

Unloading a Rust dylib with TLS used segfaults on OSX rust-lang/rust#28794

Closed

nagisa mentioned this issue Sep 4, 2017

dylib function that returns custom type when retrieved produces Segmentation Fault on macOS #30

Closed

nagisa mentioned this issue Jan 12, 2018

Allow runtime switching between trans backends rust-lang/rust#45684

Merged

ubolonton added a commit to ubolonton/rust-dylib-issues that referenced this issue Feb 26, 2018

https://github.com/nagisa/rust_libloading/issues/5

55e7bf6

nagisa mentioned this issue Mar 28, 2018

SEGFAULT sometimes occurs when testing libloading code #41

Open

jackcmay mentioned this issue Sep 23, 2018

Thread safety issue when loading dynamic modules on Linux solana-labs/solana#1314

Closed

kurtlawrence mentioned this issue Mar 14, 2019

Linux: out# results hidden after call to println!() kurtlawrence/papyrus#16

Closed

nagisa mentioned this issue Oct 23, 2019

Dropping and reloading a dylib on macOS returns the old version, but only if the dylib uses the stdlib #59

Closed

kurtlawrence mentioned this issue Dec 18, 2019

Keep loaded libraries in memory kurtlawrence/papyrus#44

Closed

mulimoen mentioned this issue Jul 11, 2021

Don't unload hdf5 in build script aldanor/hdf5-rust#163

Merged

aseyboldt mentioned this issue Apr 27, 2023

Deadlock on Windows when unloading libraries in several threads roualdes/bridgestan#111

Open
