Rewrite msvc backtrace support to be much faster on 64-bit platforms
Currently, capturing the stack backtrace is done on Windows by calling
into `dbghelp!StackWalkEx` (or `dbghelp!StackWalk64` if the version of
`dbghelp` we loaded is too old to contain that function). This is very
convenient since `StackWalkEx` handles everything for us, but there are
two issues with doing so:

1. `dbghelp` is not safe to use from multiple threads at the same time,
   so all calls into it must be serialized.
2. `StackWalkEx` returns inlined frames as if they were regular stack
   frames, which requires loading debug info just to walk the stack. As a
   result, simply capturing a backtrace without resolving it is much
   more expensive on Windows than on *nix.

This change rewrites our Windows support to call `RtlVirtualUnwind`
instead on platforms that support this API (`x86_64` and `aarch64`).
This API walks the actual (i.e., not inlined) stack frames, so it does not
require loading any debug info and is significantly faster. For
platforms that do not support `RtlVirtualUnwind` (i.e., `i686`), we fall
back to the current implementation, which calls into `dbghelp`.
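
The control flow of the new unwinder can be modelled without any Windows APIs. The sketch below is only an illustration of the loop shape (look up a function entry, report the frame, unwind one step, stop at `ip == 0`). Every name in it (`Context`, `FnEntry`, `lookup_function_entry`, `virtual_unwind`, `demo_frames`) is a hypothetical stand-in, the function table is keyed by exact instruction pointer for simplicity, and the "stack memory" is a plain map; none of this is the real `RtlLookupFunctionEntry` or `RtlVirtualUnwind` interface.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the register state kept in `CONTEXT` (assumption).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Context {
    ip: u64,
    sp: u64,
}

/// Hypothetical unwind record: how many bytes of stack the function uses
/// before its return address slot.
struct FnEntry {
    frame_size: u64,
}

/// Mock lookup: returns the unwind record for `ip`, if one is registered.
fn lookup_function_entry(table: &HashMap<u64, FnEntry>, ip: u64) -> Option<&FnEntry> {
    table.get(&ip)
}

/// Mock unwind step: pop the frame and load the caller's ip from the mock stack.
/// A missing return address is treated as the end of the stack (ip = 0).
fn virtual_unwind(ctx: &mut Context, entry: &FnEntry, stack: &HashMap<u64, u64>) {
    let return_slot = ctx.sp + entry.frame_size;
    ctx.ip = stack.get(&return_slot).copied().unwrap_or(0);
    ctx.sp = return_slot + 8;
}

/// Walk frames until ip == 0, no entry is found, or the callback returns false.
fn trace(
    mut ctx: Context,
    table: &HashMap<u64, FnEntry>,
    stack: &HashMap<u64, u64>,
    cb: &mut dyn FnMut(&Context) -> bool,
) {
    while ctx.ip != 0 {
        let entry = match lookup_function_entry(table, ctx.ip) {
            Some(entry) => entry,
            None => break,
        };
        if !cb(&ctx) {
            break;
        }
        virtual_unwind(&mut ctx, entry, stack);
    }
}

/// Builds a three-frame mock stack and returns the instruction pointers visited.
fn demo_frames() -> Vec<u64> {
    let mut table = HashMap::new();
    table.insert(0x1000, FnEntry { frame_size: 16 });
    table.insert(0x2000, FnEntry { frame_size: 32 });
    table.insert(0x3000, FnEntry { frame_size: 8 });

    // Return addresses stored at the slot just above each frame.
    let mut stack = HashMap::new();
    stack.insert(0x110, 0x2000);
    stack.insert(0x138, 0x3000);

    let mut ips = Vec::new();
    let start = Context { ip: 0x1000, sp: 0x100 };
    trace(start, &table, &stack, &mut |frame| {
        ips.push(frame.ip);
        true
    });
    ips
}

fn main() {
    for ip in demo_frames() {
        println!("frame ip = {ip:#x}");
    }
}
```

Nothing in this loop consults debug info, which mirrors why the real capture path no longer needs to call into `dbghelp`.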

To recover the inlined frame information when we are asked to resolve
symbols, we use `SymAddrIncludeInlineTrace` to load debug info and
detect inlined frames and then `SymQueryInlineTrace` to get the
appropriate inline context to resolve them.

The result is significant performance improvements to backtrace capture
and symbolizing on Windows!

Before:

```
> cargo +nightly bench
     Running benches\benchmarks.rs

running 6 tests
test new                                 ... bench:     658,652 ns/iter (+/- 30,741)
test new_unresolved                      ... bench:     343,240 ns/iter (+/- 13,108)
test new_unresolved_and_resolve_separate ... bench:     648,890 ns/iter (+/- 31,651)
test trace                               ... bench:     304,815 ns/iter (+/- 19,633)
test trace_and_resolve_callback          ... bench:     463,645 ns/iter (+/- 12,893)
test trace_and_resolve_separate          ... bench:     474,290 ns/iter (+/- 73,858)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 0 filtered out; finished in 8.26s
```

After:

```
> cargo +nightly bench
     Running benches\benchmarks.rs

running 6 tests
test new                                 ... bench:     495,468 ns/iter (+/- 31,215)
test new_unresolved                      ... bench:       1,241 ns/iter (+/- 251)
test new_unresolved_and_resolve_separate ... bench:     436,730 ns/iter (+/- 32,482)
test trace                               ... bench:         850 ns/iter (+/- 162)
test trace_and_resolve_callback          ... bench:     410,790 ns/iter (+/- 19,424)
test trace_and_resolve_separate          ... bench:     408,090 ns/iter (+/- 29,324)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 0 filtered out; finished in 7.02s
```
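
To put the two runs side by side, the speedup per benchmark is simply the "before" time divided by the "after" time. The short sketch below does that arithmetic on the numbers copied from the output above; the `speedup` helper is a name introduced here for illustration and is not part of the crate.

```rust
/// Ratio of the "before" timing to the "after" timing (higher means faster).
fn speedup(before_ns: f64, after_ns: f64) -> f64 {
    before_ns / after_ns
}

fn main() {
    // (benchmark name, before ns/iter, after ns/iter), taken from the runs above.
    let results = [
        ("new", 658_652.0, 495_468.0),
        ("new_unresolved", 343_240.0, 1_241.0),
        ("new_unresolved_and_resolve_separate", 648_890.0, 436_730.0),
        ("trace", 304_815.0, 850.0),
        ("trace_and_resolve_callback", 463_645.0, 410_790.0),
        ("trace_and_resolve_separate", 474_290.0, 408_090.0),
    ];

    for (name, before, after) in results {
        println!("{name:<36} {:>7.1}x", speedup(before, after));
    }
}
```

By this measure the capture-only benchmarks (`new_unresolved` and `trace`) improve by roughly 277x and 359x, while the benchmarks that also resolve symbols improve by roughly 1.1x to 1.5x.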

The changes to the symbolize step also allow us to report inlined frames
when resolving from just an instruction address, which was not previously
possible.
wesleywiser committed Oct 10, 2023
1 parent 99faef8 commit beb5683
Showing 6 changed files with 273 additions and 208 deletions.
286 changes: 134 additions & 152 deletions src/backtrace/dbghelp.rs
@@ -1,30 +1,32 @@
//! Backtrace strategy for MSVC platforms.
//!
//! This module contains the ability to generate a backtrace on MSVC using one
//! of two possible methods. The `StackWalkEx` function is primarily used if
//! possible, but not all systems have that. Failing that the `StackWalk64`
//! function is used instead. Note that `StackWalkEx` is favored because it
//! handles debuginfo internally and returns inline frame information.
//! This module contains the ability to capture a backtrace on MSVC using one
//! of three possible methods. For `x86_64` and `aarch64`, we use `RtlVirtualUnwind`
//! to walk the stack one frame at a time. This function is much faster than using
//! `dbghelp!StackWalk*` because it does not load debug info to report inlined frames.
//! We still report inlined frames during symbolization by consulting the appropriate
//! `dbghelp` functions.
//!
//! For all other platforms, primarily `i686`, the `StackWalkEx` function is used if
//! possible, but not all systems have that. Failing that the `StackWalk64` function
//! is used instead. Note that `StackWalkEx` is favored because it handles debuginfo
//! internally and returns inline frame information.
//!
//! Note that all dbghelp support is loaded dynamically, see `src/dbghelp.rs`
//! for more information about that.
#![allow(bad_style)]

use super::super::{dbghelp, windows::*};
use super::super::windows::*;
use core::ffi::c_void;
use core::mem;

#[derive(Clone, Copy)]
pub enum StackFrame {
New(STACKFRAME_EX),
Old(STACKFRAME64),
}

#[derive(Clone, Copy)]
pub struct Frame {
pub(crate) stack_frame: StackFrame,
base_address: *mut c_void,
ip: *mut c_void,
sp: *mut c_void,
#[cfg(not(target_env = "gnu"))]
inline_context: Option<DWORD>,
}

// we're just sending around raw pointers and reading them, never interpreting
@@ -34,62 +36,108 @@ unsafe impl Sync for Frame {}

impl Frame {
pub fn ip(&self) -> *mut c_void {
self.addr_pc().Offset as *mut _
self.ip
}

pub fn sp(&self) -> *mut c_void {
self.addr_stack().Offset as *mut _
self.sp
}

pub fn symbol_address(&self) -> *mut c_void {
self.ip()
self.ip
}

pub fn module_base_address(&self) -> Option<*mut c_void> {
Some(self.base_address)
}

fn addr_pc(&self) -> &ADDRESS64 {
match self.stack_frame {
StackFrame::New(ref new) => &new.AddrPC,
StackFrame::Old(ref old) => &old.AddrPC,
}
#[cfg(not(target_env = "gnu"))]
pub fn inline_context(&self) -> Option<DWORD> {
self.inline_context
}
}

fn addr_pc_mut(&mut self) -> &mut ADDRESS64 {
match self.stack_frame {
StackFrame::New(ref mut new) => &mut new.AddrPC,
StackFrame::Old(ref mut old) => &mut old.AddrPC,
}
#[repr(C, align(16))] // required by `CONTEXT`, is a FIXME in winapi right now
struct MyContext(CONTEXT);

#[cfg(target_arch = "x86_64")]
impl MyContext {
#[inline(always)]
fn ip(&self) -> DWORD64 {
self.0.Rip
}

fn addr_frame_mut(&mut self) -> &mut ADDRESS64 {
match self.stack_frame {
StackFrame::New(ref mut new) => &mut new.AddrFrame,
StackFrame::Old(ref mut old) => &mut old.AddrFrame,
}
#[inline(always)]
fn sp(&self) -> DWORD64 {
self.0.Rsp
}
}

fn addr_stack(&self) -> &ADDRESS64 {
match self.stack_frame {
StackFrame::New(ref new) => &new.AddrStack,
StackFrame::Old(ref old) => &old.AddrStack,
}
#[cfg(target_arch = "aarch64")]
impl MyContext {
#[inline(always)]
fn ip(&self) -> DWORD64 {
self.0.Pc
}

fn addr_stack_mut(&mut self) -> &mut ADDRESS64 {
match self.stack_frame {
StackFrame::New(ref mut new) => &mut new.AddrStack,
StackFrame::Old(ref mut old) => &mut old.AddrStack,
}
#[inline(always)]
fn sp(&self) -> DWORD64 {
self.0.Sp
}
}

#[repr(C, align(16))] // required by `CONTEXT`, is a FIXME in winapi right now
struct MyContext(CONTEXT);
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
#[inline(always)]
pub unsafe fn trace(cb: &mut dyn FnMut(&super::Frame) -> bool) {
use core::ptr;

let mut context = core::mem::zeroed::<MyContext>();
RtlCaptureContext(&mut context.0);

// Call `RtlVirtualUnwind` to find the previous stack frame, walking until we hit ip = 0.
while context.ip() != 0 {
let mut base = 0;

let fn_entry = RtlLookupFunctionEntry(context.ip(), &mut base, ptr::null_mut());
if fn_entry.is_null() {
break;
}

let frame = super::Frame {
inner: Frame {
base_address: fn_entry as _,
ip: context.ip() as *mut _,
sp: context.sp() as *mut _,
#[cfg(not(target_env = "gnu"))]
inline_context: None,
},
};

if !cb(&frame) {
break;
}

let mut handler_data = 0usize;
let mut establisher_frame = 0;

RtlVirtualUnwind(
0,
base,
context.ip(),
fn_entry,
&mut context.0,
&mut handler_data as *mut _ as *mut _,
&mut establisher_frame,
ptr::null_mut(),
);
}
}

#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
#[inline(always)]
pub unsafe fn trace(cb: &mut dyn FnMut(&super::Frame) -> bool) {
use core::mem;

// Allocate necessary structures for doing the stack walk
let process = GetCurrentProcess();
let thread = GetCurrentThread();
@@ -98,64 +146,34 @@ pub unsafe fn trace(cb: &mut dyn FnMut(&super::Frame) -> bool) {
RtlCaptureContext(&mut context.0);

// Ensure this process's symbols are initialized
let dbghelp = match dbghelp::init() {
let dbghelp = match super::super::dbghelp::init() {
Ok(dbghelp) => dbghelp,
Err(()) => return, // oh well...
};

// On x86_64 and ARM64 we opt to not use the default `Sym*` functions from
// dbghelp for getting the function table and module base. Instead we use
// the `RtlLookupFunctionEntry` function in kernel32 which will account for
// JIT compiler frames as well. These should be equivalent, but using
// `Rtl*` allows us to backtrace through JIT frames.
//
// Note that `RtlLookupFunctionEntry` only works for in-process backtraces,
// but that's all we support anyway, so it all lines up well.
cfg_if::cfg_if! {
if #[cfg(target_pointer_width = "64")] {
use core::ptr;

unsafe extern "system" fn function_table_access(_process: HANDLE, addr: DWORD64) -> PVOID {
let mut base = 0;
RtlLookupFunctionEntry(addr, &mut base, ptr::null_mut()).cast()
}

unsafe extern "system" fn get_module_base(_process: HANDLE, addr: DWORD64) -> DWORD64 {
let mut base = 0;
RtlLookupFunctionEntry(addr, &mut base, ptr::null_mut());
base
}
} else {
let function_table_access = dbghelp.SymFunctionTableAccess64();
let get_module_base = dbghelp.SymGetModuleBase64();
}
}
let function_table_access = dbghelp.SymFunctionTableAccess64();
let get_module_base = dbghelp.SymGetModuleBase64();

let process_handle = GetCurrentProcess();

// Attempt to use `StackWalkEx` if we can, but fall back to `StackWalk64`
// since it's in theory supported on more systems.
match (*dbghelp.dbghelp()).StackWalkEx() {
Some(StackWalkEx) => {
let mut inner: STACKFRAME_EX = mem::zeroed();
inner.StackFrameSize = mem::size_of::<STACKFRAME_EX>() as DWORD;
let mut frame = super::Frame {
inner: Frame {
stack_frame: StackFrame::New(inner),
base_address: 0 as _,
},
};
let image = init_frame(&mut frame.inner, &context.0);
let frame_ptr = match &mut frame.inner.stack_frame {
StackFrame::New(ptr) => ptr as *mut STACKFRAME_EX,
_ => unreachable!(),
};
let mut stack_frame_ex: STACKFRAME_EX = mem::zeroed();
stack_frame_ex.StackFrameSize = mem::size_of::<STACKFRAME_EX>() as DWORD;
stack_frame_ex.AddrPC.Offset = context.0.Eip as u64;
stack_frame_ex.AddrPC.Mode = AddrModeFlat;
stack_frame_ex.AddrStack.Offset = context.0.Esp as u64;
stack_frame_ex.AddrStack.Mode = AddrModeFlat;
stack_frame_ex.AddrFrame.Offset = context.0.Ebp as u64;
stack_frame_ex.AddrFrame.Mode = AddrModeFlat;

while StackWalkEx(
image as DWORD,
IMAGE_FILE_MACHINE_I386 as DWORD,
process,
thread,
frame_ptr,
&mut stack_frame_ex,
&mut context.0 as *mut CONTEXT as *mut _,
None,
Some(function_table_access),
@@ -164,39 +182,53 @@ pub unsafe fn trace(cb: &mut dyn FnMut(&super::Frame) -> bool) {
0,
) == TRUE
{
frame.inner.base_address = get_module_base(process_handle, frame.ip() as _) as _;
let frame = super::Frame {
inner: Frame {
base_address: get_module_base(process_handle, stack_frame_ex.AddrPC.Offset)
as _,
ip: stack_frame_ex.AddrPC.Offset as *mut _,
sp: stack_frame_ex.AddrStack.Offset as *mut _,
#[cfg(not(target_env = "gnu"))]
inline_context: Some(stack_frame_ex.InlineFrameContext),
},
};

if !cb(&frame) {
break;
}
}
}
None => {
let mut frame = super::Frame {
inner: Frame {
stack_frame: StackFrame::Old(mem::zeroed()),
base_address: 0 as _,
},
};
let image = init_frame(&mut frame.inner, &context.0);
let frame_ptr = match &mut frame.inner.stack_frame {
StackFrame::Old(ptr) => ptr as *mut STACKFRAME64,
_ => unreachable!(),
};
let mut stack_frame64: STACKFRAME64 = mem::zeroed();
stack_frame64.AddrPC.Offset = context.0.Eip as u64;
stack_frame64.AddrPC.Mode = AddrModeFlat;
stack_frame64.AddrStack.Offset = context.0.Esp as u64;
stack_frame64.AddrStack.Mode = AddrModeFlat;
stack_frame64.AddrFrame.Offset = context.0.Ebp as u64;
stack_frame64.AddrFrame.Mode = AddrModeFlat;

while dbghelp.StackWalk64()(
image as DWORD,
IMAGE_FILE_MACHINE_I386 as DWORD,
process,
thread,
frame_ptr,
&mut stack_frame64,
&mut context.0 as *mut CONTEXT as *mut _,
None,
Some(function_table_access),
Some(get_module_base),
None,
) == TRUE
{
frame.inner.base_address = get_module_base(process_handle, frame.ip() as _) as _;
let frame = super::Frame {
inner: Frame {
base_address: get_module_base(process_handle, stack_frame64.AddrPC.Offset)
as _,
ip: stack_frame64.AddrPC.Offset as *mut _,
sp: stack_frame64.AddrStack.Offset as *mut _,
#[cfg(not(target_env = "gnu"))]
inline_context: None,
},
};

if !cb(&frame) {
break;
@@ -205,53 +237,3 @@ pub unsafe fn trace(cb: &mut dyn FnMut(&super::Frame) -> bool) {
}
}
}

#[cfg(target_arch = "x86_64")]
fn init_frame(frame: &mut Frame, ctx: &CONTEXT) -> WORD {
frame.addr_pc_mut().Offset = ctx.Rip as u64;
frame.addr_pc_mut().Mode = AddrModeFlat;
frame.addr_stack_mut().Offset = ctx.Rsp as u64;
frame.addr_stack_mut().Mode = AddrModeFlat;
frame.addr_frame_mut().Offset = ctx.Rbp as u64;
frame.addr_frame_mut().Mode = AddrModeFlat;

IMAGE_FILE_MACHINE_AMD64
}

#[cfg(target_arch = "x86")]
fn init_frame(frame: &mut Frame, ctx: &CONTEXT) -> WORD {
frame.addr_pc_mut().Offset = ctx.Eip as u64;
frame.addr_pc_mut().Mode = AddrModeFlat;
frame.addr_stack_mut().Offset = ctx.Esp as u64;
frame.addr_stack_mut().Mode = AddrModeFlat;
frame.addr_frame_mut().Offset = ctx.Ebp as u64;
frame.addr_frame_mut().Mode = AddrModeFlat;

IMAGE_FILE_MACHINE_I386
}

#[cfg(target_arch = "aarch64")]
fn init_frame(frame: &mut Frame, ctx: &CONTEXT) -> WORD {
frame.addr_pc_mut().Offset = ctx.Pc as u64;
frame.addr_pc_mut().Mode = AddrModeFlat;
frame.addr_stack_mut().Offset = ctx.Sp as u64;
frame.addr_stack_mut().Mode = AddrModeFlat;
unsafe {
frame.addr_frame_mut().Offset = ctx.u.s().Fp as u64;
}
frame.addr_frame_mut().Mode = AddrModeFlat;
IMAGE_FILE_MACHINE_ARM64
}

#[cfg(target_arch = "arm")]
fn init_frame(frame: &mut Frame, ctx: &CONTEXT) -> WORD {
frame.addr_pc_mut().Offset = ctx.Pc as u64;
frame.addr_pc_mut().Mode = AddrModeFlat;
frame.addr_stack_mut().Offset = ctx.Sp as u64;
frame.addr_stack_mut().Mode = AddrModeFlat;
unsafe {
frame.addr_frame_mut().Offset = ctx.R11 as u64;
}
frame.addr_frame_mut().Mode = AddrModeFlat;
IMAGE_FILE_MACHINE_ARMNT
}
2 changes: 0 additions & 2 deletions src/backtrace/mod.rs
@@ -153,8 +153,6 @@ cfg_if::cfg_if! {
mod dbghelp;
use self::dbghelp::trace as trace_imp;
pub(crate) use self::dbghelp::Frame as FrameImp;
#[cfg(target_env = "msvc")] // only used in dbghelp symbolize
pub(crate) use self::dbghelp::StackFrame;
} else {
mod noop;
use self::noop::trace as trace_imp;