Support for collection of backtrace memory addresses #966

Open · wants to merge 10 commits into main
Conversation

hammad45

  • Added support for collecting backtrace memory addresses using backtrace() and backtrace_symbols() (see the sketch below)
  • Resolved address-to-line mappings using addr2line for the unique memory addresses in the binary
  • Modified Darshan logs to include the address-to-line mappings as part of the Darshan header and the complete memory address stack as part of the DXT trace data
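
For readers unfamiliar with those glibc calls, here is a minimal, self-contained sketch of backtrace()/backtrace_symbols() usage (this is not the PR's code, just an illustration of the API the first bullet refers to):

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 32

/* Capture the current call stack and print the raw return addresses
 * plus best-effort symbol strings. addr2line can later map each
 * address to file:line, provided the binary was built with -g. */
static void capture_backtrace(void)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    char **symbols = backtrace_symbols(frames, n);

    for (int i = 0; i < n; i++)
        printf("%p  %s\n", frames[i], symbols ? symbols[i] : "?");

    free(symbols);
}
```

The offline mapping step in the second bullet would then be something along the lines of `addr2line -e ./app 0x400f2a` for each unique address (the address shown here is a placeholder).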

@jakobluettgau
Collaborator

Hi Hammad, this looks really nice. I'll try to create some logs with this new mode for DXT in MPI and POSIX as well, but could you share one of your logs for testing too?

Also, since this appears to change the log format, it should bump the log format versions, e.g. the DXT_*_VER macros for the affected modules in darshan-dxt-log-format.h.
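
For illustration, such a bump would look roughly like this (the macro names follow the DXT_*_VER pattern mentioned above; the concrete values are placeholders, not the header's actual contents):

```c
/* darshan-dxt-log-format.h (sketch): bump the per-module format
 * version whenever the on-disk DXT record layout changes. */
#define DXT_POSIX_VER 2   /* placeholder: previous version + 1 */
#define DXT_MPIIO_VER 2   /* placeholder: previous version + 1 */
```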

I'll try to run some tests and get back with additional feedback.

@jakobluettgau
Collaborator

It looks like this regresses handling of old Darshan logs. It should not be a big deal to support both, but as-is, old logs will error out for both darshan-parser and darshan-dxt-parser, as well as PyDarshan:

Error: failed to read darshan log file header. Error: darshan_log_open failed to read darshan log file header: Success.

@jakobluettgau
Collaborator

I guess a small paragraph for the documentation might be helpful as well. Something along the lines of:

  • The target application needs to be compiled with debugging symbols (-g); otherwise the line mappings are less meaningful and just show ??
  • To collect backtrace information, a new environment variable has to be set: `export DXT_ENABLE_STACK_TRACE=1` (see the sketch below)
  • Maybe a reference to the online man pages of backtrace and addr2line for interested users
  • And maybe, at some point with more experience, an expectation of the added overhead when enabled

Maybe some other noteworthy remarks from your experience when implementing this :)
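
As a minimal sketch of how the runtime toggle from the second bullet could be checked (the variable name comes from that bullet; the helper function and its placement inside DXT are assumptions, not the PR's actual code):

```c
#include <stdlib.h>
#include <string.h>

/* Sketch only: gate stack-trace collection on the proposed
 * DXT_ENABLE_STACK_TRACE environment variable. */
static int dxt_stack_trace_enabled(void)
{
    const char *env = getenv("DXT_ENABLE_STACK_TRACE");
    return (env && strcmp(env, "1") == 0);
}
```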

@shanedsnyder
Contributor

Hi Hammad,

Thanks for submitting this PR!

Could you provide some detailed comments/discussion on how exactly the stack traces are collected with this code? I think it would take me some time to grok all the code changes, but it will be easier if I'm able to better understand how this process is intended to be carried out. From a relatively quick first scan, it seems:

  • Processes independently capture stacktrace info as read/write calls come into DXT
  • At DXT module shutdown time, information related to these stacktraces is extracted and written to per-process files
  • At Darshan shutdown time, rank 0 serially reads each per-rank file, extracts/transforms the data, then writes the resulting output data into the Darshan header

Any more elaborations there would be very welcome.

Without understanding the full changes yet, I do have a couple of higher level concerns:

  1. Ultimately storing this stack data in the Darshan log header is almost certainly not what we want to do

    • The header is a small, uncompressed region of the Darshan log file to store compact metadata about the modules (i.e., their version, how much compressed data they wrote, etc.), so it's not really where we'd imagine storing big chunks of characterization data
    • If we can't store the stack traces alongside the trace segments captured by the DXT modules, I think I'd recommend we create an entirely new module (e.g., DXT_STACKS) that stores this info
  2. The shutdown process seems pretty inefficient. It looks like the DXT module on each process writes out its own file at module shutdown time, but then, as Darshan is shutting down and writing its log file, rank 0 has to read each of these per-rank files serially.

    • Could we just use MPI collective operations at module shutdown time to reduce all of the stack data to rank 0 (a sketch follows below)? I'd guess that would be much more efficient than serializing all of this through the file system.
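
A minimal sketch of that collective approach, assuming each rank holds its serialized stack data in a flat local buffer (the function name and buffer layout are hypothetical, not taken from the PR):

```c
#include <mpi.h>
#include <stdlib.h>

/* Sketch only: gather variable-length, per-rank stack-trace buffers
 * onto rank 0 with MPI_Gatherv instead of per-rank files. */
void gather_stack_data(const char *local_buf, int local_len, MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    int *lengths = NULL, *displs = NULL;
    char *all_data = NULL;

    if (rank == 0) {
        lengths = malloc(nprocs * sizeof(int));
        displs  = malloc(nprocs * sizeof(int));
    }

    /* rank 0 learns how many bytes each rank will contribute */
    MPI_Gather(&local_len, 1, MPI_INT, lengths, 1, MPI_INT, 0, comm);

    if (rank == 0) {
        int total = 0;
        for (int i = 0; i < nprocs; i++) {
            displs[i] = total;
            total += lengths[i];
        }
        all_data = malloc(total);
    }

    /* concatenate all per-rank buffers on rank 0 */
    MPI_Gatherv(local_buf, local_len, MPI_CHAR,
                all_data, lengths, displs, MPI_CHAR, 0, comm);

    /* rank 0 would then transform the gathered data and hand it to
     * whichever module record ends up holding it */
    free(lengths);
    free(displs);
    free(all_data);
}
```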
