How to handle hw counter multiplexing case? #2422
Comments
I'm not familiar with the way the kernel PMU multiplexes the perf counters. Is it "overcommitting" them by allowing perf_event_open to succeed even if there's no hardware perf counter available? That would be very bad IMO. Is your other jobs' usage of hardware perf counters not limited to specific processes?
Ok, I read perf_event_open's man page again. In general there's no way for rr to survive a situation where the performance counters are multiplexed.
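For what it's worth, multiplexing can at least be detected after the fact. The perf_event_open man page describes the `PERF_FORMAT_TOTAL_TIME_ENABLED` and `PERF_FORMAT_TOTAL_TIME_RUNNING` read formats: when a counter has been rotated off the PMU, its `time_running` falls below its `time_enabled`. The following is only a rough sketch of that check, not what rr or #2421 actually does. The helper names `parse_counter_read` and `was_multiplexed` are made up for illustration, and the sketch assumes the counter was opened with both flags and without `PERF_FORMAT_GROUP`.

```python
import struct

# Layout returned by read() on a perf event fd when read_format is
# PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
# (assumption: no PERF_FORMAT_GROUP or PERF_FORMAT_ID): three u64 fields.
READ_LAYOUT = struct.Struct("=QQQ")


def parse_counter_read(buf: bytes):
    """Return (value, time_enabled, time_running) from a raw read() buffer."""
    return READ_LAYOUT.unpack(buf[:READ_LAYOUT.size])


def was_multiplexed(time_enabled: int, time_running: int) -> bool:
    """True if the counter was scheduled on the PMU for less time than it
    was enabled, i.e. the kernel took it off the hardware at some point."""
    return time_running < time_enabled


# Hypothetical sample: enabled for 1,000,000 ns but only counting for 250,000 ns.
sample = READ_LAYOUT.pack(0, 1_000_000, 250_000)
value, enabled, running = parse_counter_read(sample)
if was_multiplexed(enabled, running):
    print("counter was multiplexed; its value cannot be trusted for replay")
```

Note that scaling the value by `time_enabled / time_running`, as profilers commonly do, only gives an estimate, so it would not help a tool that needs exact counts.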
Could rr use the instructions-retired counter to measure/replay each timeslice instead of retired-conditional-branches? AFAIK, that would be a fixed-function counter, so it should always be available regardless of multiplexing.
Instructions-retired itself is not deterministic (because an instruction that triggers a page fault is counted as executed twice, and page faults are nondeterministic), and the page fault performance counter is not available as a fixed-function counter, so I don't think that will help.
Thanks for the answers!
So FWIW it's not clear to me how other jobs could force multiplexing to happen for rr's recorded processes, unless those jobs install cpu-global counters (or specifically attach counters to rr's recorded processes, which would be crazy). Turns out, however, that My understanding is that with that change, our counters will take priority over cpu-global non-pinned counters, which might help in your situation. They should only be pushed off the PMU by cpu-global pinned counters. Hopefully your jobs aren't using those.
Hello,
We are currently trying to run rr on machines that execute different kinds of jobs. We are seeing failures for rr because some of the other jobs are also using hw counters, and the kernel is multiplexing them, so rr does not get correct values (e.g. it often fails already during startup with "Got 0 branch events, expected at least 500").

I filed #2421 to improve detecting this case, also when this happens during the runtime of rr (and not just during startup).

In general, do you have any ideas how to work around this problem?
Thanks!
Tobi