Detect when the kernel multiplexed hw counters. #2421

tbosch · 2019-12-10T18:00:32Z

If this happens, the counter values are incorrect.

This happens when e.g. multiple rr instances run
in parallel on the same machine.

If this happens, the counter values are incorrect. This happens when e.g. multiple rr instances run in parallel on the same machine.

rocallahan · 2019-12-14T01:37:44Z

Thanks!

rocallahan · 2019-12-14T01:45:08Z

This happens when e.g. multiple rr instances run in parallel on the same machine.

BTW, this is NOT correct. We run many rr instances on the same machine a lot, e.g. rr tests routinely do this.

AFAIK triggering this requires monitoring "too many" different hardware events in the same process using some combination of system-wide and process-specific perf events. rr only monitors one or two events, and those only in the processes recorded or replayed by rr.

rocallahan · 2019-12-14T01:49:06Z

I guess it is most likely that your systems have some system-wide monitoring going on that competes with rr for the PMU hardware.

It's unfortunate that perf_event_open doesn't support an option that lets us indicate that we don't support multiplexing.

rocallahan · 2019-12-14T02:35:05Z

This actually broke things on my machine :-(

rocallahan · 2019-12-14T03:01:10Z

I've reverted this change, sorry. Instead I've applied 3edf7ed which sets pinned on our counters to disable multiplexing.

rocallahan · 2019-12-14T03:26:42Z

BTW the error was here:

  tmp_attr.read_format =
      PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING;
  int fd = syscall(__NR_perf_event_open, attr, tid, -1, group_fd, 0);

We needed to pass &tmp_attr instead of attr to perf_event_open.

We could reland this with that fixed, but I think it should not be needed now that we're requesting pinned counters.

tbosch · 2019-12-16T18:01:47Z

Thanks for the pinning change.

I tried it together with the detection for multiplexing, and the multiplexing is still happening :-( So I created a roll forward #2423 that fixes the problem with tmp_attr that you mentioned above.

rocallahan · 2019-12-16T19:27:43Z

I think it's worth looking into why multiplexing is still happening. My understanding is that with pinned counters, only pinned perCPU counters would displace rr's counters, and in that case rr would error out when it next reads the counter value. If this isn't true it's important to know why.

…

On Tue, 17 Dec 2019, 07:02 Tobias Bosch, ***@***.***> wrote: Thanks for the pinning change. I tried it together with the detection for multiplexing, and the multiplexing is still happening :-( So I created a roll forward #2423 <#2423> that fixes the problem with tmp_attr that you mentioned above. — You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub <#2421>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AACIJRYBXLTPYS3LM5AUMK3QY67AZANCNFSM4JZCSFGA> .

tbosch · 2019-12-16T21:40:35Z

Mmh, do you have an idea how we could narrow this further down?

I am thinking of creating an isolated test program for this that demonstrates whether multiplexing / the error behavior you mentioned happens...
WDYT?

rocallahan · 2019-12-17T02:22:50Z

Yeah that sounds like a good idea. I was planning to run a similar experiment after I reassemble my laptop.

…

On Tue, 17 Dec 2019, 10:40 Tobias Bosch, ***@***.***> wrote: Mmh, do you have an idea how we could narrow this further down? I am thinking of creating an isolated test program for this that demonstrates whether multiplexing / the error behavior you mentioned happens... WDYT? — You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub <#2421>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AACIJR3Z6IZXINAUJ3EYDLTQY7YVHANCNFSM4JZCSFGA> .

rocallahan · 2019-12-17T07:19:28Z

Okay, here's my test program:

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/perf_event.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/sysinfo.h>
#include <unistd.h>

int main(int argc, char** argv) {
  if (argc != 3) {
    printf("Usage: %s [pinned|unpinned] <raw-hex>\n", argv[0]);
    exit(1);
  }
  int pinned = strcmp(argv[1], "pinned") == 0;
  int config = strtol(argv[2], NULL, 0);

  struct perf_event_attr attr;
  memset(&attr, 0, sizeof(attr));
  attr.type = PERF_TYPE_RAW;
  attr.size = sizeof(attr);
  attr.exclude_kernel = 1;
  attr.exclude_guest = 1;
  attr.pinned = pinned;
  attr.config = config;

  int num_cpus = get_nprocs();
  int* fds = (int*)malloc(sizeof(int)*num_cpus);
  for (int i = 0; i < num_cpus; ++i) {
    fds[i] = syscall(__NR_perf_event_open, &attr, -1, i, -1, 0);
    if (fds[i] < 0) {
      perror("perf_event_open");
      exit(1);
    }
  }
  while (1) {
    char buf[8];
    for (int i = 0; i < num_cpus; ++i) {
      ssize_t ret = read(fds[i], buf, sizeof(buf));
      if (ret < 0) {
        perror("read");
        exit(1);
      }
    }
    sleep(1);
  }
}

rocallahan · 2019-12-17T07:20:30Z

Testing unpinned per-CPU counters:

[root@localhost rr]# ~roc/tmp/test unpinned 0x5301cb &
[1] 10588
[root@localhost rr]# ~roc/tmp/test unpinned 0x5102b1 &
[2] 10774
[root@localhost rr]# ~roc/tmp/test unpinned 0x5101ae &
[3] 10787
[root@localhost rr]# ~roc/tmp/test unpinned 0x5101a2 &
[4] 10795

Then perf stat (which uses unpinned counters) fails as expected:

[roc@localhost rr]$ perf stat -e r5101c4 /bin/ls
CMakeLists.txt	     configure	      include	 README.md  src		 Vagrantfile
CMakeLists.txt.orig  CONTRIBUTING.md  LICENSE	 rr.spec    test.out
CODE_OF_CONDUCT.md   gdb_rr.log       perf.data  scripts    third-party

 Performance counter stats for '/bin/ls':

     <not counted>      r5101c4                                                       (0.00%)

       0.001301922 seconds time elapsed

       0.001306000 seconds user
       0.000000000 seconds sys


Some events weren't counted. Try disabling the NMI watchdog:
	echo 0 > /proc/sys/kernel/nmi_watchdog
	perf stat ...
	echo 1 > /proc/sys/kernel/nmi_watchdog

rocallahan · 2019-12-17T07:21:18Z

But rr, using pinned counters, works as expected:

[roc@localhost rr]$ rr record /bin/ls
rr: Saving execution to trace directory `/home/roc/.local/share/rr/ls-1'.
CMakeLists.txt	     configure	      include	 README.md  src		 Vagrantfile
CMakeLists.txt.orig  CONTRIBUTING.md  LICENSE	 rr.spec    test.out
CODE_OF_CONDUCT.md   gdb_rr.log       perf.data  scripts    third-party
[roc@localhost rr]$ rr replay -a
CMakeLists.txt	     configure	      include	 README.md  src		 Vagrantfile
CMakeLists.txt.orig  CONTRIBUTING.md  LICENSE	 rr.spec    test.out
CODE_OF_CONDUCT.md   gdb_rr.log       perf.data  scripts    third-party

rocallahan · 2019-12-17T07:22:49Z

If I kill those unpinned counters and restart with pinned per-CPU counters:

[root@localhost rr]# ~roc/tmp/test pinned 0x5301cb &
[1] 11024
[root@localhost rr]# ~roc/tmp/test pinned 0x5101a2 &
[2] 11028
[root@localhost rr]# ~roc/tmp/test pinned 0x5101ae &
[3] 11029
[root@localhost rr]# ~roc/tmp/test pinned 0x5102b1 &
[4] 11030

Then perf stat fails as before, and rr also fails to read the counter values:

[roc@localhost rr]$ rr record /bin/ls
rr: /home/roc/rr/rr/src/PerfCounters.cc:275: int64_t rr::read_counter(rr::ScopedFd&): Assertion `nread == sizeof(val)' failed.
Aborted (core dumped)

This error could be more informative, but as expected, having requested pinning, we get a read error instead of getting invalid counter values.

rocallahan · 2019-12-17T07:28:15Z

These results all match my expectations: kernel 5.3.11-300.fc31.x86_64.

You may wish to verify these results on whatever kernel you're using. I guess it's possible that older kernels have different behavior, or your kernel has patches making it different from upstream.

Another possibility is that your systemwide counters are using PERF_FLAG_PID_CGROUP. I don't know exactly how that is supposed to behave, and I haven't tested it. Maybe you can try modifying the test program to use that.

Detect when the kernel multiplexed hw counters.

9f37859

If this happens, the counter values are incorrect. This happens when e.g. multiple rr instances run in parallel on the same machine.

tbosch mentioned this pull request Dec 10, 2019

How to handle hw counter multiplexing case? #2422

Closed

rocallahan merged commit 34a97e8 into rr-debugger:master Dec 14, 2019

tbosch mentioned this pull request Dec 16, 2019

Roll forward: "Detect when the kernel multiplexed hw counters." #2423

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Detect when the kernel multiplexed hw counters. #2421

Detect when the kernel multiplexed hw counters. #2421

tbosch commented Dec 10, 2019

rocallahan commented Dec 14, 2019

rocallahan commented Dec 14, 2019

rocallahan commented Dec 14, 2019

rocallahan commented Dec 14, 2019

rocallahan commented Dec 14, 2019

rocallahan commented Dec 14, 2019

tbosch commented Dec 16, 2019

rocallahan commented Dec 16, 2019 via email

tbosch commented Dec 16, 2019

rocallahan commented Dec 17, 2019 via email

rocallahan commented Dec 17, 2019

rocallahan commented Dec 17, 2019

rocallahan commented Dec 17, 2019

rocallahan commented Dec 17, 2019

rocallahan commented Dec 17, 2019

Detect when the kernel multiplexed hw counters. #2421

Detect when the kernel multiplexed hw counters. #2421

Conversation

tbosch commented Dec 10, 2019

rocallahan commented Dec 14, 2019

rocallahan commented Dec 14, 2019

rocallahan commented Dec 14, 2019

rocallahan commented Dec 14, 2019

rocallahan commented Dec 14, 2019

rocallahan commented Dec 14, 2019

tbosch commented Dec 16, 2019

rocallahan commented Dec 16, 2019 via email

tbosch commented Dec 16, 2019

rocallahan commented Dec 17, 2019 via email

rocallahan commented Dec 17, 2019

rocallahan commented Dec 17, 2019

rocallahan commented Dec 17, 2019

rocallahan commented Dec 17, 2019

rocallahan commented Dec 17, 2019