Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect when the kernel multiplexed hw counters. #2421

Merged
merged 1 commit into from
Dec 14, 2019

Conversation

tbosch
Copy link
Contributor

@tbosch tbosch commented Dec 10, 2019

If this happens, the counter values are incorrect.

This happens when e.g. multiple rr instances run
in parallel on the same machine.

If this happens, the counter values are incorrect.

This happens when e.g. multiple rr instances run
in parallel on the same machine.
@rocallahan rocallahan merged commit 34a97e8 into rr-debugger:master Dec 14, 2019
@rocallahan
Copy link
Collaborator

Thanks!

@rocallahan
Copy link
Collaborator

This happens when e.g. multiple rr instances run in parallel on the same machine.

BTW, this is NOT correct. We run many rr instances on the same machine a lot, e.g. rr tests routinely do this.

AFAIK triggering this requires monitoring "too many" different hardware events in the same process using some combination of system-wide and process-specific perf events. rr only monitors one or two events, and those only in the processes recorded or replayed by rr.

@rocallahan
Copy link
Collaborator

I guess it is most likely that your systems have some system-wide monitoring going on that competes with rr for the PMU hardware.

It's unfortunate that perf_event_open doesn't support an option that lets us indicate that we don't support multiplexing.

@rocallahan
Copy link
Collaborator

This actually broke things on my machine :-(

@rocallahan
Copy link
Collaborator

I've reverted this change, sorry. Instead I've applied 3edf7ed which sets pinned on our counters to disable multiplexing.

@rocallahan
Copy link
Collaborator

BTW the error was here:

  tmp_attr.read_format =
      PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING;
  int fd = syscall(__NR_perf_event_open, attr, tid, -1, group_fd, 0);

We needed to pass &tmp_attr instead of attr to perf_event_open.

We could reland this with that fixed, but I think it should not be needed now that we're requesting pinned counters.

@tbosch
Copy link
Contributor Author

tbosch commented Dec 16, 2019

Thanks for the pinning change.

I tried it together with the detection for multiplexing, and the multiplexing is still happening :-( So I created a roll forward #2423 that fixes the problem with tmp_attr that you mentioned above.

@rocallahan
Copy link
Collaborator

rocallahan commented Dec 16, 2019 via email

@tbosch
Copy link
Contributor Author

tbosch commented Dec 16, 2019

Mmh, do you have an idea how we could narrow this further down?

I am thinking of creating an isolated test program for this that demonstrates whether multiplexing / the error behavior you mentioned happens...
WDYT?

@rocallahan
Copy link
Collaborator

rocallahan commented Dec 17, 2019 via email

@rocallahan
Copy link
Collaborator

Okay, here's my test program:

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/perf_event.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/sysinfo.h>
#include <unistd.h>

int main(int argc, char** argv) {
  if (argc != 3) {
    printf("Usage: %s [pinned|unpinned] <raw-hex>\n", argv[0]);
    exit(1);
  }
  int pinned = strcmp(argv[1], "pinned") == 0;
  int config = strtol(argv[2], NULL, 0);

  struct perf_event_attr attr;
  memset(&attr, 0, sizeof(attr));
  attr.type = PERF_TYPE_RAW;
  attr.size = sizeof(attr);
  attr.exclude_kernel = 1;
  attr.exclude_guest = 1;
  attr.pinned = pinned;
  attr.config = config;

  int num_cpus = get_nprocs();
  int* fds = (int*)malloc(sizeof(int)*num_cpus);
  for (int i = 0; i < num_cpus; ++i) {
    fds[i] = syscall(__NR_perf_event_open, &attr, -1, i, -1, 0);
    if (fds[i] < 0) {
      perror("perf_event_open");
      exit(1);
    }
  }
  while (1) {
    char buf[8];
    for (int i = 0; i < num_cpus; ++i) {
      ssize_t ret = read(fds[i], buf, sizeof(buf));
      if (ret < 0) {
        perror("read");
        exit(1);
      }
    }
    sleep(1);
  }
}

@rocallahan
Copy link
Collaborator

Testing unpinned per-CPU counters:

[root@localhost rr]# ~roc/tmp/test unpinned 0x5301cb &
[1] 10588
[root@localhost rr]# ~roc/tmp/test unpinned 0x5102b1 &
[2] 10774
[root@localhost rr]# ~roc/tmp/test unpinned 0x5101ae &
[3] 10787
[root@localhost rr]# ~roc/tmp/test unpinned 0x5101a2 &
[4] 10795

Then perf stat (which uses unpinned counters) fails as expected:

[roc@localhost rr]$ perf stat -e r5101c4 /bin/ls
CMakeLists.txt	     configure	      include	 README.md  src		 Vagrantfile
CMakeLists.txt.orig  CONTRIBUTING.md  LICENSE	 rr.spec    test.out
CODE_OF_CONDUCT.md   gdb_rr.log       perf.data  scripts    third-party

 Performance counter stats for '/bin/ls':

     <not counted>      r5101c4                                                       (0.00%)

       0.001301922 seconds time elapsed

       0.001306000 seconds user
       0.000000000 seconds sys


Some events weren't counted. Try disabling the NMI watchdog:
	echo 0 > /proc/sys/kernel/nmi_watchdog
	perf stat ...
	echo 1 > /proc/sys/kernel/nmi_watchdog

@rocallahan
Copy link
Collaborator

But rr, using pinned counters, works as expected:

[roc@localhost rr]$ rr record /bin/ls
rr: Saving execution to trace directory `/home/roc/.local/share/rr/ls-1'.
CMakeLists.txt	     configure	      include	 README.md  src		 Vagrantfile
CMakeLists.txt.orig  CONTRIBUTING.md  LICENSE	 rr.spec    test.out
CODE_OF_CONDUCT.md   gdb_rr.log       perf.data  scripts    third-party
[roc@localhost rr]$ rr replay -a
CMakeLists.txt	     configure	      include	 README.md  src		 Vagrantfile
CMakeLists.txt.orig  CONTRIBUTING.md  LICENSE	 rr.spec    test.out
CODE_OF_CONDUCT.md   gdb_rr.log       perf.data  scripts    third-party

@rocallahan
Copy link
Collaborator

If I kill those unpinned counters and restart with pinned per-CPU counters:

[root@localhost rr]# ~roc/tmp/test pinned 0x5301cb &
[1] 11024
[root@localhost rr]# ~roc/tmp/test pinned 0x5101a2 &
[2] 11028
[root@localhost rr]# ~roc/tmp/test pinned 0x5101ae &
[3] 11029
[root@localhost rr]# ~roc/tmp/test pinned 0x5102b1 &
[4] 11030

Then perf stat fails as before, and rr also fails to read the counter values:

[roc@localhost rr]$ rr record /bin/ls
rr: /home/roc/rr/rr/src/PerfCounters.cc:275: int64_t rr::read_counter(rr::ScopedFd&): Assertion `nread == sizeof(val)' failed.
Aborted (core dumped)

This error could be more informative, but as expected, having requested pinning, we get a read error instead of getting invalid counter values.

@rocallahan
Copy link
Collaborator

These results all match my expectations: kernel 5.3.11-300.fc31.x86_64.

You may wish to verify these results on whatever kernel you're using. I guess it's possible that older kernels have different behavior, or your kernel has patches making it different from upstream.

Another possibility is that your systemwide counters are using PERF_FLAG_PID_CGROUP. I don't know exactly how that is supposed to behave, and I haven't tested it. Maybe you can try modifying the test program to use that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants