Add code measuring CPU frequency #125

Bulat-Ziganshin · 2020-05-07T18:29:45Z

I just wrote a small snippet that measures the actual frequency of the CPU core executing it: https://encode.su/threads/3389-Code-snippet-to-compute-CPU-frequency

Please consider using it to correctly compute the number of CPU cycles spent by hash functions, instead of RDTSC, whose unreliability was discussed here a few years ago.

rurban · 2020-05-08T10:35:03Z

Nice, but we already have better time sources than gettimeofday.

And on Linux you can just ask the kernel for the frequency. It fluctuates constantly, by the way.

erthink · 2020-05-08T11:43:38Z

As I wrote earlier, it seems that the best code for measuring with clock-cycle precision is inside the t1ha benchmark.

It supports x86, arm64, ppc64, s390x, e2k, ia64, etc, as well as perf_event, emscripten_get_now(), mach_absolute_time(), QueryPerformanceCounter(), read_wall_time(), clock_gettime(), gethrtime() and gettimeofday() (i.e. more than google-benchmark).
For instance see logs on Travis-CI.

I was planning to rearrange this code as a separate "mera" library, but I don't have time for this yet.
Therefore, reusing this code is not as convenient as we would like.
However, it is worth mentioning in this context.

PPC64:

Preparing to benchmarking...
 - running on CPU#10
 - use MFSPR(268) as clock source for benchmarking
 - assume it cheap and stable
 - measure granularity and overhead: 6 cycles, 0.166667 iteration/cycle

ARM64:

Preparing to benchmarking...
 - running on CPU#30
 - use CNTVCT_EL0 as clock source for benchmarking
 - assume it cheap and stable
 - measure granularity and overhead: 0.2 tick, 5 iterations/tick

s390x:

Preparing to benchmarking...
 - running on CPU#3
 - use STCKE as clock source for benchmarking
 - assume it cheap and stable
 - measure granularity and overhead: 6 cycles, 0.166667 iteration/cycle

AMD64:

Preparing to benchmarking...
 - perf_event_open(): No such file or directory
 - running on CPU#0
 - use RDTSCP as clock source for benchmarking
 - assume it cheap and floating (RESULTS MAY VARY AND BE USELESS)
 - measure granularity and overhead: 38 cycles, 0.0263158 iteration/cycle

Bulat-Ziganshin · 2020-05-08T18:31:50Z

It seems that you are both talking about measuring time intervals, while the code I provided measures the effective CPU frequency, using any of the abovementioned ways to measure the time interval.

My point is that using rdtsc to count CPU cycles has been broken for about 10 years, because it reports cycles of a fixed base frequency (such as 2 GHz in the reports provided in the encode.su thread). So instead I wrote a small piece of code for which we know how many CPU cycles it takes to execute, and by measuring the time spent on it, we can easily compute the frequency. Moreover, the method works for almost any superscalar CPU.

Using this approach, we can finally report correctly how many CPU cycles are spent on each hashing operation.
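The arithmetic behind this approach can be sketched as follows. This is only an illustration in Python, not the snippet from the encode.su thread: the function names and the sample numbers are hypothetical, and the one-cycle-per-iteration cost is an assumption that has to hold for the real calibration loop.

```python
def effective_frequency_hz(iterations, cycles_per_iteration, elapsed_seconds):
    """Frequency implied by a loop whose cycle cost is known in advance."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return iterations * cycles_per_iteration / elapsed_seconds


def cycles_spent(elapsed_seconds, frequency_hz):
    """Convert a wall-clock interval into core cycles at the measured frequency."""
    return elapsed_seconds * frequency_hz


# Hypothetical calibration: 1e9 dependent one-cycle operations took 0.25 s.
freq = effective_frequency_hz(1_000_000_000, 1, 0.25)

# Hypothetical hash call that took 10 ns of wall time at that frequency.
cycles = cycles_spent(10e-9, freq)
```

The result is only as good as the assumption that the core frequency stays the same between the calibration loop and the code being timed.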

rurban · 2020-05-08T19:23:33Z

Yes, I know these loop-counting tricks from gamers, who use them to calculate the frame rate. It's a rather stable way to do it. I'll check whether rdtsc with cpuid is better or worse.

But "better" would be reading the freq from the kernel via proc.
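For illustration, reading the frequency via proc could look roughly like the sketch below. It assumes the "cpu MHz" field that Linux prints in /proc/cpuinfo on x86; other architectures may not expose that field, and the function names are hypothetical, not taken from the smhasher code.

```python
def parse_cpu_mhz(cpuinfo_text):
    """Return every 'cpu MHz' value found in /proc/cpuinfo-style text."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            values.append(float(value))
    return values


def read_cpu_mhz(path="/proc/cpuinfo"):
    """Read the per-core frequencies the kernel currently reports (Linux only)."""
    with open(path) as f:
        return parse_cpu_mhz(f.read())
```

Each value is a snapshot taken when the file is read, so it may not match the frequency at which the benchmarked code actually ran.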


YellowOnion · 2023-12-08T07:49:46Z

> But "better" would be reading the freq from the kernel via proc.

Frequency switching on a modern core usually takes microseconds. AMD's Precision Boost is pretty crazy: my CPU will be anywhere between 4.5 and 5.1 GHz with single-core boost, constantly changing due to power demand etc. I kinda doubt you can get accurate readings through anything that is not atomic with the execution of the code.

Real-world time is also important, especially when AVX-512 on older Intel CPUs will clock a system down below "base" (Zen 4 doesn't have this penalty), potentially hiding some of the performance penalty, because a user might think 30 cycles at 2 GHz is better than 40 cycles at 3 GHz.
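To make the comparison concrete, here is the arithmetic for those two figures as a tiny Python sketch. The helper name is made up for illustration; cycles divided by GHz gives nanoseconds.

```python
def nanoseconds(cycles, frequency_ghz):
    """Wall-clock time of an operation: cycles divided by cycles per nanosecond."""
    return cycles / frequency_ghz


downclocked = nanoseconds(30, 2.0)   # fewer cycles, lower clock
full_speed = nanoseconds(40, 3.0)    # more cycles, higher clock
```

Here the 40-cycle variant finishes in about 13.3 ns while the 30-cycle variant needs 15 ns, so the lower cycle count is the slower one in real time.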

There are also other things to consider. I'm pretty sure some AVX units can take upwards of 200 cycles just to turn on, which might not be measured here if the unit is already hot.

https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/

darkk · 2024-09-13T09:12:11Z

I've drafted a patch that approaches the issue from a different angle.

I usually know CPU frequency of the machine I'm working with. Also, there might be some easy way to query it. So, the frequency itself is of purely informational value to me. However, I don't always know if the cycle counter code is somewhat correct and #241 together with #292 highlight that, so some "visual control" is handy.

So, I decided to combine tick counter and real-time clock into ae7ccd9 that produces output similar to the following:

--- Testing xxHash32 "xxHash, 32-bit for x86" POOR

[[[ Speed Tests ]]]

WARNING: timer resolution is 158 (0x9e) ticks (0x1001f296d4a00eaa - 0x1001f296d4a00e5b). Broken VDSO?
Bulk speed test - 262144-byte keys, cache_linesize 32
Alignment   8 -  0.287 bytes/cycle -  3.482 cycles/byte -   821.65 MiB/sec @ 3 GHz -   158.85 MiB/s @ 580 MHz
Alignment   7 -  0.139 bytes/cycle -  7.204 cycles/byte -   397.15 MiB/sec @ 3 GHz -    76.78 MiB/s @ 580 MHz, -51.7% B/c
Alignment   6 -  0.139 bytes/cycle -  7.216 cycles/byte -   396.51 MiB/sec @ 3 GHz -    76.66 MiB/s @ 580 MHz, -51.7% B/c
Alignment   5 -  0.138 bytes/cycle -  7.226 cycles/byte -   395.94 MiB/sec @ 3 GHz -    76.55 MiB/s @ 580 MHz, -51.8% B/c
Alignment   4 -  0.306 bytes/cycle -  3.269 cycles/byte -   875.23 MiB/sec @ 3 GHz -   169.21 MiB/s @ 580 MHz, +6.5% B/c
Alignment   3 -  0.138 bytes/cycle -  7.226 cycles/byte -   395.92 MiB/sec @ 3 GHz -    76.54 MiB/s @ 580 MHz, -51.8% B/c
Alignment   2 -  0.138 bytes/cycle -  7.226 cycles/byte -   395.94 MiB/sec @ 3 GHz -    76.55 MiB/s @ 580 MHz, -51.8% B/c
Alignment   1 -  0.138 bytes/cycle -  7.226 cycles/byte -   395.93 MiB/sec @ 3 GHz -    76.55 MiB/s @ 580 MHz, -51.8% B/c
Alignment   0 -  0.255 bytes/cycle -  3.926 cycles/byte -   728.66 MiB/sec @ 3 GHz -   140.87 MiB/s @ 580 MHz, -11.3% B/c
Average       -  0.187 bytes/cycle -  5.361 cycles/byte -   533.66 MiB/sec @ 3 GHz -   103.17 MiB/s @ 580 MHz, ~64.0%
Best          -  0.306 bytes/cycle -  3.269 cycles/byte -   875.23 MiB/sec @ 3 GHz -   169.21 MiB/s @ 580 MHz, -54.8% B/c for worst
WARNING: $worst and $best deviate by -54.8% (> 1%). Misaligned read penalty?
	\ Try SMHASHER_ALIGNAS_STEP=2
WARNING: alignas(8) and alignas(0) deviate by -11.3% (> 1%). Insufficient alignas() granularity?
	\ Try SMHASHER_ALIGNAS_MAX=32 or SMHASHER_ALIGNAS_MAX=64
WARNING: key[262144] 0.306 B/c and key[pagesize=4096] 0.421 B/c deviate by -27.4% (> 10%). Memory wall hit?
	\ Try SMHASHER_BLOCKSIZE=$(getconf -a | awk '/[^I]CACHE_SIZE/ && $2 > 0 {print $2/2}' | shuf -n 1)
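The two throughput columns in this output relate to the cycles/byte column by a simple conversion, sketched below in Python. The function name is hypothetical and this is not the code from the patch; it only reproduces the arithmetic, taking 1 MiB as 2^20 bytes.

```python
MIB = 1 << 20  # bytes per MiB


def mib_per_sec(cycles_per_byte, frequency_hz):
    """Throughput implied by a cycles/byte figure at a given clock frequency."""
    return frequency_hz / cycles_per_byte / MIB


# "Alignment 8" row above: 3.482 cycles/byte.
at_assumed_3ghz = mib_per_sec(3.482, 3e9)       # close to the 821.65 column
at_measured_580mhz = mib_per_sec(3.482, 580e6)  # close to the 158.85 column
```

Small differences from the printed values come from the cycles/byte figure being rounded to three decimals in the output.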

rurban · 2024-09-13T10:38:18Z

Our cycle counter code is correct for Intel. In fact, it is one of the few that is actually correct, as it follows an Intel paper.

darkk · 2024-09-13T12:52:03Z

I think so. I'm mostly focused on MIPS (which has a 32-bit cycle counter) and ARM at the moment. The output above comes from my go-to MIPS32 router and reflects its frequency correctly.

The code is basically (rdtsc2 - rdtsc1) / (timeofday2 - timeofday1), adjusted for overflows and timers. It reports 2808 MHz on my Intel laptop, which is roughly in sync with the "cpu MHz : 2800.000" line from /proc/cpuinfo under the performance governor. I cannot say anything about the correctness of this number or the extra 8 MHz, as I'm not familiar with dynamic frequency scaling on Intel CPUs.
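A minimal Python sketch of that ratio, including wraparound of a 32-bit counter, might look like this. It is not the code from the patch: the names are hypothetical, and it assumes the counter wraps at most once between the two readings.

```python
COUNTER_BITS = 32  # assumption: a 32-bit cycle counter, as in the MIPS case


def tick_delta(start, end, bits=COUNTER_BITS):
    """Ticks elapsed between two counter reads, tolerating a single wraparound."""
    return (end - start) % (1 << bits)


def frequency_mhz(tick_start, tick_end, time_start_us, time_end_us, bits=COUNTER_BITS):
    """Ticks per microsecond, which is numerically the tick rate in MHz."""
    elapsed_us = time_end_us - time_start_us
    if elapsed_us <= 0:
        raise ValueError("time interval must be positive")
    return tick_delta(tick_start, tick_end, bits) / elapsed_us


# Hypothetical readings: the counter wrapped between the two samples.
wrapped = tick_delta(0xFFFFFF00, 0x00000100)
# Hypothetical readings: 580,000 ticks in 1000 microseconds.
mhz = frequency_mhz(0, 580_000, 0, 1000)
```

The measurement interval has to stay shorter than one full counter period, otherwise additional wraps go unnoticed and the frequency is underestimated.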

erthink mentioned this issue May 10, 2020

[WIP] SSSE3/SSE2, NEON, and Scalar ports, minor fixes espadrine/shishua#2

Closed

rurban added a commit that referenced this issue Oct 1, 2020

calculate cpu freq

f42e5c5

not hardcoded to 3 GHz. Some code is based on GH #125, but this result is not really good. On linux I found an easy way.

rurban added further commits that referenced this issue, each titled "calculate cpu freq" with the same message as above:

8e66151 (Oct 1, 2020)
81a550e (Oct 1, 2020)
4a18d98 (Nov 26, 2020)
12a5b41 (Nov 28, 2020)
48c33d9 (Jan 21, 2021)
809fbdb (Nov 19, 2021)
9330da2 (Jan 27, 2022)
91a2b42 (Apr 2, 2022)
