Runtime doubles when order of the tests is changed #822

Open
GunterSchmidt opened this issue Oct 23, 2024 · 0 comments

Comments

GunterSchmidt commented Oct 23, 2024

This is a bit complex to explain, and the behavior is weird. I have done my fair share of performance tests, but this has never happened before.

I have two test groups defined:

  • group 1 has one test: A
  • group 2 has three tests B,C,D
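
For reference, a minimal sketch of how such a two-group layout is typically registered with Criterion; group and function names here are placeholders, not the actual benchmark code:

use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

fn bench_a(c: &mut Criterion) {
    c.bench_function("group1/A", |b| b.iter(|| black_box(1u32 + 1)));
}
fn bench_b(c: &mut Criterion) {
    c.bench_function("group2/B", |b| b.iter(|| black_box(2u32 + 2)));
}
fn bench_c(c: &mut Criterion) {
    c.bench_function("group2/C", |b| b.iter(|| black_box(3u32 + 3)));
}
fn bench_d(c: &mut Criterion) {
    c.bench_function("group2/D", |b| b.iter(|| black_box(4u32 + 4)));
}

// The argument order of criterion_group!/criterion_main! is the execution
// order discussed below: group 1 (A) first, then group 2 (B, C, D).
criterion_group!(group1, bench_a);
criterion_group!(group2, bench_b, bench_c, bench_d);
criterion_main!(group1, group2);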

When I run them in this order (group 1 first, then group 2), all results are normal.
When I flip the order of the groups like this:

  • first group 2: B, C, D
  • then group 1: A

everything is also normal.

However, when I also change the order of the tests within group 2:

  • group 2: C, D, B

then the runtime of B roughly doubles. Running group 1 first would again produce normal results.
This only happens when I use u8 for an array, which requires a lot of casting to usize; using u16 does not yield different results.
However, I do not believe that u8 is an issue per se, because group 2: C, B, D also runs fine.
Also, when I run group 2: B, C, D or B, C, D, B, everything is fine.
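
To illustrate what "u8 for an array which requires a lot of casting to usize" refers to, here is a hypothetical sketch of that kind of access pattern; it is not the actual solution code, only an illustration of the u8-versus-u16 difference discussed above:

// Hypothetical sketch: node IDs stored as u8 force an `as usize` cast on
// every table lookup; with u16 IDs the pattern is identical but the
// reported slowdown does not appear.
fn walk_u8(table: &[[u8; 2]], directions: &[usize], start: u8) -> u8 {
    let mut node = start;
    for &dir in directions {
        node = table[node as usize][dir]; // cast on every access
    }
    node
}

fn walk_u16(table: &[[u16; 2]], directions: &[usize], start: u16) -> u16 {
    let mut node = start;
    for &dir in directions {
        node = table[node as usize][dir];
    }
    node
}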

The problematic benchmark code is available for verification.
For copyright reasons I cannot post the input file; please use this.

The results look like this:

Run A:

  • day_08::part2/part2_array       time: [289.91 µs 291.41 µs 293.19 µs]
  • day_08::part2/part2_array_same  time: [285.70 µs 287.68 µs 290.38 µs]

Run B:

  • day_08::part2/part2_array_same  time: [628.49 µs 631.39 µs 634.58 µs]

     Running benches/benchmarks-criterion.rs (C:\Development\Rust\advent-of-code-rust\2023\target\release\deps\day_08_bench_criterion-02315b7a3c5c1594.exe)      
Gnuplot not found, using plotters backend
day_08::part2/part2_array
                        time:   [289.91 µs 291.41 µs 293.19 µs]
                        change: [-6.3076% -5.0700% -3.8992%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
day_08::part2/part2_v1  time:   [918.12 µs 922.84 µs 928.02 µs]
                        change: [-0.6485% +0.0063% +0.7588%] (p = 0.99 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking day_08::part2/part2_v2: Warming up for 500.00 ms
Warning: Unable to complete 100 samples in 2.0s. You may wish to increase target time to 2.2s, enable flat sampling, or reduce sample count to 60.
day_08::part2/part2_v2  time:   [427.63 µs 430.41 µs 433.48 µs]
                        change: [-0.2765% +0.8468% +2.0260%] (p = 0.14 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
day_08::part2/part2_array_same
                        time:   [285.70 µs 287.68 µs 290.38 µs]
                        change: [-1.1322% -0.1992% +0.7388%] (p = 0.69 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

day_08::part1/part1     time:   [140.82 µs 141.23 µs 141.69 µs]
                        change: [-2.7411% -1.8442% -0.9985%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

And the problematic run:

    Running benches/benchmarks-criterion.rs (C:\Development\Rust\advent-of-code-rust\2023\target\release\deps\day_08_bench_criterion-02315b7a3c5c1594.exe)      
Gnuplot not found, using plotters backend
day_08::part2/part2_v1  time:   [919.15 µs 923.55 µs 928.27 µs]
                        change: [-0.6696% +0.0774% +0.8478%] (p = 0.83 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking day_08::part2/part2_v2: Warming up for 500.00 ms
Warning: Unable to complete 100 samples in 2.0s. You may wish to increase target time to 2.2s, enable flat sampling, or reduce sample count to 60.
day_08::part2/part2_v2  time:   [431.21 µs 433.04 µs 435.22 µs]
                        change: [+0.3734% +1.3685% +2.3048%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking day_08::part2/part2_array_same: Warming up for 500.00 ms
Warning: Unable to complete 100 samples in 2.0s. You may wish to increase target time to 3.2s, enable flat sampling, or reduce sample count to 50.
day_08::part2/part2_array_same
                        time:   [628.49 µs 631.39 µs 634.58 µs]
                        change: [+116.94% +118.94% +120.78%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

day_08::part1/part1     time:   [145.17 µs 145.50 µs 145.79 µs]
                        change: [+1.7073% +2.3928% +2.9905%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 25 outliers among 100 measurements (25.00%)
  10 (10.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  10 (10.00%) high severe
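
Regarding the "Unable to complete 100 samples in 2.0s" warnings in both runs: if desired, the target time can be raised per group, roughly like this (the 3-second value is only an example, not taken from the issue, and the bench body is a placeholder):

use criterion::{criterion_group, criterion_main, Criterion};
use std::time::Duration;

fn part2_benches(c: &mut Criterion) {
    let mut group = c.benchmark_group("day_08::part2");
    // Give Criterion more measurement time so 100 samples fit,
    // as the warning suggests.
    group.measurement_time(Duration::from_secs(3));
    group.bench_function("part2_v2", |b| b.iter(|| std::hint::black_box(0u32)));
    group.finish();
}

criterion_group!(benches, part2_benches);
criterion_main!(benches);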

I hope I have made myself clear, and maybe someday this will be investigated.
Criterion2 shows the same behavior.
