Runtime doubles when order of the tests is changed #822

Open
GunterSchmidt opened this issue Oct 23, 2024 · 0 comments

Comments

GunterSchmidt commented Oct 23, 2024

This is a bit complex to explain, and the behavior is weird. I have done my fair share of performance tests, but this has never happened before.

I have two test groups defined:

  • group 1 has one test: A
  • group 2 has three tests B,C,D
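
For reference, a minimal sketch of how such a two-group layout is typically registered with Criterion; group and function names here are placeholders, not the actual benchmark code:

use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

fn bench_a(c: &mut Criterion) {
    c.bench_function("group1/A", |b| b.iter(|| black_box(1u32 + 1)));
}
fn bench_b(c: &mut Criterion) {
    c.bench_function("group2/B", |b| b.iter(|| black_box(2u32 + 2)));
}
fn bench_c(c: &mut Criterion) {
    c.bench_function("group2/C", |b| b.iter(|| black_box(3u32 + 3)));
}
fn bench_d(c: &mut Criterion) {
    c.bench_function("group2/D", |b| b.iter(|| black_box(4u32 + 4)));
}

// The argument order of criterion_group!/criterion_main! is the execution
// order discussed below: group 1 (A) first, then group 2 (B, C, D).
criterion_group!(group1, bench_a);
criterion_group!(group2, bench_b, bench_c, bench_d);
criterion_main!(group1, group2);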

When I run them in this order (group 1 first, then group 2), all results are normal.
When I flip the order of the groups like this:

  • first group 2: B, C, D
  • then group 1: A

everything is also normal.

However, when I also change the order of the tests within group 2:

  • group 2: C, D, B

then the runtime of B roughly doubles. Running group 1 first would again produce normal results.
This only happens when I use u8 for an array, which requires a lot of casting to usize; using u16 does not yield different results.
However, I do not believe that u8 is an issue per se, because group 2: C, B, D also runs fine.
Also, when I run group 2: B, C, D or B, C, D, B, everything is fine.
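
To illustrate what "u8 for an array which requires a lot of casting to usize" refers to, here is a hypothetical sketch of that kind of access pattern; it is not the actual solution code, only an illustration of the u8-versus-u16 difference discussed above:

// Hypothetical sketch: node IDs stored as u8 force an `as usize` cast on
// every table lookup; with u16 IDs the pattern is identical but the
// reported slowdown does not appear.
fn walk_u8(table: &[[u8; 2]], directions: &[usize], start: u8) -> u8 {
    let mut node = start;
    for &dir in directions {
        node = table[node as usize][dir]; // cast on every access
    }
    node
}

fn walk_u16(table: &[[u16; 2]], directions: &[usize], start: u16) -> u16 {
    let mut node = start;
    for &dir in directions {
        node = table[node as usize][dir];
    }
    node
}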

The problematic benchmark code is available for verification.
For copyright reasons I cannot post the input file; please use this.

The results look like this:

Run A:

  • day_08::part2/part2_array       time: [289.91 µs 291.41 µs 293.19 µs]
  • day_08::part2/part2_array_same  time: [285.70 µs 287.68 µs 290.38 µs]

Run B:

  • day_08::part2/part2_array_same  time: [628.49 µs 631.39 µs 634.58 µs]

     Running benches/benchmarks-criterion.rs (C:\Development\Rust\advent-of-code-rust\2023\target\release\deps\day_08_bench_criterion-02315b7a3c5c1594.exe)      
Gnuplot not found, using plotters backend
day_08::part2/part2_array
                        time:   [289.91 µs 291.41 µs 293.19 µs]
                        change: [-6.3076% -5.0700% -3.8992%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
day_08::part2/part2_v1  time:   [918.12 µs 922.84 µs 928.02 µs]
                        change: [-0.6485% +0.0063% +0.7588%] (p = 0.99 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking day_08::part2/part2_v2: Warming up for 500.00 ms
Warning: Unable to complete 100 samples in 2.0s. You may wish to increase target time to 2.2s, enable flat sampling, or reduce sample count to 60.
day_08::part2/part2_v2  time:   [427.63 µs 430.41 µs 433.48 µs]
                        change: [-0.2765% +0.8468% +2.0260%] (p = 0.14 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
day_08::part2/part2_array_same
                        time:   [285.70 µs 287.68 µs 290.38 µs]
                        change: [-1.1322% -0.1992% +0.7388%] (p = 0.69 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

day_08::part1/part1     time:   [140.82 µs 141.23 µs 141.69 µs]
                        change: [-2.7411% -1.8442% -0.9985%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

And the problematic run:

    Running benches/benchmarks-criterion.rs (C:\Development\Rust\advent-of-code-rust\2023\target\release\deps\day_08_bench_criterion-02315b7a3c5c1594.exe)      
Gnuplot not found, using plotters backend
day_08::part2/part2_v1  time:   [919.15 µs 923.55 µs 928.27 µs]
                        change: [-0.6696% +0.0774% +0.8478%] (p = 0.83 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking day_08::part2/part2_v2: Warming up for 500.00 ms
Warning: Unable to complete 100 samples in 2.0s. You may wish to increase target time to 2.2s, enable flat sampling, or reduce sample count to 60.
day_08::part2/part2_v2  time:   [431.21 µs 433.04 µs 435.22 µs]
                        change: [+0.3734% +1.3685% +2.3048%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking day_08::part2/part2_array_same: Warming up for 500.00 ms
Warning: Unable to complete 100 samples in 2.0s. You may wish to increase target time to 3.2s, enable flat sampling, or reduce sample count to 50.
day_08::part2/part2_array_same
                        time:   [628.49 µs 631.39 µs 634.58 µs]
                        change: [+116.94% +118.94% +120.78%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

day_08::part1/part1     time:   [145.17 µs 145.50 µs 145.79 µs]
                        change: [+1.7073% +2.3928% +2.9905%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 25 outliers among 100 measurements (25.00%)
  10 (10.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  10 (10.00%) high severe
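
Regarding the "Unable to complete 100 samples in 2.0s" warnings in both runs: if desired, the target time can be raised per group, roughly like this (the 3-second value is only an example, not taken from the issue, and the bench body is a placeholder):

use criterion::{criterion_group, criterion_main, Criterion};
use std::time::Duration;

fn part2_benches(c: &mut Criterion) {
    let mut group = c.benchmark_group("day_08::part2");
    // Give Criterion more measurement time so 100 samples fit,
    // as the warning suggests.
    group.measurement_time(Duration::from_secs(3));
    group.bench_function("part2_v2", |b| b.iter(|| std::hint::black_box(0u32)));
    group.finish();
}

criterion_group!(benches, part2_benches);
criterion_main!(benches);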

I hope I have made myself clear, and maybe someday this will be investigated.
Criterion2 shows the same behavior.
