
Benchmarks for upgradeable loader instructions #3659

Merged
merged 1 commit into from
Jan 9, 2025

Conversation

KirillLykov

@KirillLykov KirillLykov commented Nov 15, 2024

Problem

Part of #3364: benchmark each loader instruction to determine its static CU consumption.

This PR lacks one instruction:

Summary of Changes

Add benches for each instruction, aiming to exercise its happy path.

Benchmark run results

Machine: 48x AMD EPYC 7502P 32-Core Processor

Benchmarking initialize_buffer: Collecting 100 samples in estimated 5.0298 s (550k iterations)
initialize_buffer       time:   [9.1225 µs 9.1239 µs 9.1253 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

write                   time:   [9.5727 µs 9.5749 µs 9.5771 µs]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

Benchmarking upgradeable_upgrade: Collecting 100 samples in estimated 5.2428 s (101k iterations)
upgradeable_upgrade     time:   [47.692 µs 47.863 µs 48.079 µs]

Benchmarking set_authority: Collecting 100 samples in estimated 5.0219 s (475k iterations)
set_authority           time:   [10.582 µs 10.584 µs 10.586 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

close                   time:   [10.217 µs 10.220 µs 10.224 µs]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

Benchmarking extend_program/64: Collecting 100 samples in estimated 5.1122 s (101k iterations)
extend_program/64       time:   [45.792 µs 45.961 µs 46.186 µs]
                        thrpt:  [1.3215 MiB/s 1.3280 MiB/s 1.3329 MiB/s]
Benchmarking extend_program/256: Collecting 100 samples in estimated 5.1309 s (101k iterations)
extend_program/256      time:   [50.421 µs 50.439 µs 50.458 µs]
                        thrpt:  [4.8385 MiB/s 4.8403 MiB/s 4.8420 MiB/s]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
Benchmarking extend_program/1024: Collecting 100 samples in estimated 5.0467 s (101k iterations)
extend_program/1024     time:   [46.170 µs 46.180 µs 46.192 µs]
                        thrpt:  [21.142 MiB/s 21.147 MiB/s 21.152 MiB/s]
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) high mild
  3 (3.00%) high severe
Benchmarking extend_program/4096: Collecting 100 samples in estimated 5.2005 s (101k iterations)
extend_program/4096     time:   [50.488 µs 50.525 µs 50.574 µs]
                        thrpt:  [77.238 MiB/s 77.313 MiB/s 77.369 MiB/s]

Benchmarking set_authority_checked: Collecting 100 samples in estimated 5.0435 s (480k iterations)
set_authority_checked   time:   [10.530 µs 10.531 µs 10.533 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

@KirillLykov KirillLykov marked this pull request as ready for review December 12, 2024 12:02
@KirillLykov KirillLykov requested a review from a team as a code owner December 12, 2024 12:02
@LucasSte

Looks good on my side. Perhaps @Lichtso may want to have a look?

@Lichtso

Lichtso commented Dec 12, 2024

Would like to wait until the RBPF bump, which comes with SBPFv3 test programs:
#3830

That could change the measured timings and thus the relevant CU costs going forward.

@KirillLykov
Author

KirillLykov commented Dec 12, 2024

@Lichtso I'm not sure I understood #3830 correctly; I want to add these benchmarks, not their results.

@Lichtso

Lichtso commented Dec 12, 2024

It has been a whole major release of Agave since the last RBPF release, so a lot of changes have accumulated in it. Thus the benchmarks are somewhat meaningless against the outdated version.

@KirillLykov
Author

@Lichtso I understand that it might be a bad idea to use these benchmarks to collect data right now, but I would just like to benchmark the code. My understanding is that #3830 will not break these benchmarks.

@KirillLykov
Author

Added ExtendProgram as well; thanks @LucasSte for explaining the accounts setup.


@tao-stones tao-stones left a comment


Looks great. Can you also post the bench results, if it's not too much trouble?

let mut file = File::open("test_elfs/out/noop_aligned.so").expect("file open failed");
let mut elf_orig = Vec::new();
file.read_to_end(&mut elf_orig).unwrap();
let mut file = File::open("test_elfs/out/noop_unaligned.so").expect("file open failed");


Do these ELF files need to be added under the benches folder?

@tao-stones

Would like to wait until the RBPF bump, which comes with SBPFv3 test programs: #3830

That could change the measured timings and thus the relevant CU costs going forward.

If the benchmarking mechanism is reasonable, we can still merge this PR and then run it again after RBPFv3?

@KirillLykov
Author

@Lichtso what about merging this PR and then running it again after RBPFv3, as @tao-stones suggested?

@Lichtso

Lichtso commented Jan 3, 2025

The PR was merged, so this one is no longer blocked.

@KirillLykov
Author

Please approve this one.

@Lichtso

Lichtso commented Jan 6, 2025

These benchmarks are fine:

  • bench_initialize_buffer
  • bench_write
  • bench_set_authority
  • bench_close
  • bench_set_authority_checked

However, these instructions:

  • DeployWithMaxDataLen
  • Upgrade
  • ExtendProgram

are completely dependent on the program being verified, and the results might vary a lot. I would prefer that we split them into a separate PR and further discuss our options there.

@tao-stones

However, these instructions:

  • DeployWithMaxDataLen
  • Upgrade
  • ExtendProgram

are completely dependent on the program being verified, and the results might vary a lot. I would prefer that we split them into a separate PR and further discuss our options there.

Good call to discuss these instructions separately. Thinking out loud here: are they planned to be migrated to BPF?

@KirillLykov
Author

KirillLykov commented Jan 8, 2025

Probably better to extract the removal of the preexisting serialization bench into a separate PR; but instead of removing it, we can modify it to use Criterion instead of nightly (1).

So there is a chain of PRs:

  1. rewrite serialization bench with Criterion #4350
  2. Benchmarks for upgradeable loader instructions #3659 (this one)
  3. Add benchmarks for loader which uses elf file with program #4341

@KirillLykov
Author

@Lichtso I updated the PR, please review.

@KirillLykov KirillLykov added the automerge label (Merge this Pull Request automatically once CI passes) Jan 9, 2025
@mergify mergify bot merged commit 2a4a1ac into anza-xyz:master Jan 9, 2025
52 checks passed