Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(perf): Track last loads per block in mem2reg and remove them if possible #6088

Open
wants to merge 39 commits into
base: master
Choose a base branch
from

Conversation

vezenovm
Copy link
Contributor

@vezenovm vezenovm commented Sep 18, 2024

Description

Problem*

Resolves

Part of general effort to improve mem2reg.

Summary*

We sometimes have situations such as the following:

            b3():
              v9 = load v0
              v10 = eq v9, Field 2
              constrain v9 == Field 2
              v11 = load v2
              v12 = load v2
              v13 = eq v12, Field 2
              constrain v11 == Field 2

v2 does not have a known value, thus we do not remove the load. The mem2reg pass is acting as expected here. However, without a store or call to the reference between v11 = load v2 and v12 = load v2 we should be able to safely remove v12 = load v2 and map v12 -> v11.

This PR adds this logic as part of the initial mem2reg pass. We have a new last_loads map as part of a Block. This is currently cleared after analyzing block and is meant to only be per block. Unifying these last loads across blocks and the accurate predecessors can come in a follow-up. This is an initial proof of concept to show the optimizations validity.

Given an instruction we act as following:

Load

  • Check if we have a last load from the current load address. If we do, remove the current current and map its result to the previous load result.
  • Add a last load for the address.

Store

  • Remove the last load for the address

Call

  • Remove the last load for any reference arguments

I have also added two unit tests to mem2reg.rs

Additional Context

Documentation*

Check one:

  • No documentation needed.
  • Documentation included in this PR.
  • [For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.

@vezenovm vezenovm changed the title feat(perf): Track last loads per block and remove them if possible feat(perf): Track last loads per block in mem2reg and remove them if possible Sep 18, 2024
Copy link
Contributor

github-actions bot commented Sep 18, 2024

Changes to Brillig bytecode sizes

Generated at commit: a58edd62463c23cbac93f4d6cc9070308902f36e, compared to commit: 7d612425728a9d5852ec26a96af629aec7cd0a21

🧾 Summary (10% most significant diffs)

Program Brillig opcodes (+/-) %
brillig_rc_regression_6123 +10 ❌ +5.59%
regression_5252 -106 ✅ -2.28%
poseidonsponge_x5_254 -101 ✅ -2.37%
array_sort -9 ✅ -2.96%
6_array -14 ✅ -3.42%
array_to_slice -57 ✅ -6.97%

Full diff report 👇
Program Brillig opcodes (+/-) %
brillig_rc_regression_6123 189 (+10) +5.59%
sha256 2,237 (-1) -0.04%
sha256_regression 6,565 (-3) -0.05%
sha256_var_size_regression 1,716 (-1) -0.06%
sha256_brillig_performance_regression 3,239 (-2) -0.06%
sha256_var_witness_const_regression 1,243 (-1) -0.08%
array_dynamic_blackbox_input 1,026 (-1) -0.10%
keccak256 3,748 (-4) -0.11%
sha256_var_padding_regression 4,787 (-6) -0.13%
slice_regex 2,163 (-3) -0.14%
7_function 594 (-1) -0.17%
bigint 1,991 (-5) -0.25%
brillig_cow_regression 2,154 (-6) -0.28%
ram_blowup_regression 957 (-3) -0.31%
nested_array_dynamic 1,985 (-7) -0.35%
conditional_1 1,188 (-6) -0.50%
schnorr 1,414 (-8) -0.56%
hashmap 20,376 (-149) -0.73%
to_le_bytes 132 (-1) -0.75%
array_if_cond_simple 131 (-1) -0.76%
uhashmap 13,420 (-108) -0.80%
tuple_inputs 364 (-3) -0.82%
generics 104 (-1) -0.95%
slices 2,013 (-24) -1.18%
nested_array_in_slice 1,185 (-15) -1.25%
fold_numeric_generic_poseidon 784 (-10) -1.26%
no_predicates_numeric_generic_poseidon 784 (-10) -1.26%
u128 2,758 (-38) -1.36%
poseidon2 358 (-5) -1.38%
bench_2_to_17 351 (-5) -1.40%
to_be_bytes 208 (-3) -1.42%
hash_to_field 136 (-2) -1.45%
fold_2_to_17 607 (-10) -1.62%
poseidon_bn254_hash 5,352 (-96) -1.76%
poseidon_bn254_hash_width_3 5,352 (-96) -1.76%
sha2_byte 2,763 (-51) -1.81%
databus_two_calldata 213 (-4) -1.84%
slice_dynamic_index 2,587 (-53) -2.01%
eddsa 10,468 (-219) -2.05%
regression_5252 4,548 (-106) -2.28%
poseidonsponge_x5_254 4,168 (-101) -2.37%
array_sort 295 (-9) -2.96%
6_array 395 (-14) -3.42%
array_to_slice 761 (-57) -6.97%

@vezenovm vezenovm requested a review from a team September 18, 2024 18:40
@vezenovm
Copy link
Contributor Author

vezenovm commented Sep 18, 2024

reference_only_used_as_alias | +7 ❌ | +2.81%

Hmm this is surprising. I'm guessing that removing some of these loads is perhaps reducing the amount of trivial stores we can remove, but I'm not sure.

@vezenovm vezenovm marked this pull request as draft September 18, 2024 18:56
@vezenovm
Copy link
Contributor Author

reference_only_used_as_alias | +7 ❌ | +2.81%

Hmm this is surprising. I'm guessing that removing some of these loads is perhaps reducing the amount of trivial stores we can remove, but I'm not sure.

The main difference between the SSA on master and this PR looks to be the inc_rc instructions remaining in place. Before mem2reg we have this pattern:

    v61 = load v52
    inc_rc v61
    inc_rc [u8 0, u8 0, u8 0, u8 0, u8 0]
    inc_rc [u8 0, u8 0, u8 0, u8 0, u8 0]
    v64 = load v51
    inc_rc v64
    v65 = load v52
    inc_rc v65
    v66 = load v52
    inc_rc v66
    v67 = load v52
    v68 = load v53
    v70 = lt v68, u32 4
    constrain v70 == u1 1 '"push out of bounds"'
    v72 = load v52
    v73 = load v53
    v74 = load v52
    v75 = load v53
    v76 = mul v75, u32 4
    v77 = array_set v72, index v76, value Field 0

The repeat loads are removed in this PR, but those follow-up inc_rc instructions remain. I think this can be handled in a follow-up though so I am marking this PR ready for review again.

Copy link
Contributor

@michaeljklein michaeljklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple cleanup notes, but otherwise LGTM

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs Outdated Show resolved Hide resolved
compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs Outdated Show resolved Hide resolved
compiler/noirc_evaluator/src/ssa/opt/mem2reg/block.rs Outdated Show resolved Hide resolved
@vezenovm
Copy link
Contributor Author

brillig_rc_regression_6123 +19 ❌ +10.16%
fold_2_to_17 +42 ❌ +6.89%
bench_2_to_17 +21 ❌ +6.60%
fold_numeric_generic_poseidon +51 ❌ +6.57%
no_predicates_numeric_generic_poseidon +51 ❌ +6.57%
poseidon2 +21 ❌ +6.54%

Looks like we are getting various regressions now after the RC correctness fix. Going to table this PR for now, and look at further optimizing RC instruction removals .

@vezenovm vezenovm marked this pull request as draft September 25, 2024 14:45
Copy link
Contributor

github-actions bot commented Nov 22, 2024

Changes to number of Brillig opcodes executed

Generated at commit: a58edd62463c23cbac93f4d6cc9070308902f36e, compared to commit: 7d612425728a9d5852ec26a96af629aec7cd0a21

🧾 Summary (10% most significant diffs)

Program Brillig opcodes (+/-) %
brillig_rc_regression_6123 +14 ❌ +4.67%
hash_to_field -32 ✅ -3.42%
hashmap -1,976 ✅ -3.48%
eddsa -27,694 ✅ -3.75%
sha2_byte -1,895 ✅ -3.79%
array_to_slice -159 ✅ -8.49%

Full diff report 👇
Program Brillig opcodes (+/-) %
brillig_rc_regression_6123 314 (+14) +4.67%
ram_blowup_regression 778,668 (-17) -0.00%
sha256_regression 116,201 (-3) -0.00%
sha256_var_padding_regression 219,737 (-6) -0.00%
sha256_var_size_regression 16,355 (-1) -0.01%
sha256 13,869 (-1) -0.01%
array_dynamic_blackbox_input 18,251 (-2) -0.01%
sha256_var_witness_const_regression 6,751 (-1) -0.01%
brillig_cow_regression 519,007 (-96) -0.02%
sha256_brillig_performance_regression 26,358 (-16) -0.06%
array_if_cond_simple 537 (-1) -0.19%
nested_array_dynamic 3,109 (-7) -0.22%
slice_regex 3,389 (-9) -0.26%
conditional_1 5,711 (-18) -0.31%
uhashmap 147,617 (-527) -0.36%
slices 3,267 (-19) -0.58%
nested_array_in_slice 1,546 (-12) -0.77%
databus_two_calldata 445 (-4) -0.89%
slice_dynamic_index 4,639 (-42) -0.90%
tuple_inputs 632 (-6) -0.94%
u128 25,147 (-291) -1.14%
schnorr 10,226 (-128) -1.24%
7_function 2,555 (-32) -1.24%
to_be_bytes 2,448 (-33) -1.33%
poseidon2 720 (-11) -1.50%
fold_numeric_generic_poseidon 5,226 (-81) -1.53%
no_predicates_numeric_generic_poseidon 5,226 (-81) -1.53%
keccak256 35,011 (-544) -1.53%
generics 140 (-3) -2.10%
bench_2_to_17 604,551 (-13,067) -2.12%
6_array 1,638 (-36) -2.15%
fold_2_to_17 1,121,881 (-24,854) -2.17%
to_le_bytes 1,152 (-31) -2.62%
poseidonsponge_x5_254 186,395 (-6,075) -3.16%
poseidon_bn254_hash 165,536 (-5,417) -3.17%
poseidon_bn254_hash_width_3 165,536 (-5,417) -3.17%
regression_5252 928,622 (-30,508) -3.18%
array_sort 567 (-19) -3.24%
hash_to_field 905 (-32) -3.42%
hashmap 54,747 (-1,976) -3.48%
eddsa 710,924 (-27,694) -3.75%
sha2_byte 48,041 (-1,895) -3.79%
array_to_slice 1,713 (-159) -8.49%

@vezenovm vezenovm marked this pull request as ready for review November 22, 2024 09:27
@vezenovm
Copy link
Contributor Author

Following some regressions from PR #6505 (#6505 (comment)) I decided to update this PR with master. It looks like we have some benefits from the optimizations aside for brillig_rc_regression_6123. However, it looks to be a minor regression so I am marking this PR ready for review again.

@vezenovm vezenovm requested a review from a team November 22, 2024 09:32
Copy link
Member

@TomAFrench TomAFrench left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still need to do a proper review but a couple of nits

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs Outdated Show resolved Hide resolved
compiler/noirc_evaluator/src/ssa/opt/mem2reg/alias_set.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@jfecher jfecher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'm still not 100% on the interactions with aliases here but I haven't been able to trick it with e.g. nested mutable references to create aliases, passing one of those to a function, mutating it, then loading again trying to get that load removed since only the alias was passed.

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs Outdated Show resolved Hide resolved
compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs Outdated Show resolved Hide resolved
compiler/noirc_evaluator/src/ssa/opt/mem2reg/alias_set.rs Outdated Show resolved Hide resolved
compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants