feat(perf): Track last loads per block in mem2reg and remove them if possible #6088

vezenovm · 2024-09-18T18:33:29Z

Description

Problem*

Resolves

Part of general effort to improve mem2reg.

Summary*

We sometimes have situations such as the following:

            b3():
              v9 = load v0
              v10 = eq v9, Field 2
              constrain v9 == Field 2
              v11 = load v2
              v12 = load v2
              v13 = eq v12, Field 2
              constrain v11 == Field 2

v2 does not have a known value, thus we do not remove the load. The mem2reg pass is acting as expected here. However, without a store or call to the reference between v11 = load v2 and v12 = load v2 we should be able to safely remove v12 = load v2 and map v12 -> v11.

This PR adds this logic as part of the initial mem2reg pass. We have a new last_loads map as part of a Block. This is currently cleared after analyzing block and is meant to only be per block. Unifying these last loads across blocks and the accurate predecessors can come in a follow-up. This is an initial proof of concept to show the optimizations validity.

Given an instruction we act as following:

Load

Check if we have a last load from the current load address. If we do, remove the current current and map its result to the previous load result.
Add a last load for the address.

Store

Remove the last load for the address

Call

Remove the last load for any reference arguments

I have also added two unit tests to mem2reg.rs

Additional Context

Documentation*

Check one:

No documentation needed.
Documentation included in this PR.
[For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist*

I have tested the changes locally.
I have formatted the changes with Prettier and/or cargo fmt on default settings.

github-actions · 2024-09-18T18:36:56Z

Changes to Brillig bytecode sizes

Generated at commit: a58edd62463c23cbac93f4d6cc9070308902f36e, compared to commit: 7d612425728a9d5852ec26a96af629aec7cd0a21

🧾 Summary (10% most significant diffs)

Program	Brillig opcodes (+/-)	%
brillig_rc_regression_6123	+10 ❌	+5.59%
regression_5252	-106 ✅	-2.28%
poseidonsponge_x5_254	-101 ✅	-2.37%
array_sort	-9 ✅	-2.96%
6_array	-14 ✅	-3.42%
array_to_slice	-57 ✅	-6.97%

Full diff report 👇

Program	Brillig opcodes (+/-)	%
brillig_rc_regression_6123	189 (+10)	+5.59%
sha256	2,237 (-1)	-0.04%
sha256_regression	6,565 (-3)	-0.05%
sha256_var_size_regression	1,716 (-1)	-0.06%
sha256_brillig_performance_regression	3,239 (-2)	-0.06%
sha256_var_witness_const_regression	1,243 (-1)	-0.08%
array_dynamic_blackbox_input	1,026 (-1)	-0.10%
keccak256	3,748 (-4)	-0.11%
sha256_var_padding_regression	4,787 (-6)	-0.13%
slice_regex	2,163 (-3)	-0.14%
7_function	594 (-1)	-0.17%
bigint	1,991 (-5)	-0.25%
brillig_cow_regression	2,154 (-6)	-0.28%
ram_blowup_regression	957 (-3)	-0.31%
nested_array_dynamic	1,985 (-7)	-0.35%
conditional_1	1,188 (-6)	-0.50%
schnorr	1,414 (-8)	-0.56%
hashmap	20,376 (-149)	-0.73%
to_le_bytes	132 (-1)	-0.75%
array_if_cond_simple	131 (-1)	-0.76%
uhashmap	13,420 (-108)	-0.80%
tuple_inputs	364 (-3)	-0.82%
generics	104 (-1)	-0.95%
slices	2,013 (-24)	-1.18%
nested_array_in_slice	1,185 (-15)	-1.25%
fold_numeric_generic_poseidon	784 (-10)	-1.26%
no_predicates_numeric_generic_poseidon	784 (-10)	-1.26%
u128	2,758 (-38)	-1.36%
poseidon2	358 (-5)	-1.38%
bench_2_to_17	351 (-5)	-1.40%
to_be_bytes	208 (-3)	-1.42%
hash_to_field	136 (-2)	-1.45%
fold_2_to_17	607 (-10)	-1.62%
poseidon_bn254_hash	5,352 (-96)	-1.76%
poseidon_bn254_hash_width_3	5,352 (-96)	-1.76%
sha2_byte	2,763 (-51)	-1.81%
databus_two_calldata	213 (-4)	-1.84%
slice_dynamic_index	2,587 (-53)	-2.01%
eddsa	10,468 (-219)	-2.05%
regression_5252	4,548 (-106)	-2.28%
poseidonsponge_x5_254	4,168 (-101)	-2.37%
array_sort	295 (-9)	-2.96%
6_array	395 (-14)	-3.42%
array_to_slice	761 (-57)	-6.97%

vezenovm · 2024-09-18T18:42:23Z

reference_only_used_as_alias | +7 ❌ | +2.81%

Hmm this is surprising. I'm guessing that removing some of these loads is perhaps reducing the amount of trivial stores we can remove, but I'm not sure.

vezenovm · 2024-09-18T19:30:05Z

reference_only_used_as_alias | +7 ❌ | +2.81%

Hmm this is surprising. I'm guessing that removing some of these loads is perhaps reducing the amount of trivial stores we can remove, but I'm not sure.

The main difference between the SSA on master and this PR looks to be the inc_rc instructions remaining in place. Before mem2reg we have this pattern:

    v61 = load v52
    inc_rc v61
    inc_rc [u8 0, u8 0, u8 0, u8 0, u8 0]
    inc_rc [u8 0, u8 0, u8 0, u8 0, u8 0]
    v64 = load v51
    inc_rc v64
    v65 = load v52
    inc_rc v65
    v66 = load v52
    inc_rc v66
    v67 = load v52
    v68 = load v53
    v70 = lt v68, u32 4
    constrain v70 == u1 1 '"push out of bounds"'
    v72 = load v52
    v73 = load v53
    v74 = load v52
    v75 = load v53
    v76 = mul v75, u32 4
    v77 = array_set v72, index v76, value Field 0

The repeat loads are removed in this PR, but those follow-up inc_rc instructions remain. I think this can be handled in a follow-up though so I am marking this PR ready for review again.

michaeljklein

A couple cleanup notes, but otherwise LGTM

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

compiler/noirc_evaluator/src/ssa/opt/mem2reg/block.rs

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

vezenovm · 2024-09-25T13:48:02Z

brillig_rc_regression_6123 +19 ❌ +10.16%
fold_2_to_17 +42 ❌ +6.89%
bench_2_to_17 +21 ❌ +6.60%
fold_numeric_generic_poseidon +51 ❌ +6.57%
no_predicates_numeric_generic_poseidon +51 ❌ +6.57%
poseidon2 +21 ❌ +6.54%

Looks like we are getting various regressions now after the RC correctness fix. Going to table this PR for now, and look at further optimizing RC instruction removals .

github-actions · 2024-11-22T09:17:10Z

Changes to number of Brillig opcodes executed

Generated at commit: a58edd62463c23cbac93f4d6cc9070308902f36e, compared to commit: 7d612425728a9d5852ec26a96af629aec7cd0a21

🧾 Summary (10% most significant diffs)

Program	Brillig opcodes (+/-)	%
brillig_rc_regression_6123	+14 ❌	+4.67%
hash_to_field	-32 ✅	-3.42%
hashmap	-1,976 ✅	-3.48%
eddsa	-27,694 ✅	-3.75%
sha2_byte	-1,895 ✅	-3.79%
array_to_slice	-159 ✅	-8.49%

Full diff report 👇

Program	Brillig opcodes (+/-)	%
brillig_rc_regression_6123	314 (+14)	+4.67%
ram_blowup_regression	778,668 (-17)	-0.00%
sha256_regression	116,201 (-3)	-0.00%
sha256_var_padding_regression	219,737 (-6)	-0.00%
sha256_var_size_regression	16,355 (-1)	-0.01%
sha256	13,869 (-1)	-0.01%
array_dynamic_blackbox_input	18,251 (-2)	-0.01%
sha256_var_witness_const_regression	6,751 (-1)	-0.01%
brillig_cow_regression	519,007 (-96)	-0.02%
sha256_brillig_performance_regression	26,358 (-16)	-0.06%
array_if_cond_simple	537 (-1)	-0.19%
nested_array_dynamic	3,109 (-7)	-0.22%
slice_regex	3,389 (-9)	-0.26%
conditional_1	5,711 (-18)	-0.31%
uhashmap	147,617 (-527)	-0.36%
slices	3,267 (-19)	-0.58%
nested_array_in_slice	1,546 (-12)	-0.77%
databus_two_calldata	445 (-4)	-0.89%
slice_dynamic_index	4,639 (-42)	-0.90%
tuple_inputs	632 (-6)	-0.94%
u128	25,147 (-291)	-1.14%
schnorr	10,226 (-128)	-1.24%
7_function	2,555 (-32)	-1.24%
to_be_bytes	2,448 (-33)	-1.33%
poseidon2	720 (-11)	-1.50%
fold_numeric_generic_poseidon	5,226 (-81)	-1.53%
no_predicates_numeric_generic_poseidon	5,226 (-81)	-1.53%
keccak256	35,011 (-544)	-1.53%
generics	140 (-3)	-2.10%
bench_2_to_17	604,551 (-13,067)	-2.12%
6_array	1,638 (-36)	-2.15%
fold_2_to_17	1,121,881 (-24,854)	-2.17%
to_le_bytes	1,152 (-31)	-2.62%
poseidonsponge_x5_254	186,395 (-6,075)	-3.16%
poseidon_bn254_hash	165,536 (-5,417)	-3.17%
poseidon_bn254_hash_width_3	165,536 (-5,417)	-3.17%
regression_5252	928,622 (-30,508)	-3.18%
array_sort	567 (-19)	-3.24%
hash_to_field	905 (-32)	-3.42%
hashmap	54,747 (-1,976)	-3.48%
eddsa	710,924 (-27,694)	-3.75%
sha2_byte	48,041 (-1,895)	-3.79%
array_to_slice	1,713 (-159)	-8.49%

vezenovm · 2024-11-22T09:31:48Z

Following some regressions from PR #6505 (#6505 (comment)) I decided to update this PR with master. It looks like we have some benefits from the optimizations aside for brillig_rc_regression_6123. However, it looks to be a minor regression so I am marking this PR ready for review again.

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

TomAFrench

still need to do a proper review but a couple of nits

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

compiler/noirc_evaluator/src/ssa/opt/mem2reg/alias_set.rs

Co-authored-by: Tom French <[email protected]>

…into mv/remove-last-loads-per-block

jfecher

LGTM. I'm still not 100% on the interactions with aliases here but I haven't been able to trick it with e.g. nested mutable references to create aliases, passing one of those to a function, mutating it, then loading again trying to get that load removed since only the alias was passed.

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

compiler/noirc_evaluator/src/ssa/opt/mem2reg/alias_set.rs

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

Co-authored-by: jfecher <[email protected]>

handle last_loads per block and remove them if possible

90712c2

vezenovm changed the title ~~feat(perf): Track last loads per block and remove them if possible~~ feat(perf): Track last loads per block in mem2reg and remove them if possible Sep 18, 2024

vezenovm added 4 commits September 18, 2024 18:34

remove debugging ssa file

8bf6ff6

cleanup comments

12cd64e

remove test debugging things

8c077aa

Merge branch 'master' into mv/remove-last-loads-per-block

5b88acd

vezenovm requested a review from a team September 18, 2024 18:40

vezenovm marked this pull request as draft September 18, 2024 18:56

actually removei nstruction do not just map value

ccd7298

vezenovm marked this pull request as ready for review September 18, 2024 19:30

vezenovm mentioned this pull request Sep 18, 2024

feat(perf): Remove inc_rc/dec_rc instructions that follow a removed load in mem2reg #6092

Closed

5 tasks

Merge branch 'master' into mv/remove-last-loads-per-block

65a461f

michaeljklein requested changes Sep 19, 2024

View reviewed changes

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs Outdated Show resolved Hide resolved

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs Outdated Show resolved Hide resolved

compiler/noirc_evaluator/src/ssa/opt/mem2reg/block.rs Outdated Show resolved Hide resolved

vezenovm commented Sep 19, 2024

View reviewed changes

compiler/noirc_evaluator/src/ssa/opt/mem2reg/block.rs Outdated Show resolved Hide resolved

Update compiler/noirc_evaluator/src/ssa/opt/mem2reg/block.rs

2fccdbc

vezenovm commented Sep 19, 2024

View reviewed changes

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs Outdated Show resolved Hide resolved

Update compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

26f4041

vezenovm commented Sep 19, 2024

View reviewed changes

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs Outdated Show resolved Hide resolved

Update compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

18b35e5

vezenovm commented Sep 19, 2024

View reviewed changes

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs Outdated Show resolved Hide resolved

Update compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

c275357

vezenovm requested review from michaeljklein and jfecher September 19, 2024 16:51

jfecher reviewed Sep 19, 2024

View reviewed changes

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs Outdated Show resolved Hide resolved

vezenovm added 3 commits September 19, 2024 21:12

fmt

171108f

cleanup

360654e

one more cleanup

0bf0525

vezenovm mentioned this pull request Sep 23, 2024

Correctness issue with reference counters #6123

Closed

vezenovm added 2 commits September 24, 2024 18:06

Merge branch 'master' into mv/remove-last-loads-per-block

02ed30e

timeout increase

b1d8305

vezenovm marked this pull request as draft September 25, 2024 14:45

vezenovm and others added 5 commits September 26, 2024 11:33

Merge branch 'master' into mv/remove-last-loads-per-block

7c3b927

Merge branch 'master' into mv/remove-last-loads-per-block

88afd21

Merge branch 'master' into mv/remove-last-loads-per-block

3d6a249

Merge branch 'master' into mv/remove-last-loads-per-block

2d6c0fa

Merge branch 'master' into mv/remove-last-loads-per-block

cba3738

vezenovm mentioned this pull request Nov 18, 2024

feat(ssa): Unroll small loops in brillig #6505

Merged

16 tasks

merge conflicts w/ master

f66b2c6

vezenovm marked this pull request as ready for review November 22, 2024 09:27

vezenovm requested a review from a team November 22, 2024 09:32

TomAFrench reviewed Nov 22, 2024

View reviewed changes

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs Show resolved Hide resolved

vezenovm added 2 commits November 22, 2024 11:40

update tests to use SSA parser

cb0dd20

Merge branch 'master' into mv/remove-last-loads-per-block

9ac843a

TomAFrench reviewed Nov 22, 2024

View reviewed changes

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs Outdated Show resolved Hide resolved

compiler/noirc_evaluator/src/ssa/opt/mem2reg/alias_set.rs Outdated Show resolved Hide resolved

vezenovm and others added 4 commits November 22, 2024 17:10

use or_default()

65f8fea

Update compiler/noirc_evaluator/src/ssa/opt/mem2reg/alias_set.rs

c8bb5cc

Co-authored-by: Tom French <[email protected]>

Merge remote-tracking branch 'origin/mv/remove-last-loads-per-block' …

7067aff

…into mv/remove-last-loads-per-block

cleaup comments

4ef0204

vezenovm requested a review from TomAFrench November 22, 2024 17:28

jfecher approved these changes Nov 22, 2024

View reviewed changes

vezenovm and others added 4 commits November 25, 2024 10:39

Update compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

ffafdcc

Co-authored-by: jfecher <[email protected]>

move last loads clearing

ac925cb

pr comments and cleanup

40396f5

Merge branch 'master' into mv/remove-last-loads-per-block

e18c6a6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(perf): Track last loads per block in mem2reg and remove them if possible #6088

feat(perf): Track last loads per block in mem2reg and remove them if possible #6088

vezenovm commented Sep 18, 2024 •

edited

Loading

github-actions bot commented Sep 18, 2024 •

edited

Loading

vezenovm commented Sep 18, 2024 •

edited

Loading

vezenovm commented Sep 18, 2024

michaeljklein left a comment

vezenovm commented Sep 25, 2024

github-actions bot commented Nov 22, 2024 •

edited

Loading

vezenovm commented Nov 22, 2024

TomAFrench left a comment

jfecher left a comment

feat(perf): Track last loads per block in mem2reg and remove them if possible #6088

Are you sure you want to change the base?

feat(perf): Track last loads per block in mem2reg and remove them if possible #6088

Conversation

vezenovm commented Sep 18, 2024 • edited Loading

Description

Problem*

Summary*

Load

Store

Call

Additional Context

Documentation*

PR Checklist*

github-actions bot commented Sep 18, 2024 • edited Loading

Changes to Brillig bytecode sizes

🧾 Summary (10% most significant diffs)

vezenovm commented Sep 18, 2024 • edited Loading

vezenovm commented Sep 18, 2024

michaeljklein left a comment

Choose a reason for hiding this comment

vezenovm commented Sep 25, 2024

github-actions bot commented Nov 22, 2024 • edited Loading

Changes to number of Brillig opcodes executed

🧾 Summary (10% most significant diffs)

vezenovm commented Nov 22, 2024

TomAFrench left a comment

Choose a reason for hiding this comment

jfecher left a comment

Choose a reason for hiding this comment

vezenovm commented Sep 18, 2024 •

edited

Loading

github-actions bot commented Sep 18, 2024 •

edited

Loading

vezenovm commented Sep 18, 2024 •

edited

Loading

github-actions bot commented Nov 22, 2024 •

edited

Loading