Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctness issue with reference counters #6123

Closed
sirasistant opened this issue Sep 23, 2024 · 7 comments · Fixed by #6134
Closed

Correctness issue with reference counters #6123

sirasistant opened this issue Sep 23, 2024 · 7 comments · Fixed by #6134
Labels
bug Something isn't working

Comments

@sirasistant
Copy link
Contributor

sirasistant commented Sep 23, 2024

Aim

Run the following program succesfully (It's a minimal reproduction of a failing test in the protocol circuits tests):

struct Builder {
    note_hashes: BoundedVec<Field, 2>,
    nullifiers: BoundedVec<Field, 2>,
}

impl Builder {
    fn append_note_hashes_with_logs(&mut self, num_note_hashes: u32) {
        let index_offset = self.note_hashes.len();
        for i in 0..self.note_hashes.max_len() {
            if i < num_note_hashes {
                self.add_new_note_hash((index_offset + i) as Field);
            }
        }
    }

    fn add_new_note_hash(&mut self, value: Field) {
        self.note_hashes.push(value);
    }
}

fn swap_items<T, let N: u32>(vec: &mut BoundedVec<T, N>, from_index: u32, to_index: u32) {
    let tmp = vec.storage[from_index];
    vec.storage[from_index] = vec.storage[to_index];
    vec.storage[to_index] = tmp;
}

unconstrained fn main() {
    let mut builder = Builder { note_hashes: BoundedVec::new(), nullifiers: BoundedVec::new() };

    builder.append_note_hashes_with_logs(2);
    builder.nullifiers.storage[1] = 27;
    // Get ordered items before shuffling.
    let note_hashes = builder.note_hashes.storage;
    let original_first_note_hash = note_hashes[0];
    // Shuffle.
    swap_items(&mut builder.note_hashes, 1, 0);

    for i in 0..1 {
        assert_eq(note_hashes[i], original_first_note_hash);
    }
}

Expected Behavior

The program should run succesfully

Bug

note_hashes is mutated when swapping items on builder.note_hashes.
This is the SSA generated:

After Array Set Optimizations:
brillig fn main f0 {
  b0():
    inc_rc v3=[v2:Field=0, v2:Field=0]
    v4 = allocate
    store v3=[v2:Field=0, v2:Field=0] at v4
    v5 = allocate
    store v6:u32=0 at v5
    inc_rc v7=[v2:Field=0, v2:Field=0]
    v8 = allocate
    store v7=[v2:Field=0, v2:Field=0] at v8
    jmp b1(v6:u32=0)
  b1(v0: u32):
    v10 = lt v0, v9:u32=2
    jmpif v10 then: b6, else: b2
  b6():
    v23 = lt v0, v9:u32=2
    jmpif v23 then: b7, else: b8
  b7():
    v24 = cast v0 as Field
    v25 = load v5
    v26 = lt v25, v9:u32=2
    constrain v26 == v27:u1=1 '"push out of bounds"'
    v28 = load v4
    v29 = load v5
    v30 = load v5
    v31 = array_set v28, index v30, value v24
    v32 = add v29, v13:u32=1
    store v31 at v4
    store v32 at v5
    dec_rc v31
    dec_rc v31
    jmp b8()
  b8():
    v33 = add v0, v13:u32=1
    jmp b1(v33)
  b2():
    v11 = load v4
    v12 = load v8
    v15 = array_set v12, index v13:u32=1, value v14:Field=27
    store v15 at v8
    inc_rc v11
    v16 = array_get v11, index v6:u32=0
    inc_rc v11
    v17 = array_get v11, index v13:u32=1
    v18 = array_set v11, index v13:u32=1, value v16
    v19 = array_set v18, index v6:u32=0, value v17
    store v19 at v4
    dec_rc v19
    jmp b3(v6:u32=0)
  b3(v1: u32):
    v20 = eq v1, v6:u32=0
    jmpif v20 then: b5, else: b4
  b5():
    v21 = array_get v11, index v1
    constrain v21 == v16
    v22 = add v1, v13:u32=1
    jmp b3(v22)
  b4():
    return 
}

By "executing" the SSA by hand, I think the SSA generated is incorrect:

b0:
 v3 = [0; 2], RC=2
 v4 = reference => v3
 v5 = reference => 0
 v7 = [0; 2], RC=2
 v8 = reference => v7
b1:
 v0 = 0
 v9 = 2
b6: 
b7:
 v24 = 0
 v25 = 0
 v26 = true
 v28 = v3
 v29 = 0
 v30 = 0
 v31 = [0; 2], RC=1 // CLONE, prev_rc = 2
 v32 = 1
 v4 = reference => v31
 v5 = reference => 1
 v31 = [0; 2], RC=0
 v31 = [0; 2], RC=MAX_VALUE // WRAPAROUND
b8:
b1:
 v0 = 1
 v9 = 2
b6: 
b7:
 v24 = 1
 v25 = 1
 v26 = true
 v28 = v31
 v29 = 1
 v30 = 1
 v31 = [0, 1], RC=1 // CLONE prev_rc = MAX_VALUE
 v32 = 2
 v4 = reference => v31
 v5 = reference => 2
 v31 = [0, 1], RC=0
 v31 = [0, 1], RC=MAX_VALUE // WRAPAROUND
b8:
b1:
 v0 = 2
b2:
 v11 = [0, 1], RC=MAX_VALUE
 v12 = v7
 v15 = [0, 27], RC=1 // CLONE prev_rc = 2
 v8 = REFERENCE => v15
 v11 = [0, 1], RC=0
 v16 = 0
 v11 = [0, 1], RC=1
 v17 = 1
 v18 = [0, 0], RC=1 // MUTATE v11, since it has RC=1
 v11 = [0, 0], RC=1
 v19 = [1, 0], RC=1 // MUTATE v11, since it has RC=1
 v11 = [1, 0], RC=1
 v4 = REFERENCE => v19
 v19 = [1, 0], RC=0
b3:
b5:
 v21 = 1
 1 != 0 ERROR (v11 was incorrectly mutated in b2)

The SSA generated is outputting enough dec_rc to wraparound rc, and in general I think the reference counter management in this SSA doesn't make much sense to me. In the end the program fails because note_hashes (v11) has been mutated inside swap_items.
Bisecting the compiler, the program stopped working after this PR:
#4560
But simply deactivating this pass does not fix it anymore.

To Reproduce

No response

Workaround

None

Workaround Description

No response

Additional Context

No response

Project Impact

None

Blocker Context

No response

Nargo Version

No response

NoirJS Version

No response

Proving Backend Tooling & Version

No response

Would you like to submit a PR for this Issue?

None

Support Needs

No response

@sirasistant sirasistant added the bug Something isn't working label Sep 23, 2024
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Noir Sep 23, 2024
@vezenovm
Copy link
Contributor

The code snippet passes if we disable removing RC instructions during DIE. e.g. doing so gives us the following SSA:

After Array Set Optimizations:
brillig fn main f0 {
  b0():
    inc_rc v3=[v2:Field 0, v2:Field 0]
    inc_rc v3=[v2:Field 0, v2:Field 0]
    inc_rc v4=[v2:Field 0, v2:Field 0]
    inc_rc v4=[v2:Field 0, v2:Field 0]
    inc_rc v5=[v2:Field 0, v2:Field 0]
    v6 = allocate
    store v5=[v2:Field 0, v2:Field 0] at v6
    v7 = allocate
    store v8:u32 0 at v7
    inc_rc v9=[v2:Field 0, v2:Field 0]
    v10 = allocate
    store v9=[v2:Field 0, v2:Field 0] at v10
    inc_rc v11=[v2:Field 0, v2:Field 0]
    inc_rc v12=[v2:Field 0, v2:Field 0]
    jmp b1(v8:u32 0)
  b1(v0: u32):
    v14 = lt v0, v13:u32 2
    jmpif v14 then: b6, else: b2
  b6():
    v29 = lt v0, v13:u32 2
    jmpif v29 then: b7, else: b8
  b7():
    v30 = cast v0 as Field
    v31 = load v6
    inc_rc v31
    v32 = load v10
    inc_rc v32
    v33 = load v6
    inc_rc v33
    v34 = load v7
    v35 = lt v34, v13:u32 2
    constrain v35 == v36:u1 1 '"push out of bounds"'
    v37 = load v6
    v38 = load v7
    v39 = load v7
    v40 = array_set v37, index v39, value v30
    v41 = add v38, v19:u32 1
    store v40 at v6
    store v41 at v7
    dec_rc v40
    dec_rc v40
    v42 = load v10
    dec_rc v42
    jmp b8()
  b8():
    v43 = add v0, v19:u32 1
    jmp b1(v43)
  b2():
    v15 = load v6
    dec_rc v15
    v16 = load v10
    dec_rc v16
    v17 = load v6
    v18 = load v10
    v21 = array_set v18, index v19:u32 1, value v20:Field 27
    store v21 at v10
    inc_rc v17
    v22 = array_get v17, index v8:u32 0
    inc_rc v17
    v23 = array_get v17, index v19:u32 1
    v24 = array_set v17, index v19:u32 1, value v22
    v25 = array_set v24, index v8:u32 0, value v23
    store v25 at v6
    dec_rc v25
    jmp b3(v8:u32 0)
  b3(v1: u32):
    v26 = eq v1, v8:u32 0
    jmpif v26 then: b5, else: b4
  b5():
    v27 = array_get v17, index v1
    constrain v27 == v22
    v28 = add v1, v19:u32 1
    jmp b3(v28)
  b4():
    return 
}

We can see the largest difference is that we have two extra dec_rc after the first two loads in b2 and in b7 we have two extra inc_rc v31 and inc_rc v33 following loads from v6. These match the dec rc instructions at the end of the block after the store to v6.

This PR #6088 looks to actually also cause this program to execute successfully. It ends with the following SSA:

After Array Set Optimizations:
brillig fn main f0 {
  b0():
    inc_rc [Field 0, Field 0]
    v4 = allocate
    store [Field 0, Field 0] at v4
    v5 = allocate
    store u32 0 at v5
    inc_rc [Field 0, Field 0]
    v8 = allocate
    store [Field 0, Field 0] at v8
    jmp b1(u32 0)
  b1(v0: u32):
    v10 = lt v0, u32 2
    jmpif v10 then: b6, else: b2
  b6():
    v23 = lt v0, u32 2
    jmpif v23 then: b7, else: b8
  b7():
    v24 = cast v0 as Field
    v25 = load v4
    inc_rc v25
    inc_rc v25
    v26 = load v5
    v27 = lt v26, u32 2
    constrain v27 == u1 1 '"push out of bounds"'
    v29 = array_set v25, index v26, value v24
    v30 = add v26, u32 1
    store v29 at v4
    store v30 at v5
    dec_rc v29
    dec_rc v29
    jmp b8()
  b8():
    v31 = add v0, u32 1
    jmp b1(v31)
  b2():
    v11 = load v4
    dec_rc v11
    v12 = load v8
    dec_rc v12
    v15 = array_set v12, index u32 1, value Field 27
    store v15 at v8
    inc_rc v11
    v16 = array_get v11, index u32 0
    inc_rc v11
    v17 = array_get v11, index u32 1
    v18 = array_set v11, index u32 1, value v16
    v19 = array_set v18, index u32 0, value v17
    store v19 at v4
    dec_rc v19
    jmp b3(u32 0)
  b3(v1: u32):
    v20 = eq v1, u32 0
    jmpif v20 then: b5, else: b4
  b5():
    v21 = array_get v11, index v1
    constrain v21 == v16
    v22 = add v1, u32 1
    jmp b3(v22)
  b4():
    return 
}

We can see it maintains the same necessary inc_rc and dec_rc instructions as not removing inc/dec rc instructions in the DIE pass just without an extra bytecode size blow-up. Not entirely sure the reason #6088 also fixes the issue but it does provide some more insight into the issue. Per #6088, the solution was causing some extra inc_rc instructions to remain that we were previously removing. It looks like in this case, #6088 actually helped keep some necessary ones around.

@vezenovm
Copy link
Contributor

vezenovm commented Sep 23, 2024

Also noting that I do not think the rc optimization should be causing any problems here. It looks to be operating correctly. The only changes from its inputs to the outputted SSA is the following:

acir(inline) fn new f1 {
  b0():
    inc_rc [Field 0, Field 0]
    return [Field 0, Field 0], u32 0
}
acir(inline) fn len f4 {
  b0(v0: [Field; 2], v1: u32):
    inc_rc v0
    dec_rc v0
    return v1
}

After Removing Paired rc_inc & rc_decs:

acir(inline) fn new f1 {
  b0():
    inc_rc [Field 0, Field 0]
    return [Field 0, Field 0], u32 0
}
acir(inline) fn len f4 {
  b0(v0: [Field; 2], v1: u32):
    return v1
}

@vezenovm
Copy link
Contributor

Not entirely sure the reason #6088 also fixes the issue but it does provide some more insight into the issue. Per #6088, the solution was causing some extra inc_rc instructions to remain that we were previously removing. It looks like in this case, #6088 actually helped keep some necessary ones around.

Turning off inlining for this case it becomes more clear why #6088 solves this correctness issue for inc_rc.

Before DIE we have the following function:

brillig fn swap_items f3 {
  b0(v0: &mut [Field; 2], v1: &mut u32, v2: u32, v3: u32):
    v4 = load v0
    inc_rc v4
    v5 = load v0
    v6 = array_get v5, index v2
    v7 = load v0
    v8 = load v1
    v9 = load v0
    v10 = array_get v9, index v3
    v11 = array_set v7, index v2, value v10
    v13 = add v2, u32 1
    store v11 at v0
    store v8 at v1
    v14 = load v0
    v15 = load v1
    v16 = array_set v14, index v3, value v6
    v17 = add v3, u32 1
    store v16 at v0
    store v15 at v1
    v18 = load v0
    dec_rc v18
    return 
}

inc_rc v4 is laid down on the parameter as expected from increment_parameter_rcs during the initial SSA gen. However, we will see that v4 is unused, but we still need that inc_rc. v5, v7, and v9, are the loads of v0 that end up being used. DIE will remove inc_rc v4 as the load which added the parameter rcs is unused, while the later loads are used.

After DIE we get the following SSA where the inc_rc will be removed:

brillig fn swap_items f3 {
  b0(v0: &mut [Field; 2], v1: &mut u32, v2: u32, v3: u32):
    v4 = load v0
    v5 = array_get v4, index v2
    v6 = load v0
    v7 = load v1
    v8 = load v0
    v9 = array_get v8, index v3
    v10 = array_set v6, index v2, value v9
    store v10 at v0
    store v7 at v1
    v11 = load v0
    v12 = load v1
    v13 = array_set v11, index v3, value v5
    store v13 at v0
    store v12 at v1
    return 
}

#6088 fixed this case by merging the repeat loads that are used in later instructions, into the first parameter rcs load and increment. Thus I see a few ways forward (all values reference the SSA before DIE):

  1. We need to add more increment rc instructions. e.g. whenever we are loading an array from a reference parameter. This would prevent that values such as v5 and v7 are not left without inc_rc instructions. However, this will most likely cause its own blowup as we may now be adding a lot of repeat inc_rc instructions. It also feels like it should be unnecessary as we should only be adding inc_rc instructions when constructing an array.
  2. We go with feat(perf): Track last loads per block in mem2reg and remove them if possible #6088 as a solution for this issue. The issue here is that we only remove repeated loads when the reference is not passed into another call or used in a store. If we had a store after the inc_rc v4 the v0 loads would not simplify so that the inc_rc v4 could remain. However, I'm not sure if this case would ever occur as an array set would require a load before the array set, and a function call where v0 is used as an argument would be inlined. I think once we stop inlining the majority of Brillig functions, having a call after inc_rc v4 but would break using feat(perf): Track last loads per block in mem2reg and remove them if possible #6088 as a generalized solution. I think this could be mitigated by just to adding some check during DIE to not remove inc_rc instructions which were loaded from reference parameters. Although, I am not sure how inc_rc would survive generally when not inlining Brillig functions at all.
  3. We should not remove any RC instructions during DIE. This would lead to blow-ups similar to option (1).

@sirasistant
Copy link
Contributor Author

sirasistant commented Sep 24, 2024

I see, so IIUC the issue stems from the fact that we are codegening redundant work in ssa gen but it's incomplete (we are loading v0 multiple times but only inc_rc the first load). Note that any solution that we choose should work both when inlining and when not inlining (because in the future the inliner won't be boolean, we'd have some configuration and heuristics to wether inline a call or not)

My gut feeling is that we should codegen a correct program (thus going with option 1 and whenever we load an array from a reference, increasing its rc, and decreasing its rc when it goes out of scope) but then optimize any redundant work (removing the redundant loads with their associated inc_rc and dec_rc) but I might be missing something

@sirasistant
Copy link
Contributor Author

So for example:

unconstrained fn main(mut arr: [Field; 2], from_index: u32, to_index: u32) {
    let tmp = arr[from_index];
    arr[from_index] = arr[to_index];
    arr[to_index] = tmp;
}

We codegen this:

Initial SSA:
brillig fn main f0 {
  b0(v0: [Field; 2], v1: u32, v2: u32):
    v3 = allocate
    store v0 at v3
    inc_rc v0
    v4 = load v3
    v5 = array_get v4, index v1
    v6 = load v3
    v7 = load v3
    v8 = array_get v7, index v2
    v9 = array_set v6, index v1, value v8
    v11 = add v1, u32 1
    store v9 at v3
    v12 = load v3
    v13 = array_set v12, index v2, value v5
    v14 = add v2, u32 1
    store v13 at v3
    dec_rc v0
    return 
}

Which I think is incorrect, v0 has a total of 5 aliases and its RC is two. What I'd expect is:

Initial SSA:
brillig fn main f0 {
  b0(v0: [Field; 2], v1: u32, v2: u32):
    inc_rc v0
    v3 = allocate
    store v0 at v3
    v4 = load v3
    inc_rc v4
    v5 = array_get v4, index v1
    v6 = load v3
    inc_rc v6
    v7 = load v3
    inc_rc v7
    v8 = array_get v7, index v2
    v9 = array_set v6, index v1, value v8
    v11 = add v1, u32 1
    store v9 at v3
    v12 = load v3
    inc_rc v12
    v13 = array_set v12, index v2, value v5
    v14 = add v2, u32 1
    store v13 at v3
    dec_rc v12
    dec_rc v7
    dec_rc v6
    dec_rc v4
    dec_rc v0
    return 
}

And then, when a later optimization pass removes the 4 redundant loads that it'd remove the associated inc_rc and dec_rc, leaving

brillig fn main f0 {
  b0(v0: [Field; 2], v1: u32, v2: u32):
    inc_rc v0
    v3 = array_get v0, index v1
    v4 = array_get v0, index v2
    v5 = array_set mut v0, index v1, value v4
    v6 = array_set mut v5, index v2, value v3
    dec_rc v0
    return 
}

@sirasistant
Copy link
Contributor Author

sirasistant commented Sep 24, 2024

And then we could also remove the inc_rc and dec_rc pairs for array arguments to the entrypoint (only in the entrypoint, no callees) but I think that's separate

@vezenovm
Copy link
Contributor

I see, so IIUC the issue stems from the fact that we are codegening redundant work in ssa gen but it's incomplete (we are loading v0 multiple times but only inc_rc the first load)

Yes partially. It is more that due to the redundant work the initial inc_rc which we place on the parameter we want to stay. Simply restricting DIE to check whether the inc_rc comes from a block parameter load will not work post inlining.

Note that any solution that we choose should work both when inlining and when not inlining (because in the future the inliner won't be boolean, we'd have some configuration and heuristics to wether inline a call or not)

Yes makes sense. I want to verify whether #6088 can work with some extra checks in DIE on block params. Otherwise, it does look like option (1) and eating a regression for now until we can optimize further may be our best choice.

github-merge-queue bot pushed a commit that referenced this issue Sep 24, 2024
# Description

## Problem\*

Resolves #6123 

## Summary\*

Still a draft as I want to test it a bit more and am not fully sure if
it is the best solution.

## Additional Context



## Documentation\*

Check one:
- [X] No documentation needed.
- [ ] Documentation included in this PR.
- [ ] **[For Experimental Features]** Documentation to be submitted in a
separate PR.

# PR Checklist\*

- [X] I have tested the changes locally.
- [X] I have formatted the changes with [Prettier](https://prettier.io/)
and/or `cargo fmt` on default settings.
@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in Noir Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants