Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rv_core_ibex, top_earlgrey] Enable SecureIbex for CW340 #25146

Merged
merged 1 commit into from
Dec 8, 2024

Conversation

nasahlpa
Copy link
Member

@nasahlpa nasahlpa commented Nov 14, 2024

As the SecureIbex configuration is the one we use for the Earl Grey ASIC, we also should enable it for the FPGA boards.

Due to resource constraints, this is only possible for the CW340 but not the CW310.

Report for CW340 with SecureIbex enabled:
Resource utilization stays still well below 70%:
image

And timing is met:
image

Report for CW340 with SecureIbex disabled:
image
image

Closes #25137

Copy link
Contributor

@vogelpi vogelpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nasahlpa for looking into this. Do you know by how much the resource utilization increases roughtly?

The changes look mostly good, but there are FPGA CI failures and we should make sure all tests pass before merging this.

% elif target["name"] == "cw310":
.KmacEnMasking(0),
.KmacSwKeyMasked(1),
.KeymgrKmacEnMasking(0),
.SecKmacCmdDelay(0),
.SecKmacIdleAcceptSwMsg(1'b0),
.RvCoreIbexSecureIbex(0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably also should add that for the cw305.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks, done.

@nasahlpa
Copy link
Member Author

Thanks @nasahlpa for looking into this. Do you know by how much the resource utilization increases roughtly?

The changes look mostly good, but there are FPGA CI failures and we should make sure all tests pass before merging this.

About that:

I get the same error on a local CW340 with the new bitstream:
Command result: SFDP header contains incorrect signature: 0xffffffff
as in CI.
I am a bit stuck here. Do you know how I can debug this?

@nasahlpa
Copy link
Member Author

Thanks @nasahlpa for looking into this. Do you know by how much the resource utilization increases roughtly?

The changes look mostly good, but there are FPGA CI failures and we should make sure all tests pass before merging this.

I've updated the PR description with area and timing numbers before and after this PR.

@vogelpi
Copy link
Contributor

vogelpi commented Nov 15, 2024

Thanks @nasahlpa for looking into this. Do you know by how much the resource utilization increases roughtly?
The changes look mostly good, but there are FPGA CI failures and we should make sure all tests pass before merging this.

About that:

I get the same error on a local CW340 with the new bitstream: Command result: SFDP header contains incorrect signature: 0xffffffff as in CI. I am a bit stuck here. Do you know how I can debug this?

Thanks for adding the area and timing numbers. I think the change is reasonable for the big board. About the SFDP header issue, I have no idea abou this. Maybe someone from the software team knows?

@a-will
Copy link
Contributor

a-will commented Nov 15, 2024

Thanks @nasahlpa for looking into this. Do you know by how much the resource utilization increases roughtly?
The changes look mostly good, but there are FPGA CI failures and we should make sure all tests pass before merging this.

About that:
I get the same error on a local CW340 with the new bitstream: Command result: SFDP header contains incorrect signature: 0xffffffff as in CI. I am a bit stuck here. Do you know how I can debug this?

Thanks for adding the area and timing numbers. I think the change is reasonable for the big board. About the SFDP header issue, I have no idea abou this. Maybe someone from the software team knows?

That likely means bootstrapping failed at the very first step, and the chip gave no response to SPI. MISO was left high when the tester tried to read the SFDP table.

For the prior PR state and its associated CI jobs, I found no evidence any software had run on CW340's Ibex. The ROM did not seem to print any identification messages.

@nasahlpa
Copy link
Member Author

Thanks @nasahlpa for looking into this. Do you know by how much the resource utilization increases roughtly?
The changes look mostly good, but there are FPGA CI failures and we should make sure all tests pass before merging this.

About that:
I get the same error on a local CW340 with the new bitstream: Command result: SFDP header contains incorrect signature: 0xffffffff as in CI. I am a bit stuck here. Do you know how I can debug this?

Thanks for adding the area and timing numbers. I think the change is reasonable for the big board. About the SFDP header issue, I have no idea abou this. Maybe someone from the software team knows?

That likely means bootstrapping failed at the very first step, and the chip gave no response to SPI. MISO was left high when the tester tried to read the SFDP table.

For the prior PR state and its associated CI jobs, I found no evidence any software had run on CW340's Ibex. The ROM did not seem to print any identification messages.

Thanks @a-will. Do you know how I can debug this?

@GregAC
Copy link
Contributor

GregAC commented Nov 15, 2024

@nasahlpa what test failed? Can't find the older CI logs and still awaiting the run for the newer one.

At a guess the bootloader has immediately fallen over because we get an instant alert from Ibex, maybe because the lockstep hasn't come out of reset properly? There's some prims involved that may have FPGA specific implementations that could explain the behaviour difference.

@nasahlpa
Copy link
Member Author

@nasahlpa what test failed? Can't find the older CI logs and still awaiting the run for the newer one.

At a guess the bootloader has immediately fallen over because we get an instant alert from Ibex, maybe because the lockstep hasn't come out of reset properly? There's some prims involved that may have FPGA specific implementations that could explain the behaviour difference.

All of them. Not a single test on CW340 passed. Here are the old results from CI.

@nasahlpa
Copy link
Member Author

nasahlpa commented Nov 17, 2024

I investigated this and discovered the following:

When enabling the SecureIbex paramter for CW340, a major Ibex alert is triggered by a register file ECC check. Hence, the cores does not boot up and we cannot bootstrap software.

When either setting the RegFileECC = 1'b0 in the ibex_top module or forcing the ECC error signal to 0, the core boots up and I can successfully run SW on CW340 with the SecureIbex parameter on.

My assumption is that the ibex_register_file_fpga module does not correctly work with RegFileECC = 1'b1. I had a quick look into the FPGA register file - it though seems that we are correctly initializing the RF content with the ECC encoded 0.

In the long term, we should investigate this and fix it.

As a temporary solution, we could bypass the error:

  • Disable the RegFileECC option.
    • Requires decoupling this option from the SecureIbex parameter, i.e., changes in the Ibex repository as well as in top_earlgrey as well as chip_earlgrey_cw340 in the OpenTitan repository are needed.
  • Use the FF register file.
    • When the SecureIbex parameter is enabled for CW340, use the FF instead of the FPGA register file
    • Probably the easiest solution for now as we only need to modify chip_earlgrey_cw340 in the OpenTitan repository.

I tested both solutions and they work.

WDYT @GregAC?

Edit: ah, after discussing with @vogelpi about this I think the error is here:
https://github.com/lowRISC/ibex/blob/169785d0711335c94561a93146e069766eec138c/rtl/ibex_register_file_fpga.sv#L46
We should use here WordZeroVal instead of 0. I'll give it a try and report back.

@GregAC
Copy link
Contributor

GregAC commented Nov 18, 2024

@nasahlpa nice work on diagnosing the issue. I'd favour just using the FF register file in FPGA, assuming the FPGA implementation still works.

The reason to use the FPGA RF is efficient resource utilization I don't think switching to the FF version will have a meaningful effect given the full earl grey design is huge.

We should work out why the FPGA version is broken in this instance. I'm happy to take a look at this.

And having written that just seen the edit! If that's the quick fix then great go for it, otherwise switch to FF version would be my vote.

nasahlpa added a commit to nasahlpa/ibex that referenced this pull request Nov 18, 2024
We should use `WordZeroVal` instead of `0` for reads from register `x0` in the
FPGA register file.

This bug was discovered when enabling the `RegFileECC` parameter. When this is
enabled, the core performs ECC checks, expecting that `WordZeroVal` is returned
for `x0`. Else, we get a major alert.

Fixes lowRISC/opentitan#25146

Signed-off-by: Pascal Nasahl <[email protected]>
@nasahlpa
Copy link
Member Author

The error was indeed the zero value for x0.
This will be fixed once #2224 in the Ibex repository is merged.

Thanks all for helping me debugging this issue.

Copy link
Contributor

@vogelpi vogelpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@nasahlpa nasahlpa marked this pull request as ready for review November 19, 2024 09:11
@nasahlpa nasahlpa added the CherryPick:earlgrey_1.0.0 This PR should be cherry-picked to earlgrey_1.0.0 label Nov 19, 2024
@nasahlpa nasahlpa force-pushed the secure_ibex_cw340 branch 4 times, most recently from a020f8b to 5ad1420 Compare November 19, 2024 13:19
localparam bit RegFileECC = SecureIbex;
localparam bit RegFileECC = 1'b0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should be removed before merging this.

@vogelpi
Copy link
Contributor

vogelpi commented Nov 22, 2024

This got accidentally closed again by someone merging a related commit in the base repo. @nasahlpa , the reason is the "fixes #pr-no" in the commit message. I think it will happen again :-(

@nasahlpa
Copy link
Member Author

After fixing the register file, the core still does not boot up. I had a closer look and it turns out that when I exclude the rf_wdata_wb_ecc signal from the lockstep comparison, the core boots up. When adding the comparison for this signal, it fails again with the same message as above. Even when I exclude the ECC bits of this signal:

  assign outputs_mismatch =
    (enable_cmp_q != IbexMuBiOff) & ((shadow_outputs_q.crash_dump != core_outputs_q[0].crash_dump) | 
                                     (shadow_outputs_q.double_fault_seen != core_outputs_q[0].double_fault_seen) |
                                     (shadow_outputs_q.rf_raddr_a != core_outputs_q[0].rf_raddr_a) |
                                     (shadow_outputs_q.rf_raddr_b != core_outputs_q[0].rf_raddr_b) |
                                     (shadow_outputs_q.rf_waddr_wb != core_outputs_q[0].rf_waddr_wb) |
                                     (shadow_outputs_q.rf_we_wb != core_outputs_q[0].rf_we_wb) |
                                     (shadow_outputs_q.ic_tag_req != core_outputs_q[0].ic_tag_req) |
                                     (shadow_outputs_q.ic_tag_write != core_outputs_q[0].ic_tag_write) |
                                     (shadow_outputs_q.ic_tag_addr != core_outputs_q[0].ic_tag_addr) |
                                     (shadow_outputs_q.ic_tag_wdata != core_outputs_q[0].ic_tag_wdata) |
                                     (shadow_outputs_q.ic_data_req != core_outputs_q[0].ic_data_req) |
                                     (shadow_outputs_q.ic_data_write != core_outputs_q[0].ic_data_write) |
                                     (shadow_outputs_q.ic_data_addr != core_outputs_q[0].ic_data_addr) |
                                     (shadow_outputs_q.ic_data_wdata != core_outputs_q[0].ic_data_wdata) |
                                     (shadow_outputs_q.ic_scr_key_req != core_outputs_q[0].ic_scr_key_req) |
                                     (shadow_outputs_q.instr_req != core_outputs_q[0].instr_req) |
                                     (shadow_outputs_q.instr_addr != core_outputs_q[0].instr_addr) |
                                     (shadow_outputs_q.data_req != core_outputs_q[0].data_req) |
                                     (shadow_outputs_q.data_we != core_outputs_q[0].data_we) |
                                     (shadow_outputs_q.data_be != core_outputs_q[0].data_be) |
                                     (shadow_outputs_q.data_addr != core_outputs_q[0].data_addr) |
                                     (shadow_outputs_q.data_wdata != core_outputs_q[0].data_wdata) |
                                     (shadow_outputs_q.dummy_instr_id != core_outputs_q[0].dummy_instr_id) |
                                     (shadow_outputs_q.dummy_instr_wb != core_outputs_q[0].dummy_instr_wb) |
                                     (shadow_outputs_q.irq_pending != core_outputs_q[0].irq_pending) |
                                     (shadow_outputs_q.rf_wdata_wb_ecc[31:0] != core_outputs_q[0].rf_wdata_wb_ecc[31:0]));

the core does not boot up. I've tried the following:

  • Disabling RegFileECC, then rf_wdata_wb_ecc_o = rf_wdata_wb
  • Disabling the WriteBack stage as rf_wdata_wb is driven inside the ibex_wb_stage module

But still no success.

I also had a look whether all signals are properly initialized on a reset, this looks good. However, as rf_wdata_wb directly gets data over the LSU from memory, it could be that some memory is not properly initialized? Strangely the lockstep comparison error for this signal only fails after I splice the bitstream to add the OTP and ROM memory. When I just create a bitstream with:
./bazelisk.sh build //hw/bitstream/vivado:fpga_cw340
this bitstream works.

Before setting up the Vivado ILA debugger, maybe @vogelpi and @GregAC could have again a look?

As the `SecureIbex` configuration is the one we use for the
Earl Grey ASIC, we also should enable it for the FPGA boards.

Due to resource constraints, this is only possible for the CW340
but not the CW310.

Resource utilization stays still will below 70%.

Closes lowRISC#25137

Signed-off-by: Pascal Nasahl <[email protected]>
@nasahlpa nasahlpa force-pushed the secure_ibex_cw340 branch 2 times, most recently from 21deb87 to ce82d80 Compare December 8, 2024 14:34
@nasahlpa
Copy link
Member Author

nasahlpa commented Dec 8, 2024

Issue is resolved (lowRISC/ibex#2228) and CW340 with enabled SecureIbex configuration now successfully builds & runs test. Hence, I am merging this now

@nasahlpa nasahlpa merged commit 71630d1 into lowRISC:master Dec 8, 2024
37 checks passed
@nasahlpa nasahlpa deleted the secure_ibex_cw340 branch December 8, 2024 17:26
nasahlpa added a commit to nasahlpa/opentitan that referenced this pull request Dec 8, 2024
After enabling the SecureIbex parameter on CW340 (c.f.,
lowRISC#25146], the otbn_mem_scramble_test now
also can be executed on the CW340.

Signed-off-by: Pascal Nasahl <[email protected]>
nasahlpa added a commit to nasahlpa/opentitan that referenced this pull request Dec 8, 2024
By enabling the SecureIbex parameter for CW340 (see
lowRISC#25146), TLUL ECC errors now trigger an interrupt.
This commit modifies the scram_ctrl_scrambled_access test such
that the test checks whether we get the interrupt on CW340.

Signed-off-by: Pascal Nasahl <[email protected]>
Copy link

github-actions bot commented Dec 8, 2024

Successfully created backport PR for earlgrey_1.0.0:

nasahlpa added a commit to nasahlpa/opentitan that referenced this pull request Dec 8, 2024
After enabling the SecureIbex parameter on CW340 (c.f.,
lowRISC#25146], the otbn_mem_scramble_test now
also can be executed on the CW340.

Signed-off-by: Pascal Nasahl <[email protected]>
nasahlpa added a commit to nasahlpa/opentitan that referenced this pull request Dec 8, 2024
By enabling the SecureIbex parameter for CW340 (see
lowRISC#25146), TLUL ECC errors now trigger an interrupt.
This commit modifies the scram_ctrl_scrambled_access test such
that the test checks whether we get the interrupt on CW340.

Signed-off-by: Pascal Nasahl <[email protected]>
engdoreis pushed a commit that referenced this pull request Dec 9, 2024
After enabling the SecureIbex parameter on CW340 (c.f.,
#25146], the otbn_mem_scramble_test now
also can be executed on the CW340.

Signed-off-by: Pascal Nasahl <[email protected]>
engdoreis pushed a commit that referenced this pull request Dec 9, 2024
By enabling the SecureIbex parameter for CW340 (see
#25146), TLUL ECC errors now trigger an interrupt.
This commit modifies the scram_ctrl_scrambled_access test such
that the test checks whether we get the interrupt on CW340.

Signed-off-by: Pascal Nasahl <[email protected]>
github-actions bot pushed a commit that referenced this pull request Dec 9, 2024
After enabling the SecureIbex parameter on CW340 (c.f.,
#25146], the otbn_mem_scramble_test now
also can be executed on the CW340.

Signed-off-by: Pascal Nasahl <[email protected]>
(cherry picked from commit 4c04dfd)
github-actions bot pushed a commit that referenced this pull request Dec 9, 2024
By enabling the SecureIbex parameter for CW340 (see
#25146), TLUL ECC errors now trigger an interrupt.
This commit modifies the scram_ctrl_scrambled_access test such
that the test checks whether we get the interrupt on CW340.

Signed-off-by: Pascal Nasahl <[email protected]>
(cherry picked from commit 92439bb)
jwnrt pushed a commit that referenced this pull request Dec 10, 2024
After enabling the SecureIbex parameter on CW340 (c.f.,
#25146], the otbn_mem_scramble_test now
also can be executed on the CW340.

Signed-off-by: Pascal Nasahl <[email protected]>
(cherry picked from commit 4c04dfd)
jwnrt pushed a commit that referenced this pull request Dec 10, 2024
By enabling the SecureIbex parameter for CW340 (see
#25146), TLUL ECC errors now trigger an interrupt.
This commit modifies the scram_ctrl_scrambled_access test such
that the test checks whether we get the interrupt on CW340.

Signed-off-by: Pascal Nasahl <[email protected]>
(cherry picked from commit 92439bb)
Razer6 pushed a commit to Razer6/opentitan that referenced this pull request Dec 10, 2024
After enabling the SecureIbex parameter on CW340 (c.f.,
lowRISC#25146], the otbn_mem_scramble_test now
also can be executed on the CW340.

Signed-off-by: Pascal Nasahl <[email protected]>
Razer6 pushed a commit to Razer6/opentitan that referenced this pull request Dec 10, 2024
By enabling the SecureIbex parameter for CW340 (see
lowRISC#25146), TLUL ECC errors now trigger an interrupt.
This commit modifies the scram_ctrl_scrambled_access test such
that the test checks whether we get the interrupt on CW340.

Signed-off-by: Pascal Nasahl <[email protected]>
Razer6 pushed a commit to Razer6/opentitan that referenced this pull request Dec 12, 2024
After enabling the SecureIbex parameter on CW340 (c.f.,
lowRISC#25146], the otbn_mem_scramble_test now
also can be executed on the CW340.

Signed-off-by: Pascal Nasahl <[email protected]>
Razer6 pushed a commit to Razer6/opentitan that referenced this pull request Dec 12, 2024
By enabling the SecureIbex parameter for CW340 (see
lowRISC#25146), TLUL ECC errors now trigger an interrupt.
This commit modifies the scram_ctrl_scrambled_access test such
that the test checks whether we get the interrupt on CW340.

Signed-off-by: Pascal Nasahl <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CherryPick:earlgrey_1.0.0 This PR should be cherry-picked to earlgrey_1.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[rv_core_ibex] Enable SecureIbex config for FPGA
4 participants