[kmac] Rework masked SHA3 core and switch to Trivium-based PRNG implementation #21624

vogelpi · 2024-02-22T11:12:02Z

This PR consists of several commits required to improve the masking in the KMAC block and to prevent brute-forcing attacks on the PRNG state. This PR resolves #20828.

The commits fall into 3 groups:

- The first commits simplify and clean up the I/O mux control and the PRNG inside the masked SHA3 core. Ultimately, this allows preventing glitches on the mux control signals for the inputs and outputs of the non-linear layer (the DOM multipliers), which consumes the entropy. This improves the masking and comes basically for free in terms of area and timing.
- One commit adds an 800-bit buffer stage between the output of the PRNG and the masked non-linear layer (the DOM multipliers) to prevent glitches occurring inside the PRNG from propagating into the masked SHA3 core. For the old PRNG architecture (limited glitching activity at the output due to the 4-bit S-Boxes), this helps improve the masking, whereas for the new architecture it is absolutely needed. This adds around 7 kGE to the overall KMAC area (+3.8%) and has no timing impact.
- One commit replaces the LFSR-based PRNG architecture, which is potentially vulnerable to brute-forcing attacks, with a Trivium-based implementation (similar to what we've already done for AES, see [aes,rtl] Switch to Bivium-based masking PRNG implementation #20852). This adds around 13 kGE to the overall KMAC area (20 kGE with the buffer stage, +11%) and reduces the max clock frequency from 617 MHz to 480 MHz (all numbers generated with the Yosys + nangate45 flow, so use with care). Due to the heavy unrolling of the primitive (we need it to generate 800 bits per clock cycle from a 288-bit state), the critical path inside KMAC is now within the PRNG. However, 480 MHz is still a lot faster than other parts / accelerators in the design, so I don't think this should be a problem.
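As a rough cross-check of the area and timing numbers above, the following sketch derives the implied baseline area from the buffer-stage figure and recomputes the combined overhead. It assumes that both percentages are relative to the same pre-change KMAC area, which the text does not state explicitly; all variable names are illustrative.

```python
# Figures quoted in the PR description (Yosys + nangate45, so approximate).
buffer_kge = 7.0      # area added by the 800-bit buffer stage
buffer_pct = 3.8      # quoted relative overhead of the buffer stage
prng_kge = 13.0       # area added by the Trivium-based PRNG
f_old_mhz = 617.0     # max clock frequency before the change
f_new_mhz = 480.0     # max clock frequency after the change

# Assumption: percentages are relative to the pre-change KMAC area.
baseline_kge = buffer_kge / (buffer_pct / 100.0)

total_kge = buffer_kge + prng_kge
total_pct = 100.0 * total_kge / baseline_kge
freq_drop_pct = 100.0 * (f_old_mhz - f_new_mhz) / f_old_mhz

print(f"implied baseline area: {baseline_kge:.0f} kGE")
print(f"combined overhead: {total_kge:.0f} kGE (+{total_pct:.1f}%)")
print(f"max frequency reduction: {freq_drop_pct:.1f}%")
```

Under that assumption the combined overhead comes out just below the quoted +11%, and the frequency reduction is a bit over a fifth.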

From an SCA perspective, I have the following results so far:

- The PROLEAD evaluation indicates a slight improvement but the results are still inconclusive (the test results seem to oscillate around the threshold, but the oscillation now starts later).
- The FPGA measurements look a lot better now. Previously we would see leakage starting to kick in at around 1 million traces, sometimes sooner, sometimes later.
  - In SHA3 mode we now have: no 1st-order leakage with 10 million traces, and 2nd-order leakage only during the first 2 rounds, i.e., when the SHA3 state is not yet properly randomized and we have little noise in the linear layer. This won't be an issue for KMAC. If we want to get rid of this, we have to move the message masking to after the padding (see [kmac] Consider moving message masking to output of sha3pad #17209). This will come for free in terms of area and timing, so it can be done in M3 if we really want it. Just for reference, I had to optimize the top-level design / capture setup quite a bit to get results but I am now very confident in these measurements. If masking is disabled via SW, we get strong 1st-order leakage with as few as 1000 traces (a big improvement over the past!).
  - Measurements for the KMAC mode are now done as well: we have no 1st-order and no 2nd-order leakage with 10 million traces. Again a big improvement compared to before :-)
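For readers unfamiliar with the terms "1st-order" and "2nd-order" leakage, here is a minimal sketch of a univariate Welch's t-test at a single sample point, as commonly used in TVLA-style fixed-vs-random evaluations. It is an assumption that the measurements above follow this style of test; the PR does not name the statistic used, and the 4.5 threshold and all function names are illustrative conventions rather than values from this PR.

```python
import math
from statistics import fmean, variance

THRESHOLD = 4.5  # commonly used |t| threshold; assumption, not from the PR


def welch_t(a, b):
    """Welch's t-statistic between two sets of samples."""
    diff = fmean(a) - fmean(b)
    denom = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if denom == 0.0:
        # Degenerate case: both sets are constant.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / denom


def preprocess(samples, order):
    """Order 1 uses the raw samples, order 2 the squared centered samples."""
    if order == 1:
        return list(samples)
    if order == 2:
        mu = fmean(samples)
        return [(x - mu) ** 2 for x in samples]
    raise ValueError("only orders 1 and 2 are sketched here")


def leaks(fixed_set, random_set, order):
    """True if the t-statistic exceeds the threshold at the given order."""
    t = welch_t(preprocess(fixed_set, order), preprocess(random_set, order))
    return abs(t) > THRESHOLD


# Toy data: equal means (no 1st-order difference) but different spread,
# which only shows up after the 2nd-order preprocessing.
set_a = [-1.0, 1.0, -2.0, 2.0] * 25
set_b = [-3.0, 3.0, -6.0, 6.0] * 25
print(leaks(set_a, set_b, order=1), leaks(set_a, set_b, order=2))
```

A real evaluation applies this per sample point across full power traces and tracks the statistic as the number of traces grows, which is where statements like "no leakage with 10 million traces" come from.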

Since a full-width PRNG is used that is able to produce 800 bits in each clock cycle, the control logic for requesting fresh randomness for remasking inside the DOM multipliers of keccak_2share can be reduced to requesting fresh randomness whenever it's needed in the next clock cycle. Signed-off-by: Pirmin Vogel <[email protected]>

Even before this change, keccak_round was basically responsible for the fine-grained control of the DOM multiplier input/output muxes, but this wasn't obvious, which led to a much more complicated design. Moving all these control signals up to keccak_round simplifies the code and, more importantly, paves the way for registering some critical signals to avoid glitches on the input/output mux signals, which is beneficial for SCA hardening. Signed-off-by: Pirmin Vogel <[email protected]>

By flopping these signals they are freed of glitches which is beneficial for SCA hardening. Signed-off-by: Pirmin Vogel <[email protected]>

This additional buffer stage prevents glitches occurring at the PRNG output (due to the non-linear S-Box layer) from propagating into the DOM multipliers inside the Keccak/SHA3 core. This is beneficial for SCA hardening. Signed-off-by: Pirmin Vogel <[email protected]>
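The effect of such a register stage can be illustrated with a toy model. This is a deliberately simplified abstraction, not a timing simulation: each clock cycle is represented as a list of the values a combinational signal takes before settling, and the function names are made up for this sketch.

```python
def count_toggles(samples):
    """Number of value changes in a sequence of samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def combinational_view(cycles):
    """Downstream logic sees every intermediate value within each cycle."""
    return [value for cycle in cycles for value in cycle]


def registered_view(cycles, reset=0):
    """A flop samples only the settled (last) value of each cycle and
    holds it constant for the whole following cycle."""
    out = []
    held = reset
    for cycle in cycles:
        out.extend([held] * len(cycle))
        held = cycle[-1]
    return out


# Three cycles of a glitchy signal; the last entry of each list is the
# settled value.
cycles = [[0, 1, 0, 1], [1, 0, 0], [0, 1, 1]]
print(count_toggles(combinational_view(cycles)))  # transitions incl. glitches
print(count_toggles(registered_view(cycles)))     # at most one per clock edge
```

In this model the registered signal changes at most once per clock edge, at the cost of one cycle of latency.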

Depending on the PRNG architecture and control, the externally provided randomness can be guaranteed to be stable when the inputs to the DOM multipliers don't change. Not using partial intermediate results to cover these cases allows saving some silicon area (minus 800 MUX2). However, it seems that PROLEAD currently cannot successfully analyze the design with this new option enabled. For this reason, we keep the multiplexers in the design. Signed-off-by: Pirmin Vogel <[email protected]>

This commit replaces the LFSR-based PRNG with an unrolled, Trivium-based PRNG implementation to avoid brute-forcing attacks on the LFSR states. The overall PRNG state decreases from 800 bits to 288 bits, but due to the heavy unrolling the primitive can still generate 800 bits per cycle as required by the masked SHA3 core. This resolves lowRISC#20828. Signed-off-by: Pirmin Vogel <[email protected]>
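As a software illustration of what "unrolled" means here, the sketch below applies the published Trivium state update 800 times to a 288-bit state to obtain 800 output bits, which corresponds to what the hardware does combinationally within one clock cycle. The tap positions follow the Trivium specification; the function names and the arbitrary seed pattern are assumptions of this sketch and do not reflect how the RTL primitive is seeded or reseeded.

```python
STATE_BITS = 288


def trivium_step(s):
    """One Trivium update on a list of 288 bits (s[0] is s_1 in the spec).

    Returns (next_state, output_bit).
    """
    t1 = s[65] ^ s[92]
    t2 = s[161] ^ s[176]
    t3 = s[242] ^ s[287]
    z = t1 ^ t2 ^ t3
    t1 ^= (s[90] & s[91]) ^ s[170]
    t2 ^= (s[174] & s[175]) ^ s[263]
    t3 ^= (s[285] & s[286]) ^ s[68]
    # Shift the three registers (93, 84 and 111 bits) by one position.
    nxt = [t3] + s[0:92] + [t1] + s[93:176] + [t2] + s[177:287]
    return nxt, z


def trivium_generate(state, n_bits):
    """Apply n_bits updates; in hardware this loop is unrolled."""
    if len(state) != STATE_BITS:
        raise ValueError("Trivium state must be 288 bits")
    bits = []
    for _ in range(n_bits):
        state, z = trivium_step(state)
        bits.append(z)
    return state, bits


# Arbitrary non-zero seed pattern, for illustration only.
seed = [(i * i + i // 3) % 2 for i in range(STATE_BITS)]
state, bits = trivium_generate(seed, 800)
print(len(state), len(bits))
```

Because output bit number 800 depends on state bits produced by earlier updates in the same cycle, the unrolled logic is deep, which is consistent with the PRNG now containing the critical path.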

nasahlpa

Thanks @vogelpi, great job - LGTM! Splitting up this PR into multiple commits and the offline discussion really helped in reviewing these code changes.

andreaskurth

LGTM, thanks for this quite extensive change and improvement @vogelpi!

(I focused my review on the RTL code and didn't notice red flags. For the changes to the other code, I think CI checks and your experiments + local tests should suffice.)

msfschaffner

Thanks for the well structured PR @vogelpi! This is great!

vogelpi · 2024-02-29T07:49:44Z

Thanks everybody for your reviews, let's merge this!

cdgori

Continuing catch-up, this all LGTM

vogelpi added 5 commits February 22, 2024 11:36

[kmac] Make DOM multiplier I/O muxing glitch free

7f6d0df

By flopping these signals they are freed of glitches which is beneficial for SCA hardening. Signed-off-by: Pirmin Vogel <[email protected]>

vogelpi requested review from a team as code owners February 22, 2024 11:12

vogelpi requested review from hcallahan-lowrisc, pamaury, andreaskurth, nasahlpa, cdgori, msfschaffner and johannheyszl and removed request for a team, hcallahan-lowrisc and pamaury February 22, 2024 11:12

vogelpi mentioned this pull request Feb 22, 2024

[kmac] Improve PRNG implementation #20828

Closed

vogelpi force-pushed the kmac-prng branch 2 times, most recently from a4495f6 to 7f6e23b Compare February 22, 2024 12:19

vogelpi force-pushed the kmac-prng branch from 7f6e23b to eadd79e Compare February 22, 2024 12:44

nasahlpa approved these changes Feb 27, 2024

andreaskurth approved these changes Feb 27, 2024

msfschaffner approved these changes Feb 27, 2024

vogelpi merged commit 642d7a9 into lowRISC:master Feb 29, 2024
32 checks passed

vogelpi mentioned this pull request Feb 29, 2024

Revert "[kmac] Rework masked SHA3 core and switch to Trivium-based PRNG implementation" #21775

Closed

andreaskurth mentioned this pull request Mar 12, 2024

[kmac] D2S Signoff #20978

Closed

andreaskurth mentioned this pull request Mar 13, 2024

[kmac] Bump version to 2.0.0 #21982

Merged

andreaskurth mentioned this pull request Mar 28, 2024

[kmac] V2S Signoff #21013

Closed

cdgori reviewed May 1, 2024

vogelpi deleted the kmac-prng branch May 3, 2024 10:41
