[kmac] Rework masked SHA3 core and switch to Trivium-based PRNG implementation #21624

Merged
merged 6 commits into lowRISC:master from the kmac-prng branch on Feb 29, 2024

Contributor

@vogelpi vogelpi commented Feb 22, 2024

This PR consists of several commits required to improve the masking in the KMAC block and to prevent brute-forcing attacks on the PRNG state. This PR resolves #20828.

The commits can be grouped into 3 groups:

  • The first commits simplify, clean up, and improve the I/O mux control and the PRNG inside the masked SHA3 core. Ultimately, this allows preventing glitches on the mux control signals for the inputs and outputs of the non-linear layer (the DOM multipliers), which consumes the entropy. This helps improve the masking and comes basically for free in terms of area and timing.
  • One commit adds an 800-bit buffer stage between the output of the PRNG and the masked non-linear layer (the DOM multipliers) to prevent glitches occurring inside the PRNG from propagating into the masked SHA3 core. For the old PRNG architecture (limited glitching activity at the output due to the 4-bit S-Boxes), this helps improve the masking, whereas for the new architecture it is absolutely needed. This adds around 7 kGE to the overall KMAC area (+3.8%) with no timing impact.
  • One commit replaces the LFSR-based PRNG architecture, which is potentially vulnerable to brute-forcing attacks, with a Trivium-based implementation (similar to what we've already done for AES, see "[aes,rtl] Switch to Bivium-based masking PRNG implementation" #20852). This adds around 13 kGE to the overall KMAC area (20 kGE with the buffer stage, +11%) and reduces the max clock frequency from 617 MHz to 480 MHz (all numbers generated with the Yosys + nangate45 flow, so use with care). Due to the heavy unrolling of the primitive (we need it to generate 800 bits per clock cycle from a 288-bit state), the critical path inside KMAC is now within the PRNG. However, 480 MHz is still a lot faster than other parts / accelerators in the design, so I don't think this should be a problem.
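
As a quick sanity check on the area and timing numbers quoted in the list above, the following sketch back-computes the KMAC baseline area implied by each (delta, percentage) pair and the relative drop in max clock frequency. It only uses the figures from this description; the helper names are made up for illustration, and the results inherit the rounding of the quoted percentages.

```python
def implied_baseline_kge(delta_kge: float, increase_pct: float) -> float:
    """Baseline area implied by an absolute increase and its quoted percentage."""
    return delta_kge / (increase_pct / 100.0)


def relative_change_pct(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means a decrease)."""
    return (new - old) / old * 100.0


# Buffer stage alone: +7 kGE quoted as +3.8 %.
baseline_from_buffer = implied_baseline_kge(7.0, 3.8)
# Trivium PRNG plus buffer stage: +20 kGE quoted as +11 %.
baseline_from_total = implied_baseline_kge(20.0, 11.0)
# Max clock frequency: 617 MHz before, 480 MHz after.
freq_change = relative_change_pct(617.0, 480.0)

print(f"baseline (buffer only):   {baseline_from_buffer:.1f} kGE")
print(f"baseline (PRNG + buffer): {baseline_from_total:.1f} kGE")
print(f"max frequency change:     {freq_change:.1f} %")
```

Both pairs point to a baseline a little above 180 kGE, so the quoted percentages are consistent with each other, and the frequency drop works out to roughly 22%.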

From an SCA perspective, I have the following results so far:

  • The PROLEAD evaluation indicates a slight improvement but the results are still inconclusive (test results seem to oscillate around the threshold but the oscillation now starts later).
  • The FPGA measurements look a lot better now. Previously we would see leakage starting to kick in at around 1 million traces, sometimes sooner, sometimes later.
    • In SHA3 mode we now have: no 1st-order leakage with 10 million traces, and 2nd-order leakage only during the first 2 rounds, i.e., when the SHA3 state is not yet properly randomized and we have little noise in the linear layer. This won't be an issue for KMAC. If we want to get rid of this, we have to move the message masking to after the padding (see "[kmac] Consider moving message masking to output of sha3pad" #17209). This will come for free in terms of area and timing, so it can be done in M3 if we really want it. Just for reference, I had to optimize the top-level design / capture setup quite a bit to get results, but I am now very confident in these measurements. If masking is disabled via SW, we get strong 1st-order leakage with as few as 1000 traces (a big improvement over the past!).
    • Measurements for the KMAC mode are now done as well: we have no 1st- and 2nd-order leakage with 10 million traces. Again a big improvement compared to before :-)

Since a full-width PRNG is used that is able to produce 800 bits in each
clock cycle, the control logic around requesting fresh randomness for
remasking inside the DOM multipliers of keccak_2share can be simplified:
it now requests fresh randomness only when it is needed in the next
clock cycle.

Signed-off-by: Pirmin Vogel <[email protected]>
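
For readers unfamiliar with DOM multipliers, here is a minimal software sketch of a first-order, two-share DOM AND gate, the kind of building block that consumes the fresh randomness mentioned in the commit message above. It is a bit-level Python model of the generic construction, not the keccak_2share RTL: the function names and the use of `secrets` as a randomness source are assumptions for illustration, and a software model says nothing about glitches or register placement.

```python
import secrets


def share(bit: int) -> tuple[int, int]:
    """Split a bit into two Boolean shares (hypothetical helper)."""
    mask = secrets.randbits(1)
    return bit ^ mask, mask


def dom_and(a0: int, a1: int, b0: int, b1: int, z: int) -> tuple[int, int]:
    """First-order DOM AND on 1-bit shares.

    a = a0 ^ a1 and b = b0 ^ b1 are the unshared operands, z is one fresh
    random bit. Returns (q0, q1) with q0 ^ q1 == a & b.
    """
    # Cross-domain terms mix shares from both domains, so they are remasked
    # with the fresh bit z before being combined with anything else.
    cross0 = (a0 & b1) ^ z
    cross1 = (a1 & b0) ^ z
    # Inner-domain terms only use shares of a single domain.
    q0 = (a0 & b0) ^ cross0
    q1 = (a1 & b1) ^ cross1
    return q0, q1


# Usage: AND two shared bits and recombine the result.
a0, a1 = share(1)
b0, b1 = share(1)
q0, q1 = dom_and(a0, a1, b0, b1, secrets.randbits(1))
result = q0 ^ q1
```

The fresh bit `z` cancels out when the output shares are recombined, which is why each multiplication needs new randomness but the unshared result does not depend on it.
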
Already before this change, keccak_round was basically responsible for
the fine-grained control of the DOM multiplier input/output muxes, but
this wasn't obvious, leading to a much more complicated design.

Moving all these control signals up to keccak_round allows simplifying
the code and, more importantly, paves the way for registering some
critical signals to avoid glitches on the input/output muxed signals,
which is beneficial for SCA hardening.

Signed-off-by: Pirmin Vogel <[email protected]>
Flopping these signals frees them of glitches, which is
beneficial for SCA hardening.

Signed-off-by: Pirmin Vogel <[email protected]>
This additional buffer stage prevents glitches occurring at the PRNG
output (due to the non-linear S-Box layer) from propagating into
the DOM multipliers inside the Keccak/SHA3 core. This is beneficial
for SCA hardening.

Signed-off-by: Pirmin Vogel <[email protected]>
Depending on the PRNG architecture and control, the externally provided
randomness can be guaranteed to be stable when the inputs to the DOM
multipliers don't change. Not using partial intermediate results to
cover these cases allows saving some silicon area (minus 800 MUX2).

However, it seems that PROLEAD currently cannot successfully analyze
the design with this new option enabled. For this reason, we keep the
multiplexers in the design.

Signed-off-by: Pirmin Vogel <[email protected]>
@vogelpi vogelpi requested review from a team as code owners February 22, 2024 11:12
@vogelpi vogelpi force-pushed the kmac-prng branch 2 times, most recently from a4495f6 to 7f6e23b Compare February 22, 2024 12:19
This commit replaces the LFSR-based PRNG with an unrolled, Trivium-based
PRNG implementation to avoid brute-forcing attacks on the LFSR states.

The overall PRNG state decreases from 800 bits to 288 bits but due
to the heavy unrolling, the primitive can still generate 800 bits per
cycle as required by the masked SHA3 core.

This resolves lowRISC#20828.

Signed-off-by: Pirmin Vogel <[email protected]>
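
To illustrate the unrolling argument from the commit message above, the sketch below models the public Trivium state update in Python and applies it 800 times per simulated clock cycle, so that a 288-bit state yields 800 output bits per cycle. This is a behavioural model under assumptions: the function names, the random seeding without Trivium's key/IV setup and warm-up rounds, and the list-of-bits state are made up for illustration, and the actual RTL primitive (its seeding interface and bit ordering) is not reproduced here.

```python
import secrets

STATE_BITS = 288          # Trivium state size
OUT_BITS_PER_CYCLE = 800  # width required by the masked SHA3 core


def trivium_step(s: list[int]) -> tuple[list[int], int]:
    """One Trivium update on a 288-entry bit list; returns (new_state, output_bit).

    The specification numbers state bits s1..s288; here s[i - 1] holds s_i.
    """
    t1 = s[65] ^ s[92]
    t2 = s[161] ^ s[176]
    t3 = s[242] ^ s[287]
    z = t1 ^ t2 ^ t3
    t1 ^= (s[90] & s[91]) ^ s[170]
    t2 ^= (s[174] & s[175]) ^ s[263]
    t3 ^= (s[285] & s[286]) ^ s[68]
    # Shift each of the three registers by one and insert the feedback bits.
    new_state = [t3] + s[0:92] + [t1] + s[93:176] + [t2] + s[177:287]
    return new_state, z


def prng_cycle(state: list[int], n_bits: int = OUT_BITS_PER_CYCLE):
    """Model one clock cycle of an n_bits-times unrolled Trivium PRNG."""
    out = []
    for _ in range(n_bits):
        state, z = trivium_step(state)
        out.append(z)
    return state, out


# Usage: seed with random bits (an assumption, see above) and run one cycle.
seed = [secrets.randbits(1) for _ in range(STATE_BITS)]
next_state, rand_bits = prng_cycle(seed)
```

In hardware, the loop in `prng_cycle` corresponds to combinational logic evaluated within a single clock cycle rather than 800 sequential steps, which is where the area and critical-path cost of heavy unrolling comes from.
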
Member

@nasahlpa nasahlpa left a comment

Thanks @vogelpi, great job - LGTM! Splitting up this PR into multiple commits and the offline discussion really helped in reviewing these code changes.

Contributor

@andreaskurth andreaskurth left a comment

LGTM, thanks for this quite extensive change and improvement @vogelpi!

(I focused my review on the RTL code and didn't notice red flags. For the changes to the other code, I think CI checks and your experiments + local tests should suffice.)

Contributor

@msfschaffner msfschaffner left a comment

Thanks for the well structured PR @vogelpi! This is great!

Contributor Author

vogelpi commented Feb 29, 2024

Thanks everybody for your reviews, let's merge this!

@vogelpi vogelpi merged commit 642d7a9 into lowRISC:master Feb 29, 2024
32 checks passed

@cdgori cdgori left a comment

Continuing catch-up, this all LGTM

@vogelpi vogelpi deleted the kmac-prng branch May 3, 2024 10:41
Successfully merging this pull request may close these issues.

[kmac] Improve PRNG implementation
5 participants