Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[otbn/rsa] Add SCA hardening for sel_sqr_or_sqrmul #25300

Closed
wants to merge 1 commit into from

Conversation

aewag
Copy link
Contributor

@aewag aewag commented Nov 21, 2024

As I cannot add reviewers I link you @jadephilipoom @johannheyszl @vogelpi

This commit hardens two sources of leakage

  • DMEM writeback: The selected temporary result, either sqr or sqrmul depending on the current exponent bit, is written to the DMEM. At this DMEM address the sqr result is stored. In the case of selecting the sqr result the HD of the selected result and the DMEM value is zero and in the other case a high value. This allows to distinguish exponent bits. The hardening overwrites the value at the DMEM address with a random value.
  • SEL instruction: BN.SEL selects one of the two source WDRs based on the carry flag. The selection depends on the current exponent bit. This allows to distinguish exponent bits. The hardening randomizes the WDRs which contain the sqr or sqrmul results. The randomization is performed for each limb.

EDIT:
Grep into OTBN ISA SIM log to show functionality

000000ec | csrrs          x23, urnd, x0                  | [x23 = 0xb078d8e1, otbn.INSN_CNT = 0x000554cb]
000000f0 | andi           x23, x23, 1                    | [x23 = 0x00000001, otbn.INSN_CNT = 0x000554cc]
000000f4 | xor            x22, x22, x23                  | [x22 = 0x00000003, otbn.INSN_CNT = 0x000554cd]
000000f8 | xor            x9, x9, x23                    | [x09 = 0x00000002, otbn.INSN_CNT = 0x000554ce]
000000fc | xor            x11, x11, x23                  | [x11 = 0x00000003, otbn.INSN_CNT = 0x000554cf]
--
000000ec | csrrs          x23, urnd, x0                  | [x23 = 0xcb607848, otbn.INSN_CNT = 0x00055761]
000000f0 | andi           x23, x23, 1                    | [x23 = 0x00000000, otbn.INSN_CNT = 0x00055762]
000000f4 | xor            x22, x22, x23                  | [x22 = 0x00000003, otbn.INSN_CNT = 0x00055763]
000000f8 | xor            x9, x9, x23                    | [x09 = 0x00000003, otbn.INSN_CNT = 0x00055764]
000000fc | xor            x11, x11, x23                  | [x11 = 0x00000002, otbn.INSN_CNT = 0x00055765]
--
000000ec | csrrs          x23, urnd, x0                  | [x23 = 0x69d25f37, otbn.INSN_CNT = 0x0005576b]
000000f0 | andi           x23, x23, 1                    | [x23 = 0x00000001, otbn.INSN_CNT = 0x0005576c]
000000f4 | xor            x22, x22, x23                  | [x22 = 0x00000002, otbn.INSN_CNT = 0x0005576d]
000000f8 | xor            x9, x9, x23                    | [x09 = 0x00000002, otbn.INSN_CNT = 0x0005576e]
000000fc | xor            x11, x11, x23                  | [x11 = 0x00000003, otbn.INSN_CNT = 0x0005576f]

This commit hardens two sources of leakage

- DMEM writeback: The selected temporary result, either sqr or sqrmul
depending on the current exponent bit, is written to the DMEM. At this
DMEM address the sqr result is stored. In the case of selecting
the sqr result the HD of the selected result and the DMEM value is zero
and in the other case a high value. This allows to distinguish
exponent bits. The hardening overwrites the value at the DMEM address
with a random value.
- SEL instruction: BN.SEL selects one of the two source WDRs based on
the carry flag. The selection depends on the current exponent bit. This
allows to distinguish exponent bits. The hardening randomizes the WDRs
which contain the sqr or sqrmul results. The randomization is performed
for each limb.

Signed-off-by: Johann Heyszl <[email protected]>
Signed-off-by: Alexander Wagner <[email protected]>
@johannheyszl
Copy link
Contributor

@aewag and I worked on this fix. It effectively prevents a leak that was uncovered in pentesting. Alex also tested that it maintains correct function.

More thoughts welcome @jadephilipoom

@johannheyszl johannheyszl self-requested a review November 22, 2024 07:15
Copy link
Contributor

@johannheyszl johannheyszl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

co-developed and approve

Copy link
Member

@nasahlpa nasahlpa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for hardening this!

Copy link
Contributor

@vogelpi vogelpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this and the clear commit messages and comments in the code! Nice work!

Before merging this we should also get @jadephilipoom 's feedback I think.

@jadephilipoom
Copy link
Contributor

jadephilipoom commented Nov 22, 2024

Thanks for the contribution. I have a few questions/concerns:

  • This routine is on the hot path for RSA modexp. Would you mind following the benchmark procedure in https://opentitan.org/book/hw/ip/otbn/doc/otbn_intro.html#benchmark-reproduction, for the RSA sign/decrypt tests in particular, to see if there is a significant performance impact? If so it might be worth trying to think about how to reduce the number of instructions a little bit.
  • My understanding is that bn.sel leaks if the destination register is equal to one of the operands, or possibly if the destination register previously holds a predictable value. But it seems like you're saying bn.sel always leaks the flag it selects on and it shouldn't be used on secret flags. Do you have any reason to believe this is the case? This would have wide-ranging impact on other OTBN code. (The first leakage source you listed, the DMEM one, is clear; I'm only surprised by the bn.sel one, which also seems to be the source of most of the new code.)
  • The RSA code on OTBN where this subroutine appears is currently not SCA hardened, pretty much at all. It seems a bit weird to me to make a small hardening improvement on code that's totally vulnerable otherwise and missing large-scale algorithmic hardening improvements. Maybe some context would help me understand -- is there a specific motivation for this change?

@jadephilipoom
Copy link
Contributor

Just to be more precise about what I mean asking about bn.sel: based on our current coding guidelines, I'd expect a version that uses bn.sel but has some dummy instructions to clear internal logic not to leak. Something like this:

sel_sqr_or_sqrmul:
  /* iterate over all limbs */
  loop      x30, 8
    /* load limb from dmem */
    bn.lid    x11, 0(x21)

    /* load limb from regfile buffer */
    bn.movr   x9, x8++

    /* randomize dmem with value from URND (one extra dummy call to clear) */
    bn.addi   w31, w31, 0 /* dummy */
    bn.wsrr   w0, URND
    bn.sid    x0, 0(x21)

    /* select a limb and store to dmem */
    bn.sel    w0, w2, w3, FG0.C
    bn.sid    x0, 0(x21++)

    /* dummy call to clear */
    bn.addi   w31, w31, 0

  ret

If this is just a matter of stylistic preference and it seemed nicer to use the indirect registers, that's fine -- just trying to understand the rationale from the PR description about the the select instruction.

@johannheyszl
Copy link
Contributor

johannheyszl commented Dec 6, 2024

@jadephilipoom is there a reason you switch x9 vs. x11 when loading the two select candidates? Otherwise let's try this:

sel_sqr_or_sqrmul:
  /* iterate over all limbs */
  loop      x30, 8
    /* load limb from dmem */
    bn.lid    x9, 0(x21)

    /* load limb from regfile buffer */
    bn.movr   x11, x8++

    /* randomize dmem with value from URND (one extra dummy call to clear) */
    bn.addi   w31, w31, 0 /* dummy */
    bn.wsrr   w0, URND
    bn.sid    x0, 0(x21)

    /* select a limb and store to dmem */
    bn.sel    w0, w2, w3, FG0.C
    bn.sid    x0, 0(x21++)

    /* dummy call to clear */
    bn.addi   w31, w31, 0

  ret

@jadephilipoom
Copy link
Contributor

@jadephilipoom is there a reason you switch x9 vs. x11 when loading the two select candidates? Otherwise let's try this:

Nope, looks like a bug on my part. Thanks for the catch!

@jadephilipoom
Copy link
Contributor

jadephilipoom commented Dec 11, 2024

Johann let me know over slack that this snippet still produced incorrect answers, so I debugged it properly and can verify that this version passes the full rsa_1024_dec_test, so I have high confidence this time (the bug was that the bn.addis zero FG0.C):

  /* iterate over all limbs */
  loop      x30, 8
    /* load limb from dmem */
    bn.lid    x9, 0(x21)

    /* load limb from regfile buffer */
    bn.movr   x11, x8++

    /* randomize dmem with value from URND (one extra dummy call to clear) */
    bn.addi   w31, w31, 0, FG1 /* dummy */
    bn.wsrr   w0, URND 
    bn.sid    x0, 0(x21)

    /* select a limb and store to dmem */
    bn.sel    w0, w2, w3, FG0.C
    bn.sid    x0, 0(x21++)

    /* dummy call to clear */
    bn.addi   w31, w31, 0, FG1

  ret

@johannheyszl
Copy link
Contributor

nice, yes this makes a lot of sense hehe. I stared at the code quite long but understand how I missed it :)

@johannheyszl
Copy link
Contributor

This has been addressed through #25694 and can be closed.

@johannheyszl
Copy link
Contributor

Thanks @jadephilipoom for the collaboration and offline chats along the way - I see #25694 is now merged, which basically applies our public coding guidelines to bring this code up to speed and it passes testing. Great example for OT project collab.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants