[aes,rtl] Switch to Bivium-based masking PRNG implementation
While the LFSR-based masking PRNG seems to produce randomness of
sufficient quality for the masking implementation in the AES cipher
core (as indicated both by our FPGA-based side-channel analysis and by
the PROLEAD tool), it may be vulnerable to brute-force attacks on the
LFSR states.

Therefore, this commit replaces the LFSR-based masking PRNG
implementation inside AES with an implementation based on an unrolled
Bivium stream cipher primitive as suggested in the paper:

Cassiers et al., "Randomness Generation for Secure Hardware Masking -
Unrolled Trivium to the Rescue", available at
https://eprint.iacr.org/2023/1134

Thanks to bigger individual state chunks - the smallest non-linear
feedback shift register (NFSR) has a width of 84 bits - brute-force
attacks are rendered infeasible. The overall PRNG state width increases
from 160 to 177 bits, so each reseed now consumes six 32-bit words of
fresh entropy instead of five. Thanks to the strong
diffusion properties of Bivium, fresh entropy received from EDN can
still be injected directly into the PRNG without additional buffering.
From an area perspective, the overall AES area increases slightly
(+1.81 kGE or +1.1% based on Yosys + nangate45).
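
The width figures above can be checked with a little arithmetic. The sketch below is illustrative only: the register widths 93, 84 and 111 are those of the published Trivium design (Bivium keeps the first two), the 32-bit word size is taken from the sentence about reseeding, and all Python names are made up for this example.

```python
import math

# Register widths of the published Trivium design; Bivium keeps the first two.
TRIVIUM_REGISTER_WIDTHS = (93, 84, 111)
BIVIUM_REGISTER_WIDTHS = TRIVIUM_REGISTER_WIDTHS[:2]

EDN_WORD_BITS = 32     # one entropy word, per the commit message
OLD_STATE_WIDTH = 160  # LFSR-based masking PRNG

bivium_state_width = sum(BIVIUM_REGISTER_WIDTHS)
trivium_state_width = sum(TRIVIUM_REGISTER_WIDTHS)
smallest_nfsr = min(BIVIUM_REGISTER_WIDTHS)


def words_per_reseed(state_width: int) -> int:
    """Number of entropy words needed to cover the whole PRNG state."""
    return math.ceil(state_width / EDN_WORD_BITS)


print("Bivium state width:", bivium_state_width)
print("Trivium state width:", trivium_state_width)
print("Smallest NFSR:", smallest_nfsr)
print("Words per reseed (old, new):",
      words_per_reseed(OLD_STATE_WIDTH),
      words_per_reseed(bivium_state_width))
```

The last word of a Bivium reseed is only partially used, since 177 is not a multiple of 32.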

Thanks again @cassiersg and @AeinRezaeiShahmirzadi for reporting the
issue in the first place and for the constructive conversations.

This is related to #19091.

Signed-off-by: Pirmin Vogel <[email protected]>
vogelpi committed Jan 23, 2024
1 parent c70c257 commit f6ccabc
Showing 16 changed files with 319 additions and 383 deletions.
2 changes: 1 addition & 1 deletion hw/ip/aes/README.md
@@ -33,7 +33,7 @@ The AES unit supports the following features:
- Support for AES-192 can be removed to save area, and is enabled/disabled using a compile-time Verilog parameter
- First-order masking of the cipher core using domain-oriented masking (DOM) to deter side-channel analysis (SCA), can optionally be disabled using compile-time Verilog parameters (for more details see [Security Hardening](./doc/theory_of_operation.md#side-channel-analysis))
- Latency per 16 byte data block of 12/14/16 clock cycles (unmasked implementation) and 56/66/72 clock cycles (DOM) in AES-128/192/256 mode
- Automatic as well as software-initiated reseeding of internal pseudo-random number generators (PRNGs) with configurable reseeding rate resulting in max entropy consumption rates ranging from 286 Mbit/s to 0.035 Mbit/s (at 100 MHz).
- Automatic as well as software-initiated reseeding of internal pseudo-random number generators (PRNGs) with configurable reseeding rate resulting in max entropy consumption rates ranging from 343 Mbit/s to 0.042 Mbit/s (at 100 MHz).
- Countermeasures for deterring fault injection (FI) on the control path (for more details see [Security Hardening](./doc/theory_of_operation.md#fault-injection))
- Register-based data and control interface
- System key-manager interface for optional key sideload to not expose key material to the processor and other hosts attached to the system bus interconnect.
1 change: 1 addition & 0 deletions hw/ip/aes/aes.core
@@ -11,6 +11,7 @@ filesets:
- lowrisc:prim:lc_sync
- lowrisc:prim:lfsr
- lowrisc:prim:sparse_fsm
- lowrisc:prim:trivium
- lowrisc:prim:util
- lowrisc:ip:tlul
- lowrisc:ip:lc_ctrl_pkg
8 changes: 4 additions & 4 deletions hw/ip/aes/data/aes.hjson
@@ -131,7 +131,7 @@
desc: '''
Default seed of the PRNG used for masking.
'''
randcount: "160",
randcount: "288",
randtype: "data"
},
{ name: "RndCnstMaskingLfsrPerm",
@@ -731,21 +731,21 @@
desc: '''
3'b001: Reseed the masking PRNG once per block.
Invalid input values, i.e., values with multiple bits set and value 3'b000 are mapped to PER_1 (3'b001).
This results in a max entropy consumption rate of ~286 Mbit/s.
This results in a max entropy consumption rate of ~343 Mbit/s.
'''
},
{ value: "2",
name: "PER_64",
desc: '''
3'b010: Reseed the masking PRNG approximately once per every 64 blocks.
This results in a max entropy consumption rate of ~4.5 Mbit/s.
This results in a max entropy consumption rate of ~5.4 Mbit/s.
'''
},
{ value: "4",
name: "PER_8K",
desc: '''
3'b100: Reseed the masking PRNG approximately once per every 8192 blocks.
This results in an max entropy consumption rate of ~0.035 Mbit/s.
This results in a max entropy consumption rate of ~0.042 Mbit/s.
'''
}
]
6 changes: 3 additions & 3 deletions hw/ip/aes/doc/registers.md
@@ -287,9 +287,9 @@ Invalid input values, i.e., values with multiple bits set and value 3'b000 are m

| Value | Name | Description |
|:--------|:-------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0x1 | PER_1 | 3'b001: Reseed the masking PRNG once per block. Invalid input values, i.e., values with multiple bits set and value 3'b000 are mapped to PER_1 (3'b001). This results in a max entropy consumption rate of ~286 Mbit/s. |
| 0x2 | PER_64 | 3'b010: Reseed the masking PRNG approximately once per every 64 blocks. This results in a max entropy consumption rate of ~4.5 Mbit/s. |
| 0x4 | PER_8K | 3'b100: Reseed the masking PRNG approximately once per every 8192 blocks. This results in an max entropy consumption rate of ~0.035 Mbit/s. |
| 0x1 | PER_1 | 3'b001: Reseed the masking PRNG once per block. Invalid input values, i.e., values with multiple bits set and value 3'b000 are mapped to PER_1 (3'b001). This results in a max entropy consumption rate of ~343 Mbit/s. |
| 0x2 | PER_64 | 3'b010: Reseed the masking PRNG approximately once per every 64 blocks. This results in a max entropy consumption rate of ~5.4 Mbit/s. |
| 0x4 | PER_8K | 3'b100: Reseed the masking PRNG approximately once per every 8192 blocks. This results in a max entropy consumption rate of ~0.042 Mbit/s. |

Other values are reserved.
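
The rates in the table can be reproduced with a short calculation. This is a reconstruction rather than a formula stated in the documentation: it assumes one reseed transfers the full PRNG state in 32-bit words, that the maximum rate occurs when AES-128 blocks (56 clock cycles each with DOM, per the README) are processed back to back at 100 MHz, and that "8K" means 8192 blocks. The function and constant names are hypothetical.

```python
import math

CLOCK_HZ = 100e6       # clock frequency quoted in the README
CYCLES_PER_BLOCK = 56  # AES-128 latency with DOM masking, per the README
EDN_WORD_BITS = 32     # assumed width of one entropy word


def reseed_bits(state_width: int) -> int:
    """Bits fetched per reseed: the state width rounded up to whole words."""
    return math.ceil(state_width / EDN_WORD_BITS) * EDN_WORD_BITS


def max_entropy_rate_mbps(state_width: int, blocks_per_reseed: int) -> float:
    """Upper bound on entropy consumption in Mbit/s."""
    cycles_between_reseeds = CYCLES_PER_BLOCK * blocks_per_reseed
    return reseed_bits(state_width) * CLOCK_HZ / cycles_between_reseeds / 1e6


for name, blocks in (("PER_1", 1), ("PER_64", 64), ("PER_8K", 8192)):
    old = max_entropy_rate_mbps(160, blocks)  # LFSR-based PRNG
    new = max_entropy_rate_mbps(177, blocks)  # Bivium-based PRNG
    print(f"{name}: {old:.3f} -> {new:.3f} Mbit/s")
```

Under these assumptions the results round to the documented values both before and after the change, which suggests the rate scales with the number of 32-bit words per reseed (five versus six).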

2 changes: 1 addition & 1 deletion hw/ip/aes/doc/theory_of_operation.md
@@ -161,7 +161,7 @@ If the AES unit is busy and running in CBC or CTR mode, the AES unit itself upda

The cipher core architecture of the AES unit is derived from the architecture proposed by Satoh et al.: ["A compact Rijndael Hardware Architecture with S-Box Optimization"](https://link.springer.com/chapter/10.1007%2F3-540-45682-1_15).
The expected circuit area in a 110nm CMOS technology is in the order of 12 - 22 kGE (unmasked implementation, AES-128 only).
The expected circuit area of the entire AES unit with masking enabled is around 110 kGE.
The expected circuit area of the entire AES unit with masking enabled is around 112 kGE.

For a description of the various sub modules, see the following sections.

16 changes: 3 additions & 13 deletions hw/ip/aes/dv/sva/aes_bind.sv
@@ -52,28 +52,18 @@ if (`EN_MASKING) begin : gen_prng_bind
.rst_ni,

.entropy_masking_req(entropy_masking_req),
.entropy_masking_ack(entropy_masking_ack),

.entropy_i(edn_data),
.lfsr_q_0 (u_aes_core.u_aes_cipher_core.gen_masks.u_aes_prng_masking.gen_lfsrs[0].u_lfsr_chunk.
lfsr_q),
.lfsr_q_1 (u_aes_core.u_aes_cipher_core.gen_masks.u_aes_prng_masking.gen_lfsrs[1].u_lfsr_chunk.
lfsr_q),
.lfsr_q_2 (u_aes_core.u_aes_cipher_core.gen_masks.u_aes_prng_masking.gen_lfsrs[2].u_lfsr_chunk.
lfsr_q),
.lfsr_q_3 (u_aes_core.u_aes_cipher_core.gen_masks.u_aes_prng_masking.gen_lfsrs[3].u_lfsr_chunk.
lfsr_q),
.lfsr_q_4 (u_aes_core.u_aes_cipher_core.gen_masks.u_aes_prng_masking.gen_lfsrs[4].u_lfsr_chunk.
lfsr_q),
.state_q (u_aes_core.u_aes_cipher_core.gen_masks.u_aes_prng_masking.u_prim_bivium.state_q),

.reseed_rate (u_aes_core.prng_reseed_rate_q),
.block_ctr_expr (u_aes_core.u_aes_control.gen_fsm[0].gen_fsm_p.u_aes_control_fsm_i.
u_aes_control_fsm.block_ctr_expr),
.ctrl_state (u_aes_core.u_aes_control.gen_fsm[0].gen_fsm_p.u_aes_control_fsm_i.
u_aes_control_fsm.aes_ctrl_cs),
.ctrl_state_next(u_aes_core.u_aes_control.gen_fsm[0].gen_fsm_p.u_aes_control_fsm_i.
u_aes_control_fsm.aes_ctrl_ns),
.alert_fatal (u_aes_core.alert_fatal_o),
.seed_en (u_aes_core.u_aes_cipher_core.gen_masks.u_aes_prng_masking.prng_seed_en)
.alert_fatal (u_aes_core.alert_fatal_o)
);
end
endmodule
51 changes: 28 additions & 23 deletions hw/ip/aes/dv/sva/aes_masking_reseed_if.sv
@@ -12,40 +12,45 @@ interface aes_masking_reseed_if
import aes_pkg::*;
import aes_reg_pkg::*;
#(
parameter int unsigned EntropyWidth = edn_pkg::ENDPOINT_BUS_WIDTH,
parameter int unsigned Width = WidthPRDMasking, // Must be divisble by ChunkSize and 8
parameter int unsigned ChunkSize = ChunkSizePRDMasking, // Width of the LFSR primitives
localparam int unsigned NumChunks = Width/ChunkSize // derived parameter
parameter int unsigned EntropyWidth = edn_pkg::ENDPOINT_BUS_WIDTH,
parameter int unsigned StateWidth = prim_trivium_pkg::BiviumStateWidth
) (
input logic clk_i,
input logic rst_ni,

// Entropy request signal
// Entropy request/ack signals
input logic entropy_masking_req,
input logic entropy_masking_ack,

// Entropy input and LFSR state signals
// Entropy input and PRNG state signals
input logic [EntropyWidth-1:0] entropy_i,
input logic [ChunkSize-1:0] lfsr_q_0,
input logic [ChunkSize-1:0] lfsr_q_1,
input logic [ChunkSize-1:0] lfsr_q_2,
input logic [ChunkSize-1:0] lfsr_q_3,
input logic [ChunkSize-1:0] lfsr_q_4,
input logic [StateWidth-1:0] state_q,

// Control signals
input prs_rate_e reseed_rate,
input logic block_ctr_expr,
input aes_ctrl_e ctrl_state,
input aes_ctrl_e ctrl_state_next,
input logic alert_fatal,
input logic [NumChunks-1:0] seed_en
input logic block_ctr_expr,
input aes_ctrl_e ctrl_state,
input aes_ctrl_e ctrl_state_next,
input logic alert_fatal
);

// Make sure the LFSRs of the masking PRNG are set to the correct values obtained from EDN.
`ASSERT(MaskingPrngState0MatchesEdnInput_A, seed_en[0] |-> ##1 entropy_i == lfsr_q_0)
`ASSERT(MaskingPrngState1MatchesEdnInput_A, seed_en[1] |-> ##1 entropy_i == lfsr_q_1)
`ASSERT(MaskingPrngState2MatchesEdnInput_A, seed_en[2] |-> ##1 entropy_i == lfsr_q_2)
`ASSERT(MaskingPrngState3MatchesEdnInput_A, seed_en[3] |-> ##1 entropy_i == lfsr_q_3)
`ASSERT(MaskingPrngState4MatchesEdnInput_A, seed_en[4] |-> ##1 entropy_i == lfsr_q_4)
localparam int unsigned LastStatePartFractional = StateWidth % EntropyWidth != 0 ? 1 : 0;
localparam int unsigned NumStateParts = StateWidth / EntropyWidth + LastStatePartFractional;
localparam int unsigned NumBitsLastPart = StateWidth - (NumStateParts - 1) * EntropyWidth;
localparam int unsigned LastStatePart = NumStateParts - 1;

logic [NumStateParts-1:0] state_part_matches_input;
always_comb begin
state_part_matches_input = '0;
for (int unsigned i = 0; i < LastStatePart; i++) begin
state_part_matches_input[i] = state_q[i * EntropyWidth +: EntropyWidth] == entropy_i;
end
state_part_matches_input[LastStatePart] =
state_q[StateWidth - 1 -: NumBitsLastPart] == entropy_i[NumBitsLastPart-1:0];
end

// Make sure the entropy input obtained from EDN actually ends up in one part of the PRNG state.
`ASSERT(MaskingPrngStatePartMatchesEdnInput_A, entropy_masking_req && entropy_masking_ack
|-> ##1 |state_part_matches_input)

// Make sure the masking PRNG is reseeded when a new block is started while the block counter
// has expired unless a fatal alert is triggered.
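
The state partitioning used by the new `MaskingPrngStatePartMatchesEdnInput_A` check can be modelled in software to make the index arithmetic easier to follow. The sketch below is a hypothetical Python model, not part of the verification environment; it assumes `EntropyWidth` is 32 and `StateWidth` is 177, and treats the state and the entropy word as plain integers.

```python
ENTROPY_WIDTH = 32  # assumed value of edn_pkg::ENDPOINT_BUS_WIDTH
STATE_WIDTH = 177   # Bivium state width from the commit message

# Same derivation as the localparams in the interface.
last_part_fractional = 1 if STATE_WIDTH % ENTROPY_WIDTH != 0 else 0
num_state_parts = STATE_WIDTH // ENTROPY_WIDTH + last_part_fractional
num_bits_last_part = STATE_WIDTH - (num_state_parts - 1) * ENTROPY_WIDTH
last_state_part = num_state_parts - 1


def state_part_matches_input(state_q: int, entropy_i: int) -> list:
    """Return one flag per state part: does it equal the entropy word?"""
    full_mask = (1 << ENTROPY_WIDTH) - 1
    last_mask = (1 << num_bits_last_part) - 1
    matches = []
    # Full-width parts: state_q[i * EntropyWidth +: EntropyWidth]
    for i in range(last_state_part):
        part = (state_q >> (i * ENTROPY_WIDTH)) & full_mask
        matches.append(part == entropy_i)
    # Last part: state_q[StateWidth - 1 -: NumBitsLastPart] is compared
    # against the low NumBitsLastPart bits of the entropy word only.
    last = (state_q >> (STATE_WIDTH - num_bits_last_part)) & last_mask
    matches.append(last == (entropy_i & last_mask))
    return matches


# Example: the entropy word landed in the third 32-bit part of the state.
flags = state_part_matches_input(0xDEADBEEF << 64, 0xDEADBEEF)
print(num_state_parts, num_bits_last_part, flags)
```

As in the assertion, the check passes if any flag is set, so it confirms that the word reached some part of the state without pinning down which one.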
32 changes: 26 additions & 6 deletions hw/ip/aes/pre_sca/prolead/aes_cipher_core_config.set
@@ -90,7 +90,7 @@ no_of_initial_inputs

% number of clock cycles to initiate the run (start of encryption)
no_of_initial_clock_cycles
11
12

%1 - First clock cycle with inactive reset.
key_clear_i 1'b0
@@ -272,7 +272,27 @@ no_of_initial_clock_cycles
prng_reseed_i 1'b1
[2:0] key_len_i 3'b001

%10 - Start encryption in parallel with a reseed of the internal masking PRNG.
%10 - Wait for initial reseed of the masking PRNG to finish.
key_clear_i 1'b0
data_out_clear_i 1'b0
alert_fatal_i 1'b0
force_masks_i 1'b0
entropy_ack_i 1'b1
[2:0] out_ready_i 3'b011
cfg_valid_i 1'b1
[1:0] op_i 2'b01
[127:0] state_init_i group_in0[127:0]
[255:128] state_init_i group_in1[127:0]
[127:0] prd_clearing_i group_in0[255:128]
[511:0] key_init_i 512'h00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
rst_ni 1'b1
[2:0] in_valid_i 3'b100
[2:0] crypt_i 3'b100
[2:0] dec_key_gen_i 3'b100
prng_reseed_i 1'b1
[2:0] key_len_i 3'b001

%11 - Start encryption.
key_clear_i 1'b0
data_out_clear_i 1'b0
alert_fatal_i 1'b0
@@ -292,7 +312,7 @@ no_of_initial_clock_cycles
prng_reseed_i 1'b1
[2:0] key_len_i 3'b001

%11 - De-assert in_valid_i. The DUT is already busy performing the encryption.
%12 - De-assert in_valid_i. The DUT is already busy performing the encryption.
% De-asserting in_valid_i helps to avoid restarting the encryption after finishing in case the
% simulation isn't stopped.
key_clear_i 1'b0
@@ -319,15 +339,15 @@ no_of_initial_clock_cycles
% Note: end_wait_cycles > 0 doesn't seem to work with signal values, otherwise we could use
% something like [2:0] out_valid_o 3'b011
end_condition
ClockCycles 66
ClockCycles 67

% number of clock cycles to wait after the end_condition
end_wait_cycles
0

% maximum number of clock cycles per run before checking the end_condition
max_clock_cycle
66
67

no_of_outputs
0
@@ -336,7 +356,7 @@ no_of_outputs
no_of_test_clock_cycles
1

9-66 % The encryption starts at %10 and takes 56 clock cycles.
10-67 % The encryption starts at %11 and takes 56 clock cycles.

% max number of entries in the report file with maximum leakage
% 0 : do not generate the report file
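
The updated cycle numbers in this configuration follow from shifting the start of encryption by one cycle. The sketch below only restates that arithmetic: the 56-cycle duration comes from the comment in the file, while starting the test window one cycle before the encryption is simply what both the old and the new settings show. The helper name is hypothetical.

```python
ENCRYPTION_CYCLES = 56  # per the comment in the configuration file


def prolead_window(start_cycle: int) -> tuple:
    """Return (first test cycle, end-condition cycle) for a given start."""
    end_cycle = start_cycle + ENCRYPTION_CYCLES
    # Both versions of the file begin testing one cycle before the start.
    return start_cycle - 1, end_cycle


print("old:", prolead_window(10))  # encryption started at %10
print("new:", prolead_window(11))  # encryption now starts at %11
```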
1 change: 1 addition & 0 deletions hw/ip/aes/pre_syn/syn_yosys.sh
@@ -64,6 +64,7 @@ OT_DEP_SOURCES=(
"$LR_SYNTH_SRC_DIR"/../prim/rtl/prim_lc_sync.sv
"$LR_SYNTH_SRC_DIR"/../prim/rtl/prim_sync_reqack_data.sv
"$LR_SYNTH_SRC_DIR"/../prim/rtl/prim_sync_reqack.sv
"$LR_SYNTH_SRC_DIR"/../prim/rtl/prim_trivium.sv
"$LR_SYNTH_SRC_DIR"/../prim/rtl/prim_packer_fifo.sv
"$LR_SYNTH_SRC_DIR"/../prim/rtl/prim_lfsr.sv
"$LR_SYNTH_SRC_DIR"/../prim/rtl/prim_flop_2sync.sv
1 change: 0 additions & 1 deletion hw/ip/aes/rtl/aes_cipher_core.sv
@@ -306,7 +306,6 @@ module aes_cipher_core import aes_pkg::*;
// - the PRD required by the key expand module (has 4 S-Boxes internally).
aes_prng_masking #(
.Width ( WidthPRDMasking ),
.ChunkSize ( ChunkSizePRDMasking ),
.EntropyWidth ( EntropyWidth ),
.SecAllowForcingMasks ( SecAllowForcingMasks ),
.SecSkipPRNGReseeding ( SecSkipPRNGReseeding ),
18 changes: 11 additions & 7 deletions hw/ip/aes/rtl/aes_pkg.sv
@@ -32,8 +32,6 @@ parameter int unsigned WidthPRDData = 16*WidthPRDSBox; // 16 S-Boxes for the
parameter int unsigned WidthPRDKey = 4*WidthPRDSBox; // 4 S-Boxes for the key expand
parameter int unsigned WidthPRDMasking = WidthPRDData + WidthPRDKey;

parameter int unsigned ChunkSizePRDMasking = WidthPRDMasking/5;

// Clearing PRNG default LFSR seed and permutation
// These LFSR parameters have been generated with
// $ util/design/gen-lfsr-seed.py --width 64 --seed 31468618 --prefix "Clearing"
@@ -51,22 +49,28 @@ parameter clearing_lfsr_perm_t RndCnstClearingSharePermDefault = {
256'h9736b95ac3f3b5205caf8dc536aad73605d393c8dd94476e830e97891d4828d0
};

// Masking PRNG default LFSR seed and permutation
// We use a single seed that is split down into chunks internally.
// Masking PRNG default state seed and output permutation
// The output width is 160 bits (WidthPRDMasking = WidthPRDSBox * (16 + 4)).
// These LFSR parameters have been generated with
// $ util/design/gen-lfsr-seed.py --width 160 --seed 31468618 --prefix "Masking"
parameter int MaskingLfsrWidth = 160; // = WidthPRDMasking = WidthPRDSBox * (16 + 4)
typedef logic [MaskingLfsrWidth-1:0] masking_lfsr_seed_t;
typedef logic [MaskingLfsrWidth-1:0][$clog2(MaskingLfsrWidth)-1:0] masking_lfsr_perm_t;
parameter masking_lfsr_seed_t RndCnstMaskingLfsrSeedDefault =
160'hc132b5723c5a4cf4743b3c7c32d580f74f1713a;
parameter masking_lfsr_perm_t RndCnstMaskingLfsrPermDefault = {
256'h17261943423e4c5c03872194050c7e5f8497081d96666d406f4b606473303469,
256'h8e7c721c8832471f59919e0b128f067b25622768462e554d8970815d490d7f44,
256'h048c867d907a239b20220f6c79071a852d76485452189f14091b1e744e396737,
256'h4f785b772b352f6550613c58130a8b104a3f28019c9a380233956b00563a512c,
256'h808d419d63982a16995e0e3b57826a36718a9329452492533d83115a75316e15
};
// The state width is 177 bits (Bivium) but the primitive expects a 288-bit seed (Trivium).
// These LFSR parameters have been generated with
// $ util/design/gen-lfsr-seed.py --width 288 --seed 31468618 --prefix "Masking"
parameter int MaskingPrngStateWidth = 288;
typedef logic [MaskingPrngStateWidth-1:0] masking_lfsr_seed_t;
parameter masking_lfsr_seed_t RndCnstMaskingLfsrSeedDefault = {
32'h758a4420,
256'h31e1c461_6ea343ec_153282a3_0c132b57_23c5a4cf_4743b3c7_c32d580f_74f1713a
};

typedef enum integer {
SBoxImplLut, // Unmasked LUT-based S-Box
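
The new 288-bit default seed is written as a concatenation of a 32-bit and a 256-bit literal. The following sketch, with hypothetical variable names, rebuilds the value in Python to confirm that it fits in 288 bits and to compare it with the previous 160-bit seed. It does not model which of the seed bits the Bivium primitive actually loads, since that is not visible in this diff.

```python
# Literals copied from aes_pkg.sv (previous and new default seeds).
OLD_SEED_160 = int("c132b5723c5a4cf4743b3c7c32d580f74f1713a", 16)

NEW_SEED_HI_32 = 0x758A4420
NEW_SEED_LO_256 = int(
    "31e1c461_6ea343ec_153282a3_0c132b57_"
    "23c5a4cf_4743b3c7_c32d580f_74f1713a".replace("_", ""),
    16,
)

# SystemVerilog concatenation {hi, lo}: hi occupies the upper bits.
new_seed_288 = (NEW_SEED_HI_32 << 256) | NEW_SEED_LO_256

fits_in_288_bits = new_seed_288.bit_length() <= 288
low_160_bits = new_seed_288 & ((1 << 160) - 1)

print("fits in 288 bits:", fits_in_288_bits)
print("low 160 bits equal old seed:", low_160_bits == OLD_SEED_160)
```

The low 160 bits of the new seed equal the old seed, which is consistent with both having been generated from the same `--seed 31468618` value at different widths.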