Debug aes-fvsr-key-batch timing issues #83
Comments
I have investigated the issue and I guess I found the root cause. We have two consecutive serial write commands in the `capture_aes_fvsr_key_batch()` function in the `capture.py` script:
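(The original snippet isn't reproduced here; below is only a rough sketch of the pattern being described. The command characters, payload names, and exact calls are illustrative placeholders, not necessarily what `capture.py` uses.)

```python
# Illustrative sketch only: two back-to-back simpleserial writes, where the
# second packet can arrive while the device is still busy with the first
# operation. Command characters and payloads are placeholders.
target.simpleserial_write('s', prng_seed)      # e.g. start seeding the PRNG
target.simpleserial_write('b', num_segments)   # sent immediately afterwards;
                                               # may be missed by the firmware
```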
In the device firmware `aes_serial.c`, we poll for incoming packets and call the relevant handler function.
After starting the first operation, in this example seeding the PRNG, the second data packet may be sent before the first operation is completed, and because of that it may be missed by the firmware. (Actually, I would expect some kind of buffer to hold incoming serial packets, but there doesn't seem to be any.) We can avoid this issue simply by placing delays between the `simpleserial_write()` calls. We already do that unintentionally when we wait for the result of an operation after starting it with an encrypt request. However, the communication between the computer and the device is not entirely deterministic, so a fixed delay may not be a reliable solution. Maybe it would be better to wait for a success message from the device after seeding the PRNG, setting the key, etc., but not after starting an encryption, because we wait for the result anyway. Waiting for a success message after initiating an encryption request as well might decrease performance. What do you think?
simpleserial has an "ack" command that helps with this, and it can be used to delay the protocol while a long-running process happens. Checking the code, I think the status is only sent on failure by default in the low-level code. In the original CW you can see the status return here. If you try inserting a status print after this line, it may work there:
Then to force this to wait:
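A minimal sketch of what that could look like on the capture side, assuming the ChipWhisperer `SimpleSerial` target API (`simpleserial_write`, `simpleserial_wait_ack`, `simpleserial_read`) and assuming the firmware actually sends the SimpleSerial 'z' status packet after each command; command characters and variable names are placeholders:

```python
# Sketch: block on the simpleserial ack after each setup command, so the next
# write is only sent once the device has finished the previous operation.
# Assumes the firmware sends the 'z' status packet; command characters are
# placeholders for the real ones used by the AES target.
target.simpleserial_write('s', prng_seed)
if target.simpleserial_wait_ack(timeout=500) is None:   # waits for 'z' status
    raise RuntimeError("no ack after seeding the PRNG")

target.simpleserial_write('k', key)
if target.simpleserial_wait_ack(timeout=500) is None:
    raise RuntimeError("no ack after setting the key")

# No explicit wait after the encrypt request: reading the ciphertext result
# already blocks until the device is done.
target.simpleserial_write('p', plaintext)
ciphertext = target.simpleserial_read('r', 16)
```

This keeps the extra round trips on the comparatively rare setup commands and leaves the per-encryption path unchanged.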
I would do similar things on any other write (you can see this is what we do, for example, in this key write function). I haven't been building for OT for a bit, so sorry I can't test quickly here, but I will try to validate. However, I wanted to post this now to avoid someone chasing this down, since I think the issue is relatively simple.
Thanks @colinoflynn! I will try this and create a PR.
Although I had no problems with the …, the behavior was a little bit different and more un(?)predictable. It always failed on my machine, but worked on @wettermo's machine. Further, we needed the … . Does it make sense to add an …?
In my opinion, adding …
Can we estimate the impact on the capture rate?
I am not sure if we can estimate it. Maybe we can try and see :) I agree with you: we may not have a problem now without an ack after every command, but that does not mean we won't, I guess.
I'm working on this issue. EDIT: |
Interestingly, I haven't yet observed any such issues while batch capturing on husky and waverunner using the scripts here: #194 |
Since we will replace the communication layer for post-silicon to unify with the other post-silicon SiVal testing, we may not want to spend much effort on fixing this? :) Generally, I am fully in favor of more acks and more robust communication, even if it sacrifices some speed. We need correct trace-data / text pairs.
PR #82 added the corresponding target binaries to support the `aes-fvsr-key-batch` command. For most of us this seems to work just fine, but I saw lots of errors like:
In words, there are sometimes mismatches between the expected and received ciphertexts. Sometimes the failure occurs for the first batch, sometimes for a later batch, and sometimes it doesn't occur at all. The failures seem to depend on timing (adding some sleep commands on the Python side seems to help), on the Husky firmware (the latest firmware seems to be more affected), and maybe also on the USB connection setup (docking station, hub, or laptop directly).
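For context, the mismatch in question is essentially a comparison of the ciphertext reported by the device against one computed in software. A minimal sketch of such a check, assuming pycryptodome; the actual capture script may implement this differently:

```python
# Sketch of an expected-vs-received ciphertext check (assumes pycryptodome).
from Crypto.Cipher import AES

def check_ciphertext(key: bytes, plaintext: bytes, received: bytes) -> None:
    expected = AES.new(key, AES.MODE_ECB).encrypt(plaintext)
    if expected != received:
        raise RuntimeError(
            f"Ciphertext mismatch: expected {expected.hex()}, got {received.hex()}"
        )
```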
We should root-cause and fix the problem; otherwise we can't reliably do long-running captures. Imagine we collect 10 million traces and get such a failure after 2 hours: all traces would be lost.