Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cryptotest] Add test coverage of HMAC streaming API #23493

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

RyanTorok
Copy link
Contributor

@RyanTorok RyanTorok commented Jun 4, 2024

Adds test coverage in the cryptotest framework for the HMAC streaming API added in #23335. The testing is modeled after how the streaming coverage was done for SHA-2 in #21281.

The second commit uncomments the hash algorithm tests in the BUILD file, which were accidentally commented out in a previous commit.

See also: #23471

@RyanTorok RyanTorok requested review from a team as code owners June 4, 2024 12:20
@RyanTorok RyanTorok requested review from jwnrt, jadephilipoom and ballifatih and removed request for a team June 4, 2024 12:20
Copy link
Contributor

@jwnrt jwnrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust LGTM, thanks

sw/host/tests/crypto/hmac_kat/src/main.rs Outdated Show resolved Hide resolved
sw/device/tests/crypto/cryptotest/firmware/hmac.c Outdated Show resolved Hide resolved
@RyanTorok RyanTorok force-pushed the cryptotest-hmac-streaming branch from 5051d52 to 78eb1af Compare June 4, 2024 12:45
Copy link
Contributor

@ballifatih ballifatih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @RyanTorok.

nit: Digest/tag sizes are exposed through cryptolib-API's hash.h now, which is included in mac.h. I recently refactored some other tests to reuse these exposed enums. E.g. kOtcryptoHmacTagBytesSha256 -> kSha256DigestWords.

Because of the amount of vectors in SHA-2, I cannot get past 2h timeout on FPGA. Am I missing another timeout parameters besides the one defined below?

@RyanTorok
Copy link
Contributor Author

The test should respect the timeout that's there. If you're timing out, you can try changing long to eternal and see if it works. I don't recall it taking longer than 2 hours to run on my CW310, however.

@RyanTorok RyanTorok force-pushed the cryptotest-hmac-streaming branch from 78eb1af to c25ad2b Compare June 4, 2024 13:00
@RyanTorok
Copy link
Contributor Author

LGTM, thanks @RyanTorok.

nit: Digest/tag sizes are exposed through cryptolib-API's hash.h now, which is included in mac.h. I recently refactored some other tests to reuse these exposed enums. E.g. kOtcryptoHmacTagBytesSha256 -> kSha256DigestWords.

Because of the amount of vectors in SHA-2, I cannot get past 2h timeout on FPGA. Am I missing another timeout parameters besides the one defined below?

Refactored to use the constants in hash.h.

@ballifatih
Copy link
Contributor

ballifatih commented Jun 4, 2024

I am setting timeout="eternal", but for some reason it times out at 2h, which I found surprising. As for hmac_kat, I was able to run it without timeout. The reason is takes so long could be because previous OTBN-based implementation was faster, and the current HMAC driver needs a bit of improvement.

@jadephilipoom
Copy link
Contributor

I find the timeout very surprising; HMAC should be fast. I'm worried this means we're missing a performance issue in the C code. How many test vectors are there exactly? Is the test progressing through them at a steady rate or taking much longer on some than others?

@ballifatih
Copy link
Contributor

I find the timeout very surprising; HMAC should be fast. I'm worried this means we're missing a performance issue in the C code. How many test vectors are there exactly? Is the test progressing through them at a steady rate or taking much longer on some than others?

The test timeouts (at 2h mark) at the following test:

[2024-06-04T17:13:20.803Z INFO  harness] Test counter: 632
[2024-06-04T17:13:20.803Z INFO  harness] vendor: nist, algorithm: sha-512, test case: 117
[2024-06-04T17:13:20.803Z INFO  opentitanlib::test_utils::rpc] Sending: "Hash"
[2024-06-04T17:13:20.810Z INFO  opentitanlib::test_utils::rpc] Sending: "Sha512"
[2024-06-04T17:13:20.819Z INFO  opentitanlib::test_utils::rpc] Sending: {"length":64}

There are two immediate things to improve in the driver that I am aware of:

  • Message bytes are written byte at a time, and needs to be optimized to word-writes as much as possible
  • One-shot calls are actually broken down into 2 streaming calls, so there is a cost of storing/restoring context even for one-shot calls.

However I am not sure these alone should justify the speed. I will investigate to see why it's taking so long.

@RyanTorok
Copy link
Contributor Author

RyanTorok commented Jun 5, 2024

I find the timeout very surprising; HMAC should be fast. I'm worried this means we're missing a performance issue in the C code. How many test vectors are there exactly? Is the test progressing through them at a steady rate or taking much longer on some than others?

There's a lot of hash test vectors, and quite a few of them are intentionally very long messages that take on the order of seconds to hash on a CW310. I recall the entire job taking over an hour for me, the last time I ran the whole thing.

Regarding your concern about the performance of HMAC, recall that the hash test job also includes SHA-3 and SHAKE, which are currently software backed, I think. One practical option would be to split the job up by high-level algorithm, i.e. one job for SHA-2, one for SHA-3, and one for SHAKE.

@RyanTorok RyanTorok force-pushed the cryptotest-hmac-streaming branch from c25ad2b to 3e21972 Compare June 5, 2024 10:06
@moidx
Copy link
Contributor

moidx commented Jun 5, 2024

Regarding your concern about the performance of HMAC, recall that the hash test job also includes SHA-3 and SHAKE, which are currently software backed, I think. One practical option would be to split the job up by high-level algorithm, i.e. one job for SHA-2, one for SHA-3, and one for SHAKE.

We should make this change so that we can include the SHA2 and HMAC vectors as part of the hmac sign-off. I would prefer if there is a separate target per configuration: e.g. hash_sha2_256_kat, rather than bundling everything up. You can still have a test suite aggregating all tests as hmac_kat.

Let's get this PR in, and then we can work on the required refactoring.

@jadephilipoom
Copy link
Contributor

Regarding your concern about the performance of HMAC, recall that the hash test job also includes SHA-3 and SHAKE, which are currently software backed, I think. One practical option would be to split the job up by high-level algorithm, i.e. one job for SHA-2, one for SHA-3, and one for SHAKE.

But according to the printout, it's timing out after 632 tests. I'd expect a hardware HMAC to get through a lot more than that in 2h on an FPGA. I'm still a little worried that we're missing a performance issue here. But it shouldn't block the PR; either the issue is in the test code (in which case it's not a huge deal) or it's not being introduced here, just detected.

@ballifatih
Copy link
Contributor

But according to the printout, it's timing out after 632 tests. I'd expect a hardware HMAC to get through a lot more than that in 2h on an FPGA. I'm still a little worried that we're missing a performance issue here. But it shouldn't block the PR; either the issue is in the test code (in which case it's not a huge deal) or it's not being introduced here, just detected.

Yes, it timeouts before SHAKE/SHA3 vectors. I think the number of tests do not mean much in this case. The message length seems to be throttling the speed.

For comparison, hmac_kat passes in ~19 mins with 1422 tests. In any case, it's worth looking into. It should not block this PR.

@ballifatih
Copy link
Contributor

I ran a simple experiment that runs 100 SHA-512 operations with 4KByte message blocks, by modifying sw/device/tests/crypto/hmac_functest.c:

  • Define an array of 512-bit x_i variables for i in [0, ..., 100]. Set x_0 = 0.
  • For i in [0, ..., 99] compute:
    • x_{i+1} = SHA512(x_i || x_i || … || x_i) where the input message is 4KByte. Each x_i is 64 bytes, so the message is concatenation of 64 copies of x_i.
  • Check that x_100 matches known digest value.

I get the following time measurements on CW340.

  • When SHA-512 calls are streamed with 64 update calls:
    • The complete test ends in 8.3s.
    • Each SHA-512 computation takes “314447 cycles or 54768 us @ 24 MHz” as reported by LOG_INFO.
  • When SHA-512 calls are one-shot:
    • The complete test ends in 3.1s.
    • Each SHA-512 computation takes “29327 cycles or 1221 us @ 24 MHz.” as reported by LOG_INFO.

I suspect that the cryptotest and communicating with FPGA is the bottleneck for tests that have large message inputs.

@jadephilipoom
Copy link
Contributor

Thanks for the experiments, @ballifatih ! I agree that given your results (reasonably fast tests even with large, streamed input) it seems likely that communication with the FPGA could be the culprit in terms of performance. It'd still be good to resolve, but that's much better than the bottleneck being in the actual library 🙂

23335 added a streaming API for HMAC utilizing the HMAC HWIP. Here, we
add support in the cryptotest framework to cover the streaming API,
modeled after how this is done for the existing streaming APIs for
SHA-2.

Signed-off-by: Ryan Torok <[email protected]>
@RyanTorok RyanTorok force-pushed the cryptotest-hmac-streaming branch from 3e21972 to 87e3c1a Compare June 6, 2024 16:19
@RyanTorok
Copy link
Contributor Author

Rebased to master to resolve merge conflict with #23518

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants