-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[extension fast track] extra vector crypto instructions, Zvbc32e/Zvkgs #362
Conversation
Hi Nicolas. |
I was wondering what was the best way to eventually specify the fast track, I think @jjscheel mentioned that some fast tracks were specified as patch / diff over existing specifications. |
My suggestion (which could very well be a bad idea) would be to create a new directory under /riscv-crypto/doc for this extension. That would match what we have done for scalar, vector, and vector-all rounds. The Zve32 extension should be able to stand on its own. Thus, for example, it could be implemented to just add 32-bit vclmul* vector instructions without the need for any of the Vector Crypto Extensions. Or it could implemented to augment Zvbc. It might only find use in Zve32 implementations, or it might be used to add 32-bit clmul to, for example, V implementations. The Zvkgs extensions, on the other hand, only make sense as an optional augmentation of the Zvkg extension. One option would be to define this as standalone extension that could optionally be combined with Zvkg as yet another extension. Or, perhaps Zvkgs is the superset. In such a case, I would think that it would reference the Zvkg specification and then define the new variants. I don't think it would be wise to have this extension redefine the existing instructions, even if the exact same words are used - it is best to define each instruction only once. I hope that helps. |
Thank you @kdockser that helps a lot. I will perform the changes before the next TG meeting. |
Signed-off-by: Nicolas Brunie <[email protected]>
Signed-off-by: Nicolas Brunie <[email protected]>
Draft generated August 31st 2023 (version 0.0.1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Definitely the next logical step, especially as explicitly outlined as a future direction in the ratified v1.0 spec. Just in case you missed it, I think it would be good to change 128-bit to 64-bit in vclmul.adoc and vclmulh.adoc.
Draft generated Jan 17th 2024 (thank you to @lianakoleva for spotting typos) |
Draft generated Feb 1st 2024 (more cleanups and self review): riscv-crypto-spec-vector-extra_v0.0.3.pdf |
This document describes the proposed _vector_ _extra_ cryptography | ||
extensions for RISC-V. | ||
Those extensions extend the _vector_ cryptography extensions for RISC-V, | ||
providing extra features not mandatory for a high performace implementation but which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "not mandatory for a high performance implementation" mean? RISC-V extensions do not specify whether they are mandatory or not. That is the job of separate documents.
Note: performace is misspelled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was formulated to distinguish the extra vector crypto extension from the vector crypto extension itself.
This set of instructions is considered to be an addition to the current vector crypto spec
The vector crypto spec is the main extension implementers and users may want to adopt to get good performance for the supported cryptographic primitives.
These new extensions are additional improvements to the base vector crypto extensions.
I will rephrase this and fix the typo.
extensions for RISC-V. | ||
Those extensions extend the _vector_ cryptography extensions for RISC-V, | ||
providing extra features not mandatory for a high performace implementation but which | ||
can help further improve the efficiency of the algorithms that use them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the "them" in "algorithms that use them"? The new extensions or the existing ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was suppose to be "extra features" so "new extensions".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, without the extensions there are no algorithms that use them, so that phrasing is a bit strange. Maybe list some of the specific algorithms that are intended to be improved?
[[zvkgs,Zvkgs]] | ||
=== `Zvkgs` - Vector-Scalar GCM/GMAC | ||
|
||
`Zvkgs` depends on `Zvkg`, it extends the existing `vghsh.vv` and `vgmul.vv` instructions with new vector-scalar variants: `vghsh.vs` and `vgmul.vs`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
", it" => ". It "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`Zvkgs` depends on `Zvkg`, it extends the existing `vghsh.vv` and `vgmul.vv` instructions with new vector-scalar variants: `vghsh.vs` and `vgmul.vs`. | |
`Zvkgs` depends on `Zvkg`. It extends the existing `vghsh.vv` and `vgmul.vv` instructions with new vector-scalar variants: `vghsh.vs` and `vgmul.vs`. |
[colophon] | ||
= Colophon | ||
|
||
This document describes the Vector Cryptography Extra extensions to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better name for this? What name will be used for the next set of "extra" extensions? What is the difference between an extra extension and a non-extra extension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zfa
is using "additional" which may be better here (https://github.com/riscv/riscv-isa-manual/blob/main/src/zfa.adoc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That document is titled ""Zfa" Standard Extension for Additional Floating-Point Instructions, Version 1.0". That's less ambiguous because it includes the actual extension code Zfa, which won't be reused by any future extension. This document just refers to "Vector Cryptography Extra extensions" as if there will never be any more. Maybe include the actual extension codes in the title?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I will update the title to manage room for future "extra"/additional extensions.
It is important to note that the Vector Crypto instructions are independent of the | ||
implementation of the `Zkt` extension and do not require that `Zkt` is implemented. | ||
|
||
//This specification includes a <<Zvkt>> extension that, when implemented, requires certain vector instructions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't Zvkt require that these new instructions be constant-time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zvkt
will require that these new extensions be constant time, I need to clarify this.
More precisely, Similarly to Zvkg
which mandates
To help avoid side-channel timing attacks, these instructions shall be implemented with data-independent timing.
Zvkgs
should inherit the same mandate and require constant-time even when Zvkt
is not implemented.
Constant time for Zvbc32e
will depend on Zvkt
.
@@ -0,0 +1,23 @@ | |||
[[zvbc32e,Zvbc32e]] | |||
=== `Zvbc32e` - Vector Carryless Multiplication |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "32-bit Vector Carryless Multiplication"?
Also, are there specific use cases in mind where this feature would be useful? #309 mentioned CRC-32, but I'm not sure why that would be true. CRC-32 implementations that use "folding" with carryless multiplication multiply segments of the data by a 32-bit multiplicand, but those segments of data can be any length. Other CPUs (and the currently ratified Zvbc extension) provide 64-bit carryless multiplication, so CRC-32 implementations typically multiply 64 bits of data by a 32-bit multiplier to get a 95-bit result. You could multiply 32 bits of data by a 32-bit multiplier to get a 63-bit result, but that would result in twice as many multiplications being needed to compute the CRC-32, as each one would process half as much data. That probably wouldn't be faster.
A similar observation applies to e.g. CRC-16 where the multiplications are typically 64-bit by 16-bit.
BTW, if adding 32-bit, why not also add 16-bit and 8-bit? What is special about 32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are very good points.
There are multiple considerations at play here.
- 32-bit CLM allows vector implementation with smaller ELEN=32 to implement it (which was not possible for
Zvbc
) - 64-bit or 32-bit CLM for folding CRC does not make a performance difference as far as I can tell but should make a power difference. It is true that in folding CRC the data segment block can be of arbitrary length but the constant multiplicand is of the size of the CRC so in the case of a CRC-32 at least half the 64-bit CLM would be unused (which might not be an issue on all micro-arch, but 32-bit CRC should allow easier determination of expected multiplier activity).
And finally there is nothing special about 32-bit. It was initially introduced because of the ELEN=32 case, but we listed in the questions above
Should Zvbc32e support SEW=16 ? (SEW=8 ?)
I think there is a good case for supporting other SEW:
- no need encoding required
- should allow other use cases
If we want to go that route the question is:
- Should we split different SEW support into different sub-extensions (
Zvbc16e
,Zvbc8e
) so people can pick and chose ? I think this might not be such a good idea because I don't expect supporting smaller element width to have a big cost (although I did not measure it yet) and the added complexity of numerous extensions does not seem worth it.
(One of the option which was discarded for the fast track was suggesting a widening CLM to merge vclmul
and vclmulh
together, but it seems too big of a change to fit into a fast track and the potential impact of having to switch SEW back and forth was not yet properly evaluated on the target use cases).
|
||
`Zvkgs` depends on `Zvkg`, it extends the existing `vghsh.vv` and `vgmul.vv` instructions with new vector-scalar variants: `vghsh.vs` and `vgmul.vs`. | ||
|
||
Instructions to enable the efficient implementation of parallel versions of GHASH~H~ which is used in Galois/Counter Mode (GCM) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a complete sentence.
More importantly, it's not clear to me how these vs
instructions for GHASH are helpful, because parallelized GHASH requires multiplying by powers of the key (like H, H^2, H^3, H^4, ...). Does that not preclude the use of these vs
instructions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple different implementations of parallel GHASH.
One implementation on 4 block in paralell uses a vector of 4 constants H^4 in the loop body which could leverage the .vs
. It still needs a reduction with H, H^2, H^3 in the loop epilog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rephrased things a bit (in the upcoming v0.0.4). The introduction sentence is re-used from https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-vector-zvkg.adoc.
I need to work on an example of use of these instructions.
Draft generated Feb 6th 2024: Changelog:
|
Draft generated March 7th 2024 (v0.0.5) Changelog:
|
Note that this PR will need to be applied against the integrated chapter in the riscv-isa-manual repository if it is still relevant. This repository will be made read only and archived. |
Ok, I will transition to a PR on the main repo. |
Moving to the main |
/!\ This pull request has been moved to the main
riscv-isa-manual
repository: riscv/riscv-isa-manual#1306This pull requests draft the changes associated with two fast track extensions for vector crypto.
During the specification process for vector crypto 1.0.0 a few items had to be discarded because they appeared too late in the process. This fast track extension tries to address some of them.
The official demand that will be discussed in the Task Group and submitted to the Unpriv Committee is being drafter here: https://docs.google.com/document/d/1zpYhnZi2NxhjfcBGvPOy0oDhx6lTXchscG17Qcl6wv8/edit?usp=sharing
New features:
Zvbc32e
: Extendingvclmul[h].v[vh]
instruction to supportSEW=32-bit
valueELEN >= 32
) or in addition toZvbc
(ELEN >= 64
)Zvkgs
: Adding.vs
variants tovghsh
andvghmul
Zvkgs
Open questions:
Zvbc32e
be allowed whenELEN >= 32
without depending onZvbc
? (Answer: YES)Zvbc32e
support SEW=16 ? (SEW=8 ?)How to name the two new extensionsZvkt(bc/bc32e)
to extendZvkt
to the extension ofvclmul[h/]
defined inZvbc32e
?Related changes:
spike-isa-sim
modificationsZvbc32e
: [vector-crypto] add support for zvbc32e nibrunieAtSi5/riscv-isa-sim#3Zvbce32
https://github.com/riscv/riscv-crypto/blob/main/doc/vector/code-samples/zvbc-test.cZvkgs
https://github.com/riscv/riscv-crypto/blob/main/doc/vector/code-samples/zvkg-test.cZvkgs
Zvbc32e
Draft versions:
|
Original Plan for the fast track schedule
References