[AdditionalVectorCrypto] adding section on addition vector crypto extensions #1306

nibrunieAtSi5 · 2024-03-28T14:37:50Z

This pull request is the riscv-isa-manual version of a pull request started on riscv-crypto: riscv/riscv-crypto#362.

/!\ This pull request is a draft for the future fast track extensions.

This pull requests draft the changes associated with two fast track extensions for vector crypto.

During the specification process for vector crypto 1.0.0 a few items had to be discarded because they appeared too late in the process. This fast track extension tries to address some of them.

The official demand that will be discussed in the Task Group and submitted to the Unpriv Committee is being drafter here: https://docs.google.com/document/d/1zpYhnZi2NxhjfcBGvPOy0oDhx6lTXchscG17Qcl6wv8/edit?usp=sharing

New features:

Zvbc32e: Extending vclmul[h].v[vh] instruction to support SEW=32-bit value
- should be available standalone (ELEN >= 32) or in addition to Zvbc (ELEN >= 64)
- no new encoding
Zvkgs: Adding .vs variants to vghsh and vghmul
- should depend on Zvkg
- new encodings

Open questions:

Should Zvbc32e be allowed when ELEN >= 32 without depending on Zvbc ? (Answer: YES)
Should Zvbc32e support SEW=16 ? (SEW=8 ?)
Find encodings
~~How to name the two new extensions~~
Do we need to define a Zvkt(bc/bc32e) to extend Zvkt to the extension of vclmul[h/] defined in Zvbc32e ?

Related changes:

spike-isa-sim modifications
- add support for Zvbc32e and Zvkgs: Vector crypto additional riscv-software-src/riscv-isa-sim#1748
encoding proposal in riscv-opcodes repo: Vector crypto additional extensions: Zvbc32e and Zvkgs riscv-opcodes#270
adapt vector crypto code samples for Zvbce32 https://github.com/riscv/riscv-crypto/blob/main/doc/vector/code-samples/zvbc-test.c
adapt vector crypto code samples for Zvkgs https://github.com/riscv/riscv-crypto/blob/main/doc/vector/code-samples/zvkg-test.c
Sail models
- Zvkgs
- Zvbc32e

Draft versions:

Version	pdf
v0.0.1 (August 31st 2023)	https://github.com/riscv/riscv-crypto/files/12487628/riscv-crypto-spec-vector-extra.pdf
v0.0.2 (January 17th 2024)	https://github.com/riscv/riscv-crypto/files/13970691/riscv-crypto-spec-vector-extra.pdf
v0.0.3 (February 1st 2024)	https://github.com/riscv/riscv-crypto/files/14146438/riscv-crypto-spec-vector-extra_v0.0.3.pdf
v0.0.4 (February 6th 2024)	riscv-crypto-spec-vector-extra_v0.0.4.pdf
v0.0.5 (March 7th 2024)	riscv-crypto-spec-vector-extra_v0.0.5.pdf
v0.0.6 (June 19th 2024)	riscv-crypto-spec-vector-extra_v0.0.6.pdf
v0.0.7 (July 31st 2024)	riscv-crypto-spec-vector-extra_v0.0.7.pdf
v0.0.9 (September 23rd 2024)	riscv-crypto-spec-vector-extra_v0.0.9.pdf
v0.0.10 (October 19th 2024)	riscv-crypto-spec-vector-extra_v0.0.10.pdf
v0.0.11 (December 5th 2024)	riscv-crypto-spec-vector-extra_v0.0.11.pdf

Changelogs

from v0.0.5 to v0.0.6:
- adding vs2 / vd overlap as reserved encoding for new vghsh.vs / vgmul.vs instructions (review feedback from @QJtaibai )
from v0.0.8 to v0.0.9:
- change name OP-P to OP-VE
- adding justification for Zvbc32e and Zvkgs
from v0.0.9 to v0.0.10:
- Changing Zvbc32e specification to incorporate 8 and 16-bit carry-less multiplication
- Fixing typos (including unifying to "carry-less" (1))
from v0.0.10 to v0.0.11:
- implementing ARC feedback:
  - explicit mention of "reserved" behavior for vl / vstart not multiple of EGS
  - split into two chapters (one per extension)
- multiple typo fixes + switch to in "development" state

Original Plan for the fast track schedule

References

project announcement on the unpriv mailing list (https://lists.riscv.org/g/tech-unprivileged/message/568) and the crypto TG mailing list (https://lists.riscv.org/g/tech-crypto-ext/message/944)

QJtaibai · 2024-06-05T12:13:28Z

Will it be reserved or not for instructions (vghsh and vgmul) , if the vd register group overlaps with the vs2 register group.

nibrunieAtSi5 · 2024-06-20T04:23:42Z

Will it be reserved or not for instructions (vghsh and vgmul) , if the vd register group overlaps with the vs2 register group.

Good question @QJtaibai , this is not listed but should be to aligned with the other .vs instructions (and since overlapping those does not make sense functionally and can make implementation harder).

Yunzezhu94 · 2024-06-25T03:09:23Z

Hello! I have a question that in the instruction part of document vghsh.vs shares same encoding with vghsh.vv, and has func6:101100, while in Appendix A it looks vghsh.vs have func6:100011. I wonder which one is correct?

src/vector-crypto-additional.adoc

nibrunie · 2024-07-05T17:37:36Z

Hello! I have a question that in the instruction part of document vghsh.vs shares same encoding with vghsh.vv, and has func6:101100, while in Appendix A it looks vghsh.vs have func6:100011. I wonder which one is correct?

Appendix A and the riscv-v opcode Pull request (https://github.com/nibrunieAtSi5/riscv-opcodes/pull/1/files) are right, the instruciton part of the document was incorrect. I have fixed it.

Note that those opcodes are simply suggestion at this point (but that does not mean the suggestions should not be consistent).

Thank you for pointing that out.

nibrunieAtSi5 · 2024-08-01T04:45:16Z

I have developed a small example to demonstrate how Zvbc32e can be leveraged to implement CRC (with the folding method): https://github.com/nibrunie/rvv-examples/blob/e55d529fe54316a95b733aea82cadbfbfbf08e67/src/crc/vector_crc_be_zvbc32e.c

The current implementation provides "performance" of about 0.65 instruction per bytes (on 1MB input buffer).

nibrunie · 2024-08-01T04:53:47Z

riscv-crypto-spec-vector-extra_v0.0.7.pdf

These two extensions add addtional instructions for carryless multiplication with 32-bits elements and Vector-Scalar GCM instructions. Please see riscv/riscv-isa-manual#1306.

wangpc-pp · 2024-08-14T07:22:20Z

src/vector-crypto-additional.adoc

+[wavedrom, , svg]
+....
+{reg:[
+{bits: 7, name: 'OP-P'},


I think we have renamed it to OP_VE in #1311.

wangpc-pp · 2024-08-15T06:20:09Z

Will it be reserved or not for instructions (vghsh and vgmul) , if the vd register group overlaps with the vs2 register group.

Good question @QJtaibai , this is not listed but should be to aligned with the other .vs instructions (and since overlapping those does not make sense functionally and can make implementation harder).

We don't have this constraint on vghsh.vv and vgmul.vv? Is it OK to make it stricter?

topperc · 2024-08-15T06:46:34Z

Will it be reserved or not for instructions (vghsh and vgmul) , if the vd register group overlaps with the vs2 register group.

Good question @QJtaibai , this is not listed but should be to aligned with the other .vs instructions (and since overlapping those does not make sense functionally and can make implementation harder).

We don't have this constraint on vghsh.vv and vgmul.vv? Is it OK to make it stricter?

For LMUL>1, the operations likely take place over multiple cycles or uops. The restriction on .vs ensure that the scalar element in the first register is not overwritten before all LMUL parts have been processed. This is the same reason we have rules on widening and narrowing instructions. This restriction is not needed for .vv since the input bits and output bits are from the same offset in the register group. Overwriting an earlier part of the register doesn't affect operations on later parts of the register.

wangpc-pp · 2024-08-15T06:57:54Z

Will it be reserved or not for instructions (vghsh and vgmul) , if the vd register group overlaps with the vs2 register group.

Good question @QJtaibai , this is not listed but should be to aligned with the other .vs instructions (and since overlapping those does not make sense functionally and can make implementation harder).

We don't have this constraint on vghsh.vv and vgmul.vv? Is it OK to make it stricter?

For LMUL>1, the operations likely take place over multiple cycles or uops. The restriction on .vs ensure that the scalar element in the first register is not overwritten before all LMUL parts have been processed. This is the same reason we have rules on widening and narrowing instructions. This restriction is not needed for .vv since the input bits and output bits are from the same offset in the register group. Overwriting an earlier part of the register doesn't affect operations on later parts of the register.

Yeah, thanks! I get it! But it seems that vd can overlap vs1 for reductions, aren't they the same?

topperc · 2024-08-15T07:02:13Z

Will it be reserved or not for instructions (vghsh and vgmul) , if the vd register group overlaps with the vs2 register group.

Good question @QJtaibai , this is not listed but should be to aligned with the other .vs instructions (and since overlapping those does not make sense functionally and can make implementation harder).

We don't have this constraint on vghsh.vv and vgmul.vv? Is it OK to make it stricter?

For LMUL>1, the operations likely take place over multiple cycles or uops. The restriction on .vs ensure that the scalar element in the first register is not overwritten before all LMUL parts have been processed. This is the same reason we have rules on widening and narrowing instructions. This restriction is not needed for .vv since the input bits and output bits are from the same offset in the register group. Overwriting an earlier part of the register doesn't affect operations on later parts of the register.

Yeah, thanks! I get it! But it seems that vd can overlap vs1 for reductions, aren't they the same?

Maybe the assumption for reductions is that vd would only be updated after all of vs1 is read since you need to see all elements before you know the result.

aswaterman · 2024-08-15T07:18:14Z

@topperc Right, for reductions, the amount of additional state that’s needed is nil for many implementations, but in the worst case, it’s bounded by ELEN. Here, I guess it’s bounded by EGW. That difference could be material for narrow vector machines. @nibrunieAtSi5 can you illuminate the situation?

DavidYu360 · 2024-08-15T14:16:34Z

What about the VPR alignment rules for vghsh.vs / vgmul.vs?
As I mentioned in riscv-software-src/riscv-isa-sim#1548

Possible alignment value: 1, 2, 4, 8 (1 for any VPR, 2 for V0, V2, ...)
Prerequisite: VLEN * LMUL >= EGW

Single key in .vs inst: max(EGW/VLEN, 1)

Others: max(LMUL, 1)

These two extensions add addtional instructions for carryless multiplication with 32-bits elements and Vector-Scalar GCM instructions. Please see riscv/riscv-isa-manual#1306.

nibrunieAtSi5 · 2024-08-19T10:18:02Z

The non-overlap constraint for vector element group is part of the vector crypto specification

riscv-isa-manual/src/vector-crypto.adoc

Line 311 in 04179ad

Source/Destination overlap constraints::

In the case of the .vs instructions defined in this specification, vs2 holds a 128-bit scalar element group.
For implementations with VLEN ≥ 128, vs2 refers to a single register. Thus, the vd register group must not
overlap the vs2 register.
However, in implementations where VLEN < 128, vs2 refers to a register group comprised of the number
of registers needed to hold the 128-bit scalar element group. In this case, the vd register group must not
overlap this vs2 register group.

[%autowidth]
[%header,cols="4,4,4"]
|===
| Instruction
| Register
| Cannot Overlap

| vaes*.vs | vs2 | vd
| vsm4r.vs | vs2 | vd
| vsha2c[hl] | vs1, vs2 | vd
| vsha2ms | vs1, vs2 | vd
| vsm3me | vs2 | vd
| vsm3c | vs2 | vd

It was defined as no overlapping between vector register groups.

This was not necessarily repeated for each operation

riscv-isa-manual/src/vector-crypto.adoc

Line 1530 in 04179ad

    
           * Only for the `.vs` form: the `vd` register group overlaps the `vs2` scalar element group

Only for the .vs form: the vd register group overlaps the vs2 scalar element group

I think this constraint makes sense here as well since allowing overlapping would constraint uarch which cannot save/stage the scalar element group before finishing the operation (as stated by @topperc it is highly likely that for LMUL > EGW / VLEN implementations will iterate to compute the operation over the full vector register groups in a few cycles). Note that .vs makes sense for vl / EGS > 1 since this is where we can actually exploit the fact of having a scalar element group.
The size of vs2 vector register group should be understood as with EMUL = min(EGW / VLEN, 1).
The example in the base vector crypto spec seems to hint this way:

However, in implementations where VLEN < 128, vs2 refers to a register group comprised of the number
of registers needed to hold the 128-bit scalar element group.

but should maybe be generalized to EGW.

topperc · 2024-08-29T05:26:58Z

Zvkgs: Adding .vs variants to vghsh and vghmul
should depend on Zvkgs

Is the second line supposed to say Zvkg rather than Zvkgs?

nibrunie · 2024-09-02T16:24:13Z

Zvkgs: Adding .vs variants to vghsh and vghmul
should depend on Zvkgs

Is the second line supposed to say Zvkg rather than Zvkgs?

Yes, you are right @topperc , this is a typo dependency to Zvkg was intended (the assumption is that the vector-scalar(element group) variant do not make sense without the vector-vector variant.

httoracle · 2024-09-09T09:08:10Z

In the v0.0.7 spec, I have a question of the operation code of the vghsh.vs and vgmul.vs. Take vghsh.vs for example, It says H is common to all element groups, and you put the "let H = brev8(...)" code outside the loop. I suspect the H of all the group should be the same, and is from the first 128bit of vs2. But inside the each loop, H will be changed.
So if you write just like the operation code in the spec, the H of the next loop will not be the first 128bit of the vs2. I want to ask it is just the purpose of the instruction or is just a miss?
If you want the H to be the same value in the beginning of the each loop, you should put "let H = brev8(...)" inside the loop, so in the beginning of the loop, we can get the H value from vs2 again.

nibrunie · 2024-09-10T03:47:28Z

In the v0.0.7 spec, I have a question of the operation code of the vghsh.vs and vgmul.vs. Take vghsh.vs for example, It says H is common to all element groups, and you put the "let H = brev8(...)" code outside the loop. I suspect the H of all the group should be the same, and is from the first 128bit of vs2. But inside the each loop, H will be changed. So if you write just like the operation code in the spec, the H of the next loop will not be the first 128bit of the vs2. I want to ask it is just the purpose of the instruction or is just a miss? If you want the H to be the same value in the beginning of the each loop, you should put "let H = brev8(...)" inside the loop, so in the beginning of the loop, we can get the H value from vs2 again.

@httoracle You are right.
H needs to be reset inside the outer loop. this is a mistake in the current behavior code.
It should be fixed by 9982d13.

nibrunie · 2024-09-10T04:28:13Z

Additional Vector Crypto extension v0.0.8 (September 9th 2024):
riscv-crypto-spec-vector-extra_v0.0.8.pdf

Changelog from v0.0.7:

Fixing behavior of vghsh.vs and vgmul.vs (thanks to @httoracle )
Upgrading alias for 0x77 opcode from OP-P to OP-VE (thanks to @wangpc-pp )

…ensions

Signed-off-by: Nicolas Brunie <[email protected]>

wangpc-pp · 2024-09-23T04:31:14Z

Maybe a silly question: do we need zvbc8e/zvbc16e? I feel it can be useful, but I have no data to show how common CRC8/CRC16 are.

nibrunieAtSi5 · 2024-09-23T04:42:20Z

@wangpc-pp

Maybe a silly question: do we need zvbc8e/zvbc16e? I feel it can be useful, but I have no data to show how common CRC8/CRC16 are.

That is definitely no a silly question and it has been appeared a few times.
There are a few use cases for 8-bit and 16-bit vector carryless multiply (as you mentioned at least 8 and 16-bit CRCs) but when I asked in the crypto task group for motivations, no one was able to say this was a key use cases which would justify specific extensions (rather than relying on widening and existing instructions from Zvbc[32e], it seems 8 and 16-bit CRCs are less used than the 32-bit versions).

From a hardware point of view, I have run a few experiments and supporting 8 and 16-bit vclmul[h] on top of the 32-bit and 64-bit variants would have a minimal cost
No new opcode would need to be allocated since this would simply be extending the range of valid vsew for vclmul[h] to cover 0 and 1
Another question was: if we go down that route do we need to specific Zvbc[8,16,32]e independently or to avoid some fragmentation they would better be specified in a single extension.

nibrunie · 2024-09-23T06:27:21Z

v 0.0.9:
Changelog: Integration feeback, including ARC's:

change name OP-P to OP-Ve
adding justification for Zvbc32e and Zvkgs

riscv-crypto-spec-vector-extra_v0.0.9.pdf

Note:: part of the justification for Zvkgs relies on the fact that a scalar element group in a .vs vector instruction has EMUL=EGW/VLEN. This fact implies that when VLEN >= 128, one single vector register is required to hold a scalar element group for Zvkgs and it does not need to have an index aligned on LMUL (similar to what should apply for Zvkned or Zvksed). Although this can appear intuitive, I don't think this is ever mentioned explicitly in the original vector crypto specification.

wangpc-pp · 2024-09-24T04:26:10Z

@wangpc-pp

Maybe a silly question: do we need zvbc8e/zvbc16e? I feel it can be useful, but I have no data to show how common CRC8/CRC16 are.

That is definitely no a silly question and it has been appeared a few times. There are a few use cases for 8-bit and 16-bit vector carryless multiply (as you mentioned at least 8 and 16-bit CRCs) but when I asked in the crypto task group for motivations, no one was able to say this was a key use cases which would justify specific extensions (rather than relying on widening and existing instructions from Zvbc[32e], it seems 8 and 16-bit CRCs are less used than the 32-bit versions).
* From a hardware point of view, I have run a few experiments and supporting 8 and 16-bit `vclmul[h]` on top of the 32-bit and 64-bit variants would have a **minimal cost**

* No new opcode would need to be allocated since this would simply be extending the range of valid `vsew` for `vclmul[h]` to cover `0` and `1`

* Another question was: if we go down that route do we need to specific Zvbc[8,16,32]e independently or to avoid some fragmentation they would better be specified in a single extension.

I don't know the answer, but here are some investigations:

ARM has 8 bits PMUL.
I tried to find any useful usages via Debian Code Search:
Here are two usages that I think is useful:
- RAID6 syndrome calculation in Linux kernel: https://sources.debian.org/src/linux/6.10.11-1/lib/raid6/neon.uc/?hl=54#L54
- Erasure Code based on GF(8): https://sources.debian.org/src/ceph/18.2.4+ds-7/src/erasure-code/jerasure/gf-complete/src/neon/gf_w4_neon.c/?hl=64#L64

If the cost of supporting 8/16 bits won't be large, personally, I would like to have them added.

nibrunieAtSi5 · 2024-09-24T07:32:54Z

I don't know the answer, but here are some investigations:

ARM has 8 bits PMUL.

I tried to find any useful usages via Debian Code Search:

https://codesearch.debian.net/search?q=svpmul&perpkg=1

https://codesearch.debian.net/search?q=vmul_p8&perpkg=1

https://codesearch.debian.net/search?q=vmulq_p8&perpkg=1

Here are two usages that I think is useful:

RAID6 syndrome calculation in Linux kernel: https://sources.debian.org/src/linux/6.10.11-1/lib/raid6/neon.uc/?hl=54#L54

Erasure Code based on GF(8): https://sources.debian.org/src/ceph/18.2.4+ds-7/src/erasure-code/jerasure/gf-complete/src/neon/gf_w4_neon.c/?hl=64#L64

If the cost of supporting 8/16 bits won't be large, personally, I would like to have them added.

That is very interesting, it would be also interesting to check if anyone has ever ported jerasure/gf-complete over RISC-V (Vector) and what would be the expected benefit overall.

topperc · 2024-09-24T21:21:09Z

Is there any concern that the name Zvkgs is too similar to the existing Zvksg? Though this isn't the first time we had such similar names, we have Zbc and Zcb.

nibrunieAtSi5 · 2024-09-25T01:57:34Z

Is there any concern that the name Zvkgs is too similar to the existing Zvksg? Though this isn't the first time we had such similar names, we have Zbc and Zcb.

No concerns were voiced so far, but if you think this could be an issue we can try to come up with another name. (Zvkga for additional Zvkg instructions ?)

…iew.

wangpc-pp · 2024-10-06T00:07:42Z

Zvbc32e may not be suitable now if we extend SEW to 8/16/32 bits. Maybe Zvbcxw(Extended Width), Zvbcxe(Extended EEW)?

nibrunieAtSi5 · 2024-10-06T00:17:28Z

Zvbc32e may not be suitable now if we extend SEW to 8/16/32 bits. Maybe Zvbcxw(Extended Width), Zvbcxe(Extended EEW)?

I discussed this with @aswaterman and he suggested that the suffix 32* has already been used to mean any element width 32-bit and lower, e.g. in Zve32x, so I think Zvbc32e still works (although the e in Zve32x stands for embedded).

aswaterman · 2024-10-06T06:37:33Z

Agreed

wangpc-pp · 2024-10-08T03:36:25Z

I don't agree this wording, I strongly recommond that we should change to another name.
My thinking is that 32 in Zve32x/f is Minimum VLEN, not Supported EEW. For Zvbc32e, the postfix e indicates that it is about EEW, so people may think only 32 bits is supported.

nibrunieAtSi5 · 2024-10-11T21:56:22Z

I don't agree this wording, I strongly recommond that we should change to another name. My thinking is that 32 in Zve32x/f is Minimum VLEN, not Supported EEW. For Zvbc32e, the postfix e indicates that it is about EEW, so people may think only 32 bits is supported.

I understand your concern @wangpc-pp
Even if we interpret Zve has meaning is the minimum VLEN then it also works here for Zvbc(32e/e32/e32x)
We could switch to e32 rather than 32e and have Zvbce32 (IIRC we need to avoid ending an extension name with a number to facilitate parsing ISA string and this is why I used 32e) to mean that the minimum VLEN supported by the extension is 32 and it has the same supported list of EEW as Zve32(x/f): that is 8, 16, and 32.

If I summarize, the options are:

Zvbc32e
Zvbcxw (Extended Width),
Zvbcxe (extended EEW)
Zvbce32
Zvbce32x

I would be in favour of keeping 32 somewhere in the name because at least it makes explicit one of the constraint (VLEN/ELEN) but I agree that Zvbc was not explicit on listing the only EEW it supports and having only 32 in the name does not make it explicit that it supports 8 and 16 bits SEW values as well.

topperc · 2024-10-11T22:22:49Z

I don't agree this wording, I strongly recommond that we should change to another name. My thinking is that 32 in Zve32x/f is Minimum VLEN, not Supported EEW. For Zvbc32e, the postfix e indicates that it is about EEW, so people may think only 32 bits is supported.

I don't think the 32 in Zve32x/f is about VLEN. The focus on the Zve* extensions is about reducing ALU cost by limiting which element sizes are supported or what operations are supported on large elements. The removal of 64-bit integers elements is the important difference between Zve32x and Zve64x not the minimum VLEN.

nibrunieAtSi5 · 2024-10-11T22:53:14Z

I don't agree this wording, I strongly recommond that we should change to another name. My thinking is that 32 in Zve32x/f is Minimum VLEN, not Supported EEW. For Zvbc32e, the postfix e indicates that it is about EEW, so people may think only 32 bits is supported.

I don't think the 32 in Zve32x/f is about VLEN. The focus on the Zve* extensions is about reducing ALU cost by limiting which element sizes are supported or what operations are supported on large elements. The removal of 64-bit integers elements is the important difference between Zve32x and Zve64x not the minimum VLEN.

That was my initial interpretation: that the VLEN constraint relaxation was actually a consequence on the limitation of ELEN in general or on the largest SEW supported for floating-point elements.

wangpc-pp · 2024-10-12T03:20:40Z

Thanks for these explanations! I am still not convinced but I will let it be since it seems I'm in the minority. 😄

kasanovic · 2024-10-15T05:00:53Z

The 32 in Zve32 represents ELEN. The minimum VLEN is set to be the same as ELEN for these extensions (Table 62 in current vector chapter).

nibrunie · 2024-10-19T19:12:45Z

New version: v0.0.10:

Changelog:

Changing Zvbc32e definition to include 8-bit and 16-bit SEW values
Fixing typos (including unifying to "carry-less" (1))

riscv-crypto-spec-vector-extra_v0.0.10.pdf

(1): it seems the original vector crypto specification uses both "carryless" and "carry-less". Looking at wikipedia, both seem valid (https://en.wikipedia.org/wiki/Carry-less_product, https://en.wikipedia.org/wiki/Finite_field_arithmetic#Carryless_multiply). I chose to unify to "carry-less" but I am open to suggestion.

nibrunie · 2024-12-05T20:02:30Z

New version of the spec draft (v0.0.11 Dec5th 2024):

riscv-crypto-spec-vector-extra_v0.0.11.pdf

Changelog:

implementing ARC feedback:
- explicit mention of "reserved" behavior for vl / vstart not multiple of EGS
- split into two chapters (one per extension)
multiple typo fixes + switch to in "development" state

nibrunieAtSi5 mentioned this pull request Mar 28, 2024

[extension fast track] extra vector crypto instructions, Zvbc32e/Zvkgs riscv/riscv-crypto#362

Closed

13 tasks

nibrunie reviewed Jul 5, 2024

View reviewed changes

src/vector-crypto-additional.adoc Outdated Show resolved Hide resolved

henry-hsieh mentioned this pull request Jul 23, 2024

Add ratified or drafted extensions henry-hsieh/riscv-asm-vim#1

Open

72 tasks

This was referenced Jul 25, 2024

Vector crypto additional riscv-software-src/riscv-isa-sim#1748

Draft

Vector crypto additional extensions: Zvbc32e and Zvkgs riscv/riscv-opcodes#270

Draft

wangpc-pp mentioned this pull request Aug 14, 2024

[RISCV][MC] Support experimental extensions Zvbc32e and Zvkgs llvm/llvm-project#103709

Merged

wangpc-pp reviewed Aug 14, 2024

View reviewed changes

nibrunieAtSi5 added 2 commits September 20, 2024 14:04

[AdditionalVectorCrypto] adding section on addition vector crypto ext…

ee09c33

…ensions

fixing some inter/intra document links + typos

b9dda07

nibrunie added 2 commits September 20, 2024 14:04

Fixing vghsh.vs opcode

320d64f

Signed-off-by: Nicolas Brunie <[email protected]>

Fixing H value reset for vghsh/vgmul

c4f3549

Signed-off-by: Nicolas Brunie <[email protected]>

nibrunie force-pushed the additional-vector-crypto branch from 9982d13 to 9b12009 Compare September 20, 2024 21:11

Integrating review feedback from the ARC

a12c20d

nibrunie force-pushed the additional-vector-crypto branch from 9b12009 to a12c20d Compare September 20, 2024 21:21

nibrunie added 2 commits October 5, 2024 12:27

Adding a sentence introducing the 2 new extensions in Extension Overv…

80f3e93

…iew.

Extending Zvbc32e to cover 8b and 16b elements

9c50bbd

Fixing typo / unifying carryless to carry-less

d9c54b1

nibrunie and others added 4 commits December 4, 2024 23:31

ARC feedback: reserved behavior + splitting into 2 chapters

e7ce102

Typos + document state clarification

b36d20f

page -> section

0967220

Typo

664a23d

[AdditionalVectorCrypto] adding section on addition vector crypto extensions #1306

Are you sure you want to change the base?

[AdditionalVectorCrypto] adding section on addition vector crypto extensions #1306

Conversation

nibrunieAtSi5 commented Mar 28, 2024 • edited Loading

New features:

Open questions:

Related changes:

Draft versions:

Changelogs

Original Plan for the fast track schedule

References

QJtaibai commented Jun 5, 2024

nibrunieAtSi5 commented Jun 20, 2024

Yunzezhu94 commented Jun 25, 2024

nibrunie commented Jul 5, 2024

nibrunieAtSi5 commented Aug 1, 2024

nibrunie commented Aug 1, 2024

wangpc-pp Aug 14, 2024

Choose a reason for hiding this comment

wangpc-pp commented Aug 15, 2024

topperc commented Aug 15, 2024 • edited Loading

wangpc-pp commented Aug 15, 2024

topperc commented Aug 15, 2024

aswaterman commented Aug 15, 2024

DavidYu360 commented Aug 15, 2024

nibrunieAtSi5 commented Aug 19, 2024

topperc commented Aug 29, 2024

nibrunie commented Sep 2, 2024

httoracle commented Sep 9, 2024

nibrunie commented Sep 10, 2024 • edited Loading

nibrunie commented Sep 10, 2024

wangpc-pp commented Sep 23, 2024 • edited Loading

nibrunieAtSi5 commented Sep 23, 2024

nibrunie commented Sep 23, 2024

wangpc-pp commented Sep 24, 2024

nibrunieAtSi5 commented Sep 24, 2024

topperc commented Sep 24, 2024

nibrunieAtSi5 commented Sep 25, 2024

wangpc-pp commented Oct 6, 2024

nibrunieAtSi5 commented Oct 6, 2024 • edited Loading

aswaterman commented Oct 6, 2024

wangpc-pp commented Oct 8, 2024

nibrunieAtSi5 commented Oct 11, 2024

topperc commented Oct 11, 2024

nibrunieAtSi5 commented Oct 11, 2024

wangpc-pp commented Oct 12, 2024

kasanovic commented Oct 15, 2024

nibrunie commented Oct 19, 2024

nibrunie commented Dec 5, 2024

nibrunieAtSi5 commented Mar 28, 2024 •

edited

Loading

topperc commented Aug 15, 2024 •

edited

Loading

nibrunie commented Sep 10, 2024 •

edited

Loading

wangpc-pp commented Sep 23, 2024 •

edited

Loading

nibrunieAtSi5 commented Oct 6, 2024 •

edited

Loading