The RISC-V vector C intrinsics provide an interface at the C language level to directly leverage the RISC-V "V" extension 0 (also abbreviated as "RVV"), with assistance from the compiler in handling instruction scheduling and register allocation. The intrinsics also aim to free users from the responsibility of maintaining the correct configuration settings 18 for vector instruction execution.
This document uses the term "RVV" as an abbreviation for the RISC-V "V" extension. This document uses the term "the specification" to indicate the RISC-V "V" extension specification.
The __riscv_v_intrinsic macro is the C macro to test the compiler's support for the RISC-V "V" extension intrinsics. The value of the test macro is its version, computed using the following formula, which is identical to the one defined in the RISC-V C API specification 1.
<MAJOR_VERSION> * 1,000,000 + <MINOR_VERSION> * 1,000 + <REVISION_VERSION>
For example, the v1.0 version should define the macro with value 1000000.
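As a sketch of the formula above, the following helper (the function name is illustrative, not part of the specification) computes the test-macro value from a version triple:

```c
/* Illustrative helper (not part of the specification): compute the
   __riscv_v_intrinsic test-macro value from a version triple using
   the formula above. */
static int rvv_intrinsic_version(int major, int minor, int revision) {
    return major * 1000000 + minor * 1000 + revision;
}
```

For v1.0 this yields 1000000, matching the example in the text.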
To leverage the intrinsics in the toolchain, the header <riscv_vector.h> needs to be included. We suggest guarding the inclusion with the test macro.
#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif /* __riscv_v_intrinsic */
With <riscv_vector.h> included, the availability of intrinsic variants depends on the architecture required by their corresponding vector instructions. The supported architecture is specified to the compiler using the -march option 2,3.
The standard vector extensions 19 provide a set of smaller extensions for embedded use. Please check out the Zve extensions 20 for the varying degrees of support. For example, the RVV type vint64m1_t and the intrinsic __riscv_vle64_v_i64m1 are not available under architecture rv64gc_zve32x.
The intrinsics allow users to control the fields in vtype, as well as the rounding modes for fixed-point (vxrm) and floating-point (frm) vector computations. In this chapter, we cover how the intrinsics embed the control of vtype fields in the function names. Please see Naming scheme for the rules described in this chapter.
The RISC-V vector intrinsics' data types are strongly-typed. The vector intrinsics encode the EEW (effective element width) 17 and EMUL (effective LMUL) 17 of the destination vector register in the suffix of the function name. Users can expect the results of the vector instruction intrinsics to be computed under the specified EEW and EMUL.
To see the full list of data types for the intrinsics, please see Type system.
In the following example, the intrinsic produces the semantics of a vadd.vv instruction, with source vector operands of EEW=32 and EMUL=1, which can be observed through the provided RVV data types, and produces results for the destination vector operand with EEW=32 and EMUL=1.
vint32m1_t __riscv_vadd_vv_i32m1(vint32m1_t vs2, vint32m1_t vs1, size_t vl);
In the following example, the intrinsic produces the semantics of a vwadd.vv instruction, with source vector operands of EEW=16 and EMUL=1/2, which can be observed through the provided RVV data types, and produces results for the destination vector operand with EEW=32 and EMUL=1.
vint32m1_t __riscv_vwadd_vv_i32m1(vint16mf2_t vs2, vint16mf2_t vs1, size_t vl);
The intrinsics do not directly expose the vector length control register (assembly mnemonic vl 11) to the intrinsics programmer. The intrinsics programmer specifies an "application vector length (AVL)" 18 using the argument size_t vl. The implementation is responsible for setting the correct value into the underlying vector length control register (vl) given the provided AVL.
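As a usage sketch of passing the AVL, the following strip-mining loop (a common idiom rather than anything mandated by the specification; the function name is illustrative, and a scalar fallback keeps the sketch portable) requests a vl for the remaining element count on each iteration:

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif

/* Illustrative strip-mining loop: each iteration passes the remaining
   element count as the AVL; the implementation decides the granted vl. */
static void add_i32(int32_t *dst, const int32_t *a, const int32_t *b,
                    size_t n) {
#ifdef __riscv_v_intrinsic
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);      /* AVL -> granted vl */
        vint32m1_t va = __riscv_vle32_v_i32m1(a + i, vl);
        vint32m1_t vb = __riscv_vle32_v_i32m1(b + i, vl);
        __riscv_vse32_v_i32m1(dst + i, __riscv_vadd_vv_i32m1(va, vb, vl), vl);
        i += vl;
    }
#else
    /* Scalar fallback so the sketch compiles and runs on any target. */
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
#endif
}
```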
Note
|
The intrinsics for instructions that behave the same under different vl settings (e.g. vmv.x.s) do not have a size_t vl argument.
|
Note
|
The actual value written to the vl control register is implementation-defined behavior and is typically not known until runtime. The actual setting of vl, given the AVL provided through the parameter, follows the rules 27 in the specification. The number of elements to be processed can be obtained through the __riscv_vsetvl* intrinsics.
|
Instructions that are available for masking 7 have masked variant intrinsics.
The intrinsics fuse the control of vector masking (vm) together with the control of policy behavior (vta, vma) in the same suffix. Please check out Control of behavior of destination tail elements and destination inactive masked-off elements and Policy and masked naming scheme for the exact suffix that specifies a masked/unmasked vector operation along with its policy behavior.
The behavior of destination tail elements and destination inactive masked-off elements is controlled by the vta and vma bits 6.
Given the general assumption that the target audience of the intrinsics is high-performance cores, and that an "undisturbed" policy will generally slow down an out-of-order core, the intrinsics have a default policy scheme of tail-agnostic and mask-agnostic (that is, vta=1 and vma=1).
The intrinsics fuse the control of vector masking (vm) together with the control of policy behavior (vta, vma) in the same suffix. Please check out Control of vector masking and Policy and masked naming scheme for the exact suffix that specifies a masked/unmasked vector operation along with its policy behavior.
For the fixed-point intrinsics, representing the fixed-point arithmetic instructions 21, the vxrm argument of the intrinsics indicates the rounding mode (vxrm) 8 control.
The vxrm argument is required to be a constant integer expression. The implementation should provide the following enum that maps to the rounding mode values defined under Table 4 8 of the specification.
enum __RISCV_VXRM {
__RISCV_VXRM_RNU = 0,
__RISCV_VXRM_RNE = 1,
__RISCV_VXRM_RDN = 2,
__RISCV_VXRM_ROD = 3,
};
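As a usage sketch (assuming the v1.0-style fixed-point intrinsics that take a constant vxrm parameter, such as the averaging add vaadd; the surrounding function name and scalar fallback are illustrative), the following averages two arrays with round-to-nearest-up:

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif

/* Illustrative averaging add: vaadd computes (a + b) >> 1 with the
   rounding behavior selected by the constant vxrm argument (RNU here). */
static void avg_rnu_i32(int32_t *dst, const int32_t *a, const int32_t *b,
                        size_t n) {
#ifdef __riscv_v_intrinsic
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);
        vint32m1_t va = __riscv_vle32_v_i32m1(a + i, vl);
        vint32m1_t vb = __riscv_vle32_v_i32m1(b + i, vl);
        vint32m1_t vr = __riscv_vaadd_vv_i32m1(va, vb, __RISCV_VXRM_RNU, vl);
        __riscv_vse32_v_i32m1(dst + i, vr, vl);
        i += vl;
    }
#else
    /* Scalar fallback: RNU adds the shifted-out bit back in, which for a
       1-bit shift equals (a + b + 1) >> 1 computed in wider arithmetic. */
    for (size_t i = 0; i < n; i++)
        dst[i] = (int32_t)(((int64_t)a[i] + b[i] + 1) >> 1);
#endif
}
```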
Note
|
Rounding mode does not affect the computations of vsadd, vsaddu, vssub, and vssubu; therefore, the intrinsics for these instructions do not have the vxrm argument.
|
Note
|
The RISC-V psABI 9 states that vxrm is not preserved across calls. Optimization for reducing the number of redundant writes to vxrm is a compiler- and system-specific issue.
|
Note
|
Control of the vector fixed-point saturation flag (vxsat) 22 is not yet covered in the vector intrinsics v1.0. We plan to support it in follow-up versions in a way compatible with the existing v1.0 intrinsics.
|
For the floating-point intrinsics, representing the floating-point arithmetic instructions 23, the intrinsics have two variants, called the implicit-frm and the explicit-frm intrinsics.
Note
|
Control of the floating-point accrued exception flags (fflags) 10 is not yet covered in the vector intrinsics v1.0. We plan to support it in follow-up versions in a way compatible with the existing v1.0 intrinsics.
|
The implicit-frm intrinsics behave like any C-language floating-point expression, using the default rounding mode when FENV_ACCESS is off, and the fenv dynamic rounding mode when FENV_ACCESS is on.
Note
|
Both GNU and LLVM compilers generate scalar floating-point instructions using dynamic rounding mode, relying on the environment initialization to set frm to RNE (specified as "roundTiesToEven" in IEEE-754 (a.k.a. IEC 60559)).
|
Note
|
The implicit-frm intrinsics are intended to be used regardless of FENV_ACCESS . They are provided when FENV_ACCESS is on for the (few) programmers who are already using fenv; and they are provided when FENV_ACCESS is off for the (vast majority of) programmers who prefer the default rounding mode.
|
The explicit-frm intrinsics contain the frm argument, which indicates the rounding mode (frm) 10 control. The floating-point intrinsics with the frm argument carry an _rm suffix in the function name.
The frm argument is required to be a constant integer expression. The implementation should provide the following enum that maps to the rounding mode values defined under RISC-V ISA Manual Table 8.1 9.
enum __RISCV_FRM {
__RISCV_FRM_RNE = 0,
__RISCV_FRM_RTZ = 1,
__RISCV_FRM_RDN = 2,
__RISCV_FRM_RUP = 3,
__RISCV_FRM_RMM = 4,
};
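As a usage sketch of the explicit-frm variants (the _rm-suffixed vfadd intrinsic shown below follows the naming scheme in this document; the surrounding function name and scalar fallback are illustrative), the following adds two float arrays under round-to-nearest-even:

```c
#include <stddef.h>
#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif

/* Illustrative explicit-frm usage: the _rm variant takes a constant frm
   argument (RNE here, the IEEE-754 default rounding mode). */
static void fadd_rne_f32(float *dst, const float *a, const float *b,
                         size_t n) {
#ifdef __riscv_v_intrinsic
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);
        vfloat32m1_t va = __riscv_vle32_v_f32m1(a + i, vl);
        vfloat32m1_t vb = __riscv_vle32_v_f32m1(b + i, vl);
        vfloat32m1_t vr = __riscv_vfadd_vv_f32m1_rm(va, vb, __RISCV_FRM_RNE, vl);
        __riscv_vse32_v_f32m1(dst + i, vr, vl);
        i += vl;
    }
#else
    /* Scalar fallback: C's default environment already rounds to
       nearest-even, so a plain add matches RNE. */
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
#endif
}
```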
Note
|
The explicit-frm intrinsics are intended to be used when FENV_ACCESS is off, enabling more aggressive optimization while still providing the programmer with control over the rounding mode. Using explicit-frm intrinsics when FENV_ACCESS is on will still work correctly, but is expected to lead to extra saving/restoring of frm, which could be avoided by using fenv functionality and implicit-frm intrinsics.
|
The naming scheme of the intrinsics expresses the user's control of the fields in vtype, vl, and the rounding modes for fixed-point and floating-point vector computations. For details of these CSR controls, please see [control-of-vector-programming-mode].
As mentioned in Control of vector masking and Control of behavior of destination tail elements and destination inactive masked-off elements, the intrinsics fuse the control of vm, vta, and vma into the same suffix. Policy and masked naming scheme enumerates the exact suffixes. You may find where these suffixes are appended in Explicit (Non-overloaded) naming scheme.
The intrinsics can be split into two major types, called "explicit (non-overloaded) intrinsics" and "implicit (overloaded) intrinsics".
The explicit (non-overloaded) intrinsics embed the control described in Control of the vector extension programming model in the function name. This scheme gives the intrinsic codebase more readability, as the execution states are explicitly specified in the code.
The implicit (overloaded) intrinsics, on the contrary, omit the explicit specification of vtype control. The implicit (overloaded) intrinsics aim to provide a generic interface that lets users pass values of different EEW 17 and EMUL 17 as input arguments.
This section covers the general naming rules of the two types of intrinsics. It then enumerates the exceptions and the rationales behind them in Exceptions in the explicit (non-overloaded) naming scheme and Exceptions in the implicit (overloaded) naming scheme.
With the default policy scheme mentioned under Control of behavior of destination tail elements and destination inactive masked-off elements, each intrinsic provides corresponding variants for its available control of vm, vta, and vma. The following list enumerates the possible suffixes.
- No suffix: Represents an unmasked (vm=1) vector operation with tail-agnostic (vta=1) policy
- _tu suffix: Represents an unmasked (vm=1) vector operation with tail-undisturbed (vta=0) policy
- _m suffix: Represents a masked (vm=0) vector operation with tail-agnostic (vta=1), mask-agnostic (vma=1) policy
- _tum suffix: Represents a masked (vm=0) vector operation with tail-undisturbed (vta=0), mask-agnostic (vma=1) policy
- _mu suffix: Represents a masked (vm=0) vector operation with tail-agnostic (vta=1), mask-undisturbed (vma=0) policy
- _tumu suffix: Represents a masked (vm=0) vector operation with tail-undisturbed (vta=0), mask-undisturbed (vma=0) policy
Using vadd with EEW=32 and EMUL=1 as an example, the variants are:
// vm=1, vta=1
vint32m1_t __riscv_vadd_vv_i32m1(vint32m1_t vs2, vint32m1_t vs1, size_t vl);
// vm=1, vta=0
vint32m1_t __riscv_vadd_vv_i32m1_tu(vint32m1_t vd, vint32m1_t vs2,
vint32m1_t vs1, size_t vl);
// vm=0, vta=1, vma=1
vint32m1_t __riscv_vadd_vv_i32m1_m(vbool32_t vm, vint32m1_t vs2, vint32m1_t vs1,
size_t vl);
// vm=0, vta=0, vma=1
vint32m1_t __riscv_vadd_vv_i32m1_tum(vbool32_t vm, vint32m1_t vd,
vint32m1_t vs2, vint32m1_t vs1, size_t vl);
// vm=0, vta=1, vma=0
vint32m1_t __riscv_vadd_vv_i32m1_mu(vbool32_t vm, vint32m1_t vd, vint32m1_t vs2,
vint32m1_t vs1, size_t vl);
// vm=0, vta=0, vma=0
vint32m1_t __riscv_vadd_vv_i32m1_tumu(vbool32_t vm, vint32m1_t vd,
vint32m1_t vs2, vint32m1_t vs1,
size_t vl);
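As a usage sketch of the fully deterministic _tumu variant (the surrounding function name is illustrative, and the scalar fallback mirrors the same semantics), the following adds b into dst only where a is positive, leaving inactive elements undisturbed:

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif

/* Illustrative _tumu usage: where a[i] > 0, dst[i] = a[i] + b[i];
   inactive elements keep their old dst value (mask-undisturbed). */
static void cond_add_i32(int32_t *dst, const int32_t *a, const int32_t *b,
                         size_t n) {
#ifdef __riscv_v_intrinsic
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);
        vint32m1_t va = __riscv_vle32_v_i32m1(a + i, vl);
        vint32m1_t vb = __riscv_vle32_v_i32m1(b + i, vl);
        vint32m1_t vd = __riscv_vle32_v_i32m1(dst + i, vl);
        vbool32_t vm = __riscv_vmsgt_vx_i32m1_b32(va, 0, vl);  /* mask: a > 0 */
        vd = __riscv_vadd_vv_i32m1_tumu(vm, vd, va, vb, vl);
        __riscv_vse32_v_i32m1(dst + i, vd, vl);
        i += vl;
    }
#else
    /* Scalar fallback with the same deterministic semantics. */
    for (size_t i = 0; i < n; i++)
        if (a[i] > 0)
            dst[i] = a[i] + b[i];
#endif
}
```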
Note
|
When the policy is set to "agnostic", there is no guarantee of what will be in the tail/masked-off elements. Under this policy, users should not assume these values are deterministic. |
Note
|
Pseudo intrinsics mentioned under Pseudo intrinsics do not map to real vector instructions. Therefore these intrinsics are not affected by the policy setting, nor do they have intrinsic variants of the suffixes listed above. |
In general, the intrinsics are encoded as follows. The intrinsics under this naming scheme are the "non-overloaded intrinsics"; in parallel, the "overloaded intrinsics" are defined under Implicit (Overloaded) naming scheme.
The naming rules are as follows.
__riscv_{V_INSTRUCTION_MNEMONIC}_{OPERAND_MNEMONIC}_{RETURN_TYPE}_{ROUND_MODE}_{POLICY}(...)
- OPERAND_MNEMONIC are like v, vv, vx, vs, vvm, vxm
- RETURN_TYPE depends on whether the return type of the vector instruction is a mask register:
  - For intrinsics that represent instructions with a non-mask destination register:
    - EEW is one of i8 | i16 | i32 | i64 | u8 | u16 | u32 | u64 | f16 | f32 | f64.
    - EMUL is one of m1 | m2 | m4 | m8 | mf2 | mf4 | mf8.
    - Type system explains the limited enumeration of EEW-EMUL pairs.
  - For intrinsics that represent instructions with a mask destination register:
    - RETURN_TYPE is one of b1 | b2 | b4 | b8 | b16 | b32 | b64, which is derived from the ratio EEW/EMUL.
- V_INSTRUCTION_MNEMONIC are like vadd, vfmacc, vsadd.
- ROUND_MODE is the _rm suffix mentioned in Explicit-frm intrinsics. Other intrinsics do not have this suffix.
- POLICY suffixes are enumerated under Policy and masked naming scheme.
The general naming scheme is not sufficient to express all intrinsics. The exceptions are enumerated in the following section, Exceptions in the explicit (non-overloaded) naming scheme.
This section enumerates the exceptions in the explicit (non-overloaded) naming scheme.
Only encoding the return type would cause naming collisions for the permutation instruction intrinsics. The intrinsics encode the input vector type and the output scalar type in the suffix.
int8_t __riscv_vmv_x_s_i8m1_i8 (vint8m1_t vs2);
int8_t __riscv_vmv_x_s_i8m2_i8 (vint8m2_t vs2);
int8_t __riscv_vmv_x_s_i8m4_i8 (vint8m4_t vs2);
int8_t __riscv_vmv_x_s_i8m8_i8 (vint8m8_t vs2);
Only encoding the return type would cause naming collisions for the reduction instruction intrinsics. The intrinsics encode the input vector type and the output vector type in the suffix.
vint8m1_t __riscv_vredsum_vs_i8m1_i8m1(vint8m1_t vs2, vint8m1_t vs1, size_t vl);
vint8m1_t __riscv_vredsum_vs_i8m2_i8m1(vint8m2_t vs2, vint8m1_t vs1, size_t vl);
vint8m1_t __riscv_vredsum_vs_i8m4_i8m1(vint8m4_t vs2, vint8m1_t vs1, size_t vl);
vint8m1_t __riscv_vredsum_vs_i8m8_i8m1(vint8m8_t vs2, vint8m1_t vs1, size_t vl);
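As a usage sketch of the reduction intrinsics (the function name is illustrative; element 0 of the reduction result is extracted with a unit-length store so the sketch stays within the prototypes shown here, and a scalar fallback keeps it portable), the following sums an int32 array:

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif

/* Illustrative strip-mined sum: vredsum produces, in element 0,
   the sum of the active elements of vs2 plus element 0 of vs1. */
static int32_t sum_i32(const int32_t *a, size_t n) {
    int32_t acc = 0;
#ifdef __riscv_v_intrinsic
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);
        vint32m1_t v  = __riscv_vle32_v_i32m1(a + i, vl);
        vint32m1_t vs = __riscv_vmv_v_x_i32m1(acc, vl);  /* seed element 0 */
        vint32m1_t vr = __riscv_vredsum_vs_i32m1_i32m1(v, vs, vl);
        __riscv_vse32_v_i32m1(&acc, vr, 1);              /* extract element 0 */
        i += vl;
    }
#else
    /* Scalar fallback with identical semantics. */
    for (size_t i = 0; i < n; i++)
        acc += a[i];
#endif
    return acc;
}
```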
Only encoding the return type would cause naming collisions for the carry/borrow instruction intrinsics (e.g. vmadc). The intrinsics encode the input vector type and the output mask type in the suffix.
vbool64_t __riscv_vmadc_vvm_i8mf8_b64(vint8mf8_t vs2, vint8mf8_t vs1,
vbool64_t v0, size_t vl);
vbool64_t __riscv_vmadc_vvm_i16mf4_b64(vint16mf4_t vs2, vint16mf4_t vs1,
vbool64_t v0, size_t vl);
vbool64_t __riscv_vmadc_vvm_i32mf2_b64(vint32mf2_t vs2, vint32mf2_t vs1,
vbool64_t v0, size_t vl);
vbool64_t __riscv_vmadc_vvm_i64m1_b64(vint64m1_t vs2, vint64m1_t vs1,
vbool64_t v0, size_t vl);
Only encoding the return type would cause naming collisions for these pseudo intrinsics. The intrinsics encode the input vector type before the return type in the suffix.
The following shows an example with __riscv_vreinterpret_v for the vint32m1_t input vector type.
vfloat32m1_t __riscv_vreinterpret_v_i32m1_f32m1 (vint32m1_t src);
vuint32m1_t __riscv_vreinterpret_v_i32m1_u32m1 (vint32m1_t src);
vint8m1_t __riscv_vreinterpret_v_i32m1_i8m1 (vint32m1_t src);
vint16m1_t __riscv_vreinterpret_v_i32m1_i16m1 (vint32m1_t src);
vint64m1_t __riscv_vreinterpret_v_i32m1_i64m1 (vint32m1_t src);
vbool64_t __riscv_vreinterpret_v_i32m1_b64 (vint32m1_t src);
vbool32_t __riscv_vreinterpret_v_i32m1_b32 (vint32m1_t src);
vbool16_t __riscv_vreinterpret_v_i32m1_b16 (vint32m1_t src);
vbool8_t __riscv_vreinterpret_v_i32m1_b8 (vint32m1_t src);
vbool4_t __riscv_vreinterpret_v_i32m1_b4 (vint32m1_t src);
The implicit (overloaded) interface aims to provide a generic interface that takes values of different EEW and EMUL as input. Therefore, the implicit intrinsics omit the EEW and EMUL encoding in the function name. The _rm suffix for the explicit-frm intrinsics (Control of floating-point rounding mode) is also omitted. The intrinsics under this scheme are the "overloaded intrinsics"; in parallel, the "non-overloaded intrinsics" are defined under Explicit (Non-overloaded) naming scheme.
Taking the vector addition (vadd) instruction intrinsics as an example: stripping off the operand mnemonics and the encoded EEW and EMUL information, the intrinsics provide the following overloaded interfaces.
vint32m1_t __riscv_vadd(vint32m1_t v0, vint32m1_t v1, size_t vl);
vint16m4_t __riscv_vadd(vint16m4_t v0, vint16m4_t v1, size_t vl);
Since the main intent is to let users pass values of different EEW and EMUL as input arguments, the overloaded intrinsics do not omit the policy suffix. That is, the suffixes listed under Control of behavior of destination tail elements and destination inactive masked-off elements are not omitted and are still encoded in the function name.
The masked variants with the default policy share the same interface as the unmasked variants with the default policy; they do not have any trailing suffix.
Taking the vector floating-point add (vfadd) as an example, the intrinsics provide the following overloaded interfaces.
vfloat32m1_t __riscv_vfadd(vfloat32m1_t vs2, vfloat32m1_t vs1, size_t vl);
vfloat32m1_t __riscv_vfadd(vbool32_t vm, vfloat32m1_t vs2, vfloat32m1_t vs1,
size_t vl);
vfloat32m1_t __riscv_vfadd(vfloat32m1_t vs2, vfloat32m1_t vs1, unsigned int frm,
size_t vl);
vfloat32m1_t __riscv_vfadd(vbool32_t vm, vfloat32m1_t vs2, vfloat32m1_t vs1,
unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_tu(vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, size_t vl);
vfloat32m1_t __riscv_vfadd_tum(vbool32_t vm, vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, size_t vl);
vfloat32m1_t __riscv_vfadd_tumu(vbool32_t vm, vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, size_t vl);
vfloat32m1_t __riscv_vfadd_mu(vbool32_t vm, vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, size_t vl);
vfloat32m1_t __riscv_vfadd_tu(vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_tum(vbool32_t vm, vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_tumu(vbool32_t vm, vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_mu(vbool32_t vm, vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, unsigned int frm, size_t vl);
The naming scheme that prunes everything except the instruction mnemonic is not available for all intrinsics. Please see Exceptions in the implicit (overloaded) naming scheme for overloaded intrinsics with irregular naming patterns.
Due to the limitations of the C language (without the aid of features like C++ templates), some intrinsics do not have an overloaded version and therefore do not possess a simplified, EEW/EMUL-omitted interface. Please see Unsupported intrinsics for implicit (overloaded) naming scheme for more detail.
The following intrinsics have an irregular naming pattern.
Widening instruction intrinsics (e.g. vwadd) have the same return type but different argument types. The operand mnemonics are encoded into their overloaded versions to help distinguish them.
vint32m1_t __riscv_vwadd_vv(vint16mf2_t vs2, vint16mf2_t vs1, size_t vl);
vint32m1_t __riscv_vwadd_vx(vint16mf2_t vs2, int16_t rs1, size_t vl);
vint32m1_t __riscv_vwadd_wv(vint32m1_t vs2, vint16mf2_t vs1, size_t vl);
vint32m1_t __riscv_vwadd_wx(vint32m1_t vs2, int16_t rs1, size_t vl);
Type-convert instruction intrinsics (e.g. vfcvt.x.f, vfcvt.xu.f, vfcvt.rtz.xu.f) encode the return type mnemonics into their overloaded variants to help distinguish them.
The following shows how _x, _rtz_x, _xu, and _rtz_xu are appended to the suffix for distinction.
vint32m1_t __riscv_vfcvt_x (vfloat32m1_t src, size_t vl);
vint32m1_t __riscv_vfcvt_rtz_x (vfloat32m1_t src, size_t vl);
vuint32m1_t __riscv_vfcvt_xu (vfloat32m1_t src, size_t vl);
vuint32m1_t __riscv_vfcvt_rtz_xu (vfloat32m1_t src, size_t vl);
These pseudo intrinsics encode the return type (e.g. __riscv_vreinterpret_b8) into their overloaded variants to help distinguish them.
The following shows how the return type is appended to the suffix for distinction.
vfloat32m1_t __riscv_vreinterpret_f32m1 (vint32m1_t src);
vuint32m1_t __riscv_vreinterpret_u32m1 (vint32m1_t src);
vint8m1_t __riscv_vreinterpret_i8m1 (vint32m1_t src);
vint16m1_t __riscv_vreinterpret_i16m1 (vint32m1_t src);
vint64m1_t __riscv_vreinterpret_i64m1 (vint32m1_t src);
vbool64_t __riscv_vreinterpret_b64 (vint32m1_t src);
vbool32_t __riscv_vreinterpret_b32 (vint32m1_t src);
vbool16_t __riscv_vreinterpret_b16 (vint32m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vint32m1_t src);
vbool4_t __riscv_vreinterpret_b4 (vint32m1_t src);
Due to the limitations of the C language (without the aid of features like C++ templates), some intrinsics do not have an overloaded version. Intrinsics with any of the following characteristics do not possess an overloaded version.
- Intrinsics whose input arguments are all scalar types (e.g. unmasked vector load instruction intrinsics, vmv.s.x)
- Intrinsics with vl as the only argument (e.g. vmclr, vmset, vid)
- Intrinsics with vector boolean input(s) returning a non-boolean vector type (e.g. viota)
The intrinsics are designed to be strongly-typed. The vreinterpret intrinsics are provided to help users cross the strongly-typed scheme when necessary.
Non-mask (integer and floating-point) data types have SEW and LMUL encoded.
Integer types have EEW and EMUL encoded into the type. The first row describes the EMUL and the first column describes the data type and element width of the scalar type. Types with subscript "*" are available when ELEN >= 64 (that is, unavailable under Zve32* and requiring at least Zve64x).
Types | EMUL=1/8 | EMUL=1/4 | EMUL=1/2 | EMUL=1 | EMUL=2 | EMUL=4 | EMUL=8 |
---|---|---|---|---|---|---|---|
int8_t | vint8mf8_t* | vint8mf4_t | vint8mf2_t | vint8m1_t | vint8m2_t | vint8m4_t | vint8m8_t |
int16_t | N/A | vint16mf4_t* | vint16mf2_t | vint16m1_t | vint16m2_t | vint16m4_t | vint16m8_t |
int32_t | N/A | N/A | vint32mf2_t* | vint32m1_t | vint32m2_t | vint32m4_t | vint32m8_t |
int64_t | N/A | N/A | N/A | vint64m1_t* | vint64m2_t* | vint64m4_t* | vint64m8_t* |
uint8_t | vuint8mf8_t* | vuint8mf4_t | vuint8mf2_t | vuint8m1_t | vuint8m2_t | vuint8m4_t | vuint8m8_t |
uint16_t | N/A | vuint16mf4_t* | vuint16mf2_t | vuint16m1_t | vuint16m2_t | vuint16m4_t | vuint16m8_t |
uint32_t | N/A | N/A | vuint32mf2_t* | vuint32m1_t | vuint32m2_t | vuint32m4_t | vuint32m8_t |
uint64_t | N/A | N/A | N/A | vuint64m1_t* | vuint64m2_t* | vuint64m4_t* | vuint64m8_t* |
Floating-point types have EEW and EMUL encoded into the type. The first row describes the EMUL and the first column describes the data type and element width of the scalar type.
Floating-point types with an element width of 16 (Types=_Float16) require the zvfh or zvfhmin extension to be specified in the architecture.
Floating-point types with an element width of 32 (Types=float) require the zve32f extension to be specified in the architecture.
Floating-point types with an element width of 64 (Types=double) require the zve64d extension to be specified in the architecture.
Note
|
Although C++23 introduces <stdfloat> for fixed-width floating-point types, this latest standard is not yet supported in the upstream RISC-V compilers. The specification (along with the prototype lists in the appendix) uses _Float16/float/double to represent the floating-point types with element widths of 16/32/64.
|
Types with subscript "*" are available when ELEN >= 64 (that is, unavailable under Zve32f and requiring at least Zve64f).
Types | EMUL=1/8 | EMUL=1/4 | EMUL=1/2 | EMUL=1 | EMUL=2 | EMUL=4 | EMUL=8 |
---|---|---|---|---|---|---|---|
_Float16 | N/A | vfloat16mf4_t* | vfloat16mf2_t | vfloat16m1_t | vfloat16m2_t | vfloat16m4_t | vfloat16m8_t |
float | N/A | N/A | vfloat32mf2_t* | vfloat32m1_t | vfloat32m2_t | vfloat32m4_t | vfloat32m8_t |
double | N/A | N/A | N/A | vfloat64m1_t | vfloat64m2_t | vfloat64m4_t | vfloat64m8_t |
Mask types have the ratio derived from EEW/EMUL encoded into the type. The mask types represent mask register values that follow the Mask Register Layout 14.
Types with subscript "*" are available when ELEN >= 64 (that is, unavailable under Zve32x and requiring at least Zve64x).
Types | n = 1 | n = 2 | n = 4 | n = 8 | n = 16 | n = 32 | n = 64 |
---|---|---|---|---|---|---|---|
bool | vbool1_t | vbool2_t | vbool4_t | vbool8_t | vbool16_t | vbool32_t | vbool64_t* |
Tuple types encode SEW, LMUL, and NFIELD 16 into the data type.
These types are utilized through the segment load/store instruction intrinsics, along with the getter (vget) and setter (vset) intrinsics to extract/combine them. The types listed in Integer types and Floating-point types all have tuple counterparts. The available combinations of LMUL and NFIELD follow the restriction of the specification, EMUL * NFIELDS ≤ 8.
Availability of the tuple types follows the availability of their corresponding non-tuple (NFIELD=1) types.
Non-tuple Types (NFIELD=1) | NFIELD=2 | NFIELD=3 | NFIELD=4 | NFIELD=5 | NFIELD=6 | NFIELD=7 | NFIELD=8 |
---|---|---|---|---|---|---|---|
vint8mf8_t | vint8mf8x2_t | vint8mf8x3_t | vint8mf8x4_t | vint8mf8x5_t | vint8mf8x6_t | vint8mf8x7_t | vint8mf8x8_t |
vuint8mf8_t | vuint8mf8x2_t | vuint8mf8x3_t | vuint8mf8x4_t | vuint8mf8x5_t | vuint8mf8x6_t | vuint8mf8x7_t | vuint8mf8x8_t |
Non-tuple Types (NFIELD=1) | NFIELD=2 | NFIELD=3 | NFIELD=4 | NFIELD=5 | NFIELD=6 | NFIELD=7 | NFIELD=8 |
---|---|---|---|---|---|---|---|
vint8mf4_t | vint8mf4x2_t | vint8mf4x3_t | vint8mf4x4_t | vint8mf4x5_t | vint8mf4x6_t | vint8mf4x7_t | vint8mf4x8_t |
vuint8mf4_t | vuint8mf4x2_t | vuint8mf4x3_t | vuint8mf4x4_t | vuint8mf4x5_t | vuint8mf4x6_t | vuint8mf4x7_t | vuint8mf4x8_t |
vint16mf4_t | vint16mf4x2_t | vint16mf4x3_t | vint16mf4x4_t | vint16mf4x5_t | vint16mf4x6_t | vint16mf4x7_t | vint16mf4x8_t |
vuint16mf4_t | vuint16mf4x2_t | vuint16mf4x3_t | vuint16mf4x4_t | vuint16mf4x5_t | vuint16mf4x6_t | vuint16mf4x7_t | vuint16mf4x8_t |
vfloat16mf4_t | vfloat16mf4x2_t | vfloat16mf4x3_t | vfloat16mf4x4_t | vfloat16mf4x5_t | vfloat16mf4x6_t | vfloat16mf4x7_t | vfloat16mf4x8_t |
Non-tuple Types (NFIELD=1) | NFIELD=2 | NFIELD=3 | NFIELD=4 | NFIELD=5 | NFIELD=6 | NFIELD=7 | NFIELD=8 |
---|---|---|---|---|---|---|---|
vint8mf2_t | vint8mf2x2_t | vint8mf2x3_t | vint8mf2x4_t | vint8mf2x5_t | vint8mf2x6_t | vint8mf2x7_t | vint8mf2x8_t |
vuint8mf2_t | vuint8mf2x2_t | vuint8mf2x3_t | vuint8mf2x4_t | vuint8mf2x5_t | vuint8mf2x6_t | vuint8mf2x7_t | vuint8mf2x8_t |
vint16mf2_t | vint16mf2x2_t | vint16mf2x3_t | vint16mf2x4_t | vint16mf2x5_t | vint16mf2x6_t | vint16mf2x7_t | vint16mf2x8_t |
vuint16mf2_t | vuint16mf2x2_t | vuint16mf2x3_t | vuint16mf2x4_t | vuint16mf2x5_t | vuint16mf2x6_t | vuint16mf2x7_t | vuint16mf2x8_t |
vint32mf2_t | vint32mf2x2_t | vint32mf2x3_t | vint32mf2x4_t | vint32mf2x5_t | vint32mf2x6_t | vint32mf2x7_t | vint32mf2x8_t |
vuint32mf2_t | vuint32mf2x2_t | vuint32mf2x3_t | vuint32mf2x4_t | vuint32mf2x5_t | vuint32mf2x6_t | vuint32mf2x7_t | vuint32mf2x8_t |
vfloat16mf2_t | vfloat16mf2x2_t | vfloat16mf2x3_t | vfloat16mf2x4_t | vfloat16mf2x5_t | vfloat16mf2x6_t | vfloat16mf2x7_t | vfloat16mf2x8_t |
vfloat32mf2_t | vfloat32mf2x2_t | vfloat32mf2x3_t | vfloat32mf2x4_t | vfloat32mf2x5_t | vfloat32mf2x6_t | vfloat32mf2x7_t | vfloat32mf2x8_t |
Non-tuple Types (NFIELD=1) | NFIELD=2 | NFIELD=3 | NFIELD=4 | NFIELD=5 | NFIELD=6 | NFIELD=7 | NFIELD=8 |
---|---|---|---|---|---|---|---|
vint8m1_t | vint8m1x2_t | vint8m1x3_t | vint8m1x4_t | vint8m1x5_t | vint8m1x6_t | vint8m1x7_t | vint8m1x8_t |
vuint8m1_t | vuint8m1x2_t | vuint8m1x3_t | vuint8m1x4_t | vuint8m1x5_t | vuint8m1x6_t | vuint8m1x7_t | vuint8m1x8_t |
vint16m1_t | vint16m1x2_t | vint16m1x3_t | vint16m1x4_t | vint16m1x5_t | vint16m1x6_t | vint16m1x7_t | vint16m1x8_t |
vuint16m1_t | vuint16m1x2_t | vuint16m1x3_t | vuint16m1x4_t | vuint16m1x5_t | vuint16m1x6_t | vuint16m1x7_t | vuint16m1x8_t |
vint32m1_t | vint32m1x2_t | vint32m1x3_t | vint32m1x4_t | vint32m1x5_t | vint32m1x6_t | vint32m1x7_t | vint32m1x8_t |
vuint32m1_t | vuint32m1x2_t | vuint32m1x3_t | vuint32m1x4_t | vuint32m1x5_t | vuint32m1x6_t | vuint32m1x7_t | vuint32m1x8_t |
vint64m1_t | vint64m1x2_t | vint64m1x3_t | vint64m1x4_t | vint64m1x5_t | vint64m1x6_t | vint64m1x7_t | vint64m1x8_t |
vuint64m1_t | vuint64m1x2_t | vuint64m1x3_t | vuint64m1x4_t | vuint64m1x5_t | vuint64m1x6_t | vuint64m1x7_t | vuint64m1x8_t |
vfloat16m1_t | vfloat16m1x2_t | vfloat16m1x3_t | vfloat16m1x4_t | vfloat16m1x5_t | vfloat16m1x6_t | vfloat16m1x7_t | vfloat16m1x8_t |
vfloat32m1_t | vfloat32m1x2_t | vfloat32m1x3_t | vfloat32m1x4_t | vfloat32m1x5_t | vfloat32m1x6_t | vfloat32m1x7_t | vfloat32m1x8_t |
vfloat64m1_t | vfloat64m1x2_t | vfloat64m1x3_t | vfloat64m1x4_t | vfloat64m1x5_t | vfloat64m1x6_t | vfloat64m1x7_t | vfloat64m1x8_t |
Non-tuple Types (NFIELD=1) | NFIELD=2 | NFIELD=3 | NFIELD=4 | NFIELD=5 | NFIELD=6 | NFIELD=7 | NFIELD=8 |
---|---|---|---|---|---|---|---|
vint8m2_t | vint8m2x2_t | vint8m2x3_t | vint8m2x4_t | N/A | N/A | N/A | N/A |
vuint8m2_t | vuint8m2x2_t | vuint8m2x3_t | vuint8m2x4_t | N/A | N/A | N/A | N/A |
vint16m2_t | vint16m2x2_t | vint16m2x3_t | vint16m2x4_t | N/A | N/A | N/A | N/A |
vuint16m2_t | vuint16m2x2_t | vuint16m2x3_t | vuint16m2x4_t | N/A | N/A | N/A | N/A |
vint32m2_t | vint32m2x2_t | vint32m2x3_t | vint32m2x4_t | N/A | N/A | N/A | N/A |
vuint32m2_t | vuint32m2x2_t | vuint32m2x3_t | vuint32m2x4_t | N/A | N/A | N/A | N/A |
vint64m2_t | vint64m2x2_t | vint64m2x3_t | vint64m2x4_t | N/A | N/A | N/A | N/A |
vuint64m2_t | vuint64m2x2_t | vuint64m2x3_t | vuint64m2x4_t | N/A | N/A | N/A | N/A |
vfloat16m2_t | vfloat16m2x2_t | vfloat16m2x3_t | vfloat16m2x4_t | N/A | N/A | N/A | N/A |
vfloat32m2_t | vfloat32m2x2_t | vfloat32m2x3_t | vfloat32m2x4_t | N/A | N/A | N/A | N/A |
vfloat64m2_t | vfloat64m2x2_t | vfloat64m2x3_t | vfloat64m2x4_t | N/A | N/A | N/A | N/A |
Non-tuple Types (NFIELD=1) | NFIELD=2 | NFIELD=3 | NFIELD=4 | NFIELD=5 | NFIELD=6 | NFIELD=7 | NFIELD=8 |
---|---|---|---|---|---|---|---|
vint8m4_t | vint8m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vuint8m4_t | vuint8m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vint16m4_t | vint16m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vuint16m4_t | vuint16m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vint32m4_t | vint32m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vuint32m4_t | vuint32m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vint64m4_t | vint64m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vuint64m4_t | vuint64m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vfloat16m4_t | vfloat16m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vfloat32m4_t | vfloat32m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vfloat64m4_t | vfloat64m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
The intrinsics provide additional utility functions to assist users in manipulating across intrinsic types. These functions are called "pseudo intrinsics" and do not represent any real instructions.
The vsetvl intrinsics return the number of elements to be processed in a strip-mining loop, given the element width and LMUL in the intrinsic suffix.
Note
|
The implementation must respect the ratio between SEW and LMUL given to the intrinsic. On the other hand, as mentioned in Control of number of elements to be processed, the vsetvl intrinsics do not necessarily map to the emission of a vsetvl instruction with the exact SEW and LMUL provided. The actual value written to the vl control register is implementation-defined behavior and typically not known until runtime.
|
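As a sketch (assuming an RVV-enabled cross toolchain; the function name and arrays here are hypothetical), a typical stripmining loop queries vsetvl once per iteration:

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

// Stripmining: each iteration asks how many elements (vl) will be
// processed for SEW=32, LMUL=2, then advances the pointers by that amount.
void vec_add(int32_t *dst, const int32_t *a, const int32_t *b, size_t n) {
  for (size_t vl; n > 0; n -= vl, a += vl, b += vl, dst += vl) {
    vl = __riscv_vsetvl_e32m2(n);  // elements processed this iteration
    vint32m2_t va = __riscv_vle32_v_i32m2(a, vl);
    vint32m2_t vb = __riscv_vle32_v_i32m2(b, vl);
    __riscv_vse32_v_i32m2(dst, __riscv_vadd_vv_i32m2(va, vb, vl), vl);
  }
}
```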
The vsetvlmax
intrinsics return VLMAX
5 when provided with the element width and LMUL in the intrinsic suffix.
Note
|
As mentioned in Control of number of elements to be processed, the vsetvlmax intrinsics do not necessarily map to the emission of a vsetvl instruction with the exact SEW and LMUL provided. The actual value written to the vl control register is implementation-defined behavior and typically not known until runtime.
|
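For example, vsetvlmax can be used to obtain VLMAX once, such as when initializing an accumulator outside a loop. A minimal sketch, assuming an RVV-enabled toolchain (the function name is hypothetical):

```c
#include <riscv_vector.h>
#include <stddef.h>

// VLMAX for SEW=32, LMUL=1; the concrete value depends on VLEN at runtime.
vfloat32m1_t zero_acc(void) {
  size_t vlmax = __riscv_vsetvlmax_e32m1();
  return __riscv_vfmv_v_f_f32m1(0.0f, vlmax);  // splat 0.0f across all elements
}
```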
The vreinterpret
intrinsics are provided for users to convert across the strongly-typed scheme. They are limited to conversions between types operating upon the same number of registers.
The intrinsics do not alter the values held within. Please use the vfcvt/v(f)wcvt/v(f)ncvt
intrinsics if you seek to widen, narrow, or perform real floating-point/integer type conversions on the values.
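A sketch (RVV toolchain assumed; function names are hypothetical): reinterpreting keeps the register bits intact and only changes the static type.

```c
#include <riscv_vector.h>

// Same register contents viewed under a different type; no instruction needed.
vuint32m1_t bits_as_unsigned(vint32m1_t v) {
  return __riscv_vreinterpret_v_i32m1_u32m1(v);  // i32 -> u32, same SEW and LMUL
}
vint16m1_t bits_as_i16(vint32m1_t v) {
  return __riscv_vreinterpret_v_i32m1_i16m1(v);  // same LMUL, different SEW view
}
```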
The vundefined
intrinsics are placeholders to represent unspecified values in variable initialization, or as arguments of vset
and vcreate
.
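A minimal sketch (RVV toolchain assumed; the function name is hypothetical) of using vundefined as the starting value of a register group that is then filled via vset:

```c
#include <riscv_vector.h>

// The initial value is unspecified; it must be fully written before any use.
vint32m2_t make_pair(vint32m1_t lo, vint32m1_t hi) {
  vint32m2_t v = __riscv_vundefined_i32m2();  // placeholder, indeterminate value
  v = __riscv_vset_v_i32m1_i32m2(v, 0, lo);   // fill the lower m1 half
  v = __riscv_vset_v_i32m1_i32m2(v, 1, hi);   // fill the upper m1 half
  return v;
}
```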
The vget
intrinsics allow users to obtain small LMUL values from larger LMUL ones. The vget
intrinsics also allow users to extract non-tuple (NFIELD=1
) types from tuple (NFIELD>1
) types after segment load intrinsics. The index provided must be a constant known at compile time.
The intrinsics do not map to any real instruction. Whether the implementation generates vector move instructions is an optimization decision.
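A sketch (RVV toolchain assumed; function names are hypothetical) extracting the fields of a 2-field segment load, and a smaller LMUL part of a register group:

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

void split_pairs(const int32_t *src, int32_t *even, int32_t *odd, size_t vl) {
  vint32m1x2_t t = __riscv_vlseg2e32_v_i32m1x2(src, vl);  // segment load into a tuple
  __riscv_vse32_v_i32m1(even, __riscv_vget_v_i32m1x2_i32m1(t, 0), vl);
  __riscv_vse32_v_i32m1(odd,  __riscv_vget_v_i32m1x2_i32m1(t, 1), vl);
}

vint32m1_t upper_half(vint32m2_t v) {
  return __riscv_vget_v_i32m2_i32m1(v, 1);  // second m1 register of the m2 group
}
```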
The vset
intrinsics allow users to combine small LMUL values into larger LMUL ones. The vset
intrinsics also allow users to combine non-tuple (NFIELD=1
) types into tuple (NFIELD>1
) types for segment store intrinsics. The index provided must be a constant known at compile time.
The intrinsics do not map to any real instruction. Whether the implementation generates vector move instructions is an optimization decision.
The vlmul_trunc
intrinsics are syntactic sugar with the same semantics as vget
with idx=0
.
The vlmul_ext
intrinsics are syntactic sugar with the same semantics as vset
with idx=0
.
The vcreate
intrinsics are syntactic sugar for creating RVV types. They have the same semantics as multiple vset
-s filling in the values accordingly.
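A sketch (RVV toolchain assumed; the function name is hypothetical): vcreate builds a larger LMUL value in a single call, with the same effect as a vundefined start followed by vset-s:

```c
#include <riscv_vector.h>

vint32m2_t combine(vint32m1_t lo, vint32m1_t hi) {
  // One call filling both m1 halves of the m2 result.
  return __riscv_vcreate_v_i32m1_i32m2(lo, hi);
}
```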
An agnostic value is in the granularity of an "element" contained in an RVV type. Agnostic values are either:
- Tail element(s) of an RVV value produced through a ta instruction
- Masked-off element(s) of an RVV value produced through a ma instruction
- All elements in an uninitialized value
- A value assigned with the vundefined intrinsics
An agnostic value is an indeterminate value, and evaluation of an agnostic value is undefined behavior. Users should not rely on any evaluation of an agnostic value.
There is no intrinsic that directly maps to the whole vector register move instructions (vmv<nr>r.v
) 30.
For copying vector contents in whole, we encourage users to use the assignment operator (=
).
The assignment operator (=
) represents the semantics of a whole vector register (group) copy from the expression on the right-hand side to the RVV type object on the left-hand side. The semantics still maintain a whole vector register content copy for fractional LMUL types. This enables the compiler to coalesce register usage when possible.
Users may leverage the vector move intrinsics (vmv_v_v
) if they wish to copy vector register groups with vl != VLMAX
.
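A sketch (RVV toolchain assumed; the function name is hypothetical) contrasting the two copy forms:

```c
#include <riscv_vector.h>
#include <stddef.h>

void copies(vint32m1_t a, size_t vl) {
  vint32m1_t whole = a;                                // whole register content copy
  vint32m1_t first_vl = __riscv_vmv_v_v_i32m1(a, vl);  // copies only vl elements
  (void)whole;
  (void)first_vl;
}
```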
Intrinsics whose computation depends on the value held in the destination register (assembly mnemonic vd
) have a vd
argument. The following list enumerates the intrinsics that have a vd
argument. Please see the appendix for the exact prototypes of these intrinsics.
- Intrinsics with tail-undisturbed (vta=0)
- Intrinsics with mask-undisturbed (vma=0)
- Intrinsics representing Vector Multiply-Add Operations 13
- Intrinsics representing Vector Slideup Instructions 24
For intrinsics with no vd
argument, the implementation is free to pick any register as the destination register.
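For illustration (RVV toolchain assumed; the function name is hypothetical), a tail-undisturbed variant carries the destination value as its first argument:

```c
#include <riscv_vector.h>
#include <stddef.h>

// Elements past vl keep the values they had in vd (tail-undisturbed).
vint32m1_t add_tu(vint32m1_t vd, vint32m1_t a, vint32m1_t b, size_t vl) {
  return __riscv_vadd_vv_i32m1_tu(vd, a, b, vl);
}
```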
The vstart
CSR is currently not exposed to the intrinsics programmer, and the intrinsics have the semantics of vstart = 0
. Support for positive vstart
values is implementation-defined; thus, portable application software should not set vstart > 0
.
Some users may expect the intrinsics to translate directly into instructions that appear in the assembly; however, the intrinsics are interfaces that expose the vector instruction semantics. The implementation is free to optimize them out if there is an opportunity.
Control of vl
, vtype
, vxrm
, and frm
is not directly exposed to the user. The implementation is responsible for setting the correct values into these CSRs to achieve the expected semantics of the intrinsic functions with respect to the conventions defined in the ISA specification 0 and ABI specification 9.
The specification mentions 15 that the strided load/store instruction with a stride of 0 could have different behaviors, performing all memory accesses or fewer memory operations. Since needing all memory accesses isn’t likely to be common, the implementation is allowed to generate fewer memory operations with strided load/store intrinsics.
In other words, the compiler does not guarantee generating the instruction for all memory accesses in strided load/store intrinsics with a stride of 0. If the user needs all memory accesses to be performed, they should use indexed load/store intrinsics with all-zero indices.
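A sketch (RVV toolchain assumed; function names are hypothetical) of the two forms; only the indexed version is guaranteed to perform one memory access per element:

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

vint32m1_t read_same_addr(const int32_t *p, size_t vl) {
  // Stride of 0: the implementation may perform fewer memory accesses.
  return __riscv_vlse32_v_i32m1(p, 0, vl);
}

vint32m1_t read_same_addr_all(const int32_t *p, size_t vl) {
  // Indexed load with all-zero indices: every element access is performed.
  vuint32m1_t zero_idx = __riscv_vmv_v_x_u32m1(0, vl);
  return __riscv_vluxei32_v_i32m1(p, zero_idx, vl);
}
```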
The intrinsics provide variants with operand mnemonics of vv
and vx
, but not vi
. This was an intentional design decision to reduce the total number of intrinsics.
It is an optimization issue for the implementation to emit instructions with the operand mnemonic vi
when an immediate that can be expressed within 5 bits is provided to the intrinsics.
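For example (RVV toolchain assumed; the function name is hypothetical), passing a small constant to a vx intrinsic allows the compiler to select the vi encoding:

```c
#include <riscv_vector.h>
#include <stddef.h>

vint32m1_t add_five(vint32m1_t v, size_t vl) {
  // 5 fits in the 5-bit immediate field, so the compiler may emit vadd.vi.
  return __riscv_vadd_vx_i32m1(v, 5, vl);
}
```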
The compiler will be conservative about the registers (vtype
, vxrm
, frm
) when encountering inline assembly. Users should be aware that mixing intrinsics and inline assembly will result in extra saves and restores.
Note
|
Standard extensions are merged into riscv/riscv-isa-manual after ratification. There is an ongoing pull request 26 for the "V" extension to be merged. At this moment this intrinsics specification still references the frozen draft 0. This reference will be updated in the future once the pull request has been merged.
|
4Section 3.4.1 (Vector selected element width vsew[2:0]
) in the specification 0
5Section 3.4.2 (Vector Register Grouping (vlmul[2:0]`
)) in the specification 0
6Section 3.4.3 (Vector Tail Agnostic and Vector Mask Agnostic vta
and vma
) in the specification 0
7Section 5.3 (Vector Masking) in the specification 0
8Section 3.8 (Vector Fixed-Point Rounding Mode Register vxrm
) in the specification 0
11Section 3.5 (Vector Length Register) in the specification 0
12Section 3.4.2 in the specification 0
13Section 11.13, 11.14, 13.6, 13.7 in the specification 0
14Section 4.5 (Mask Register Layout) in the specification 0
15Section 7.5 in the specification 0
16Section 7.8 in the specification 0
17Section 5.2 (Vector Operands) in the specification 0
18Section 6 (Configuration-Setting Instructions) in the specification 0
19Section 18 (Standard Vector Extensions) in the specification 0
20Section 18.2 (Zve*: Vector Extensions for Embedded Processors) in the specification 0
21Section 12 (Vector Fixed-Point Arithmetic Instructions) in the specification 0
22Section 3.9 (Vector Fixed-Point Saturation Flag vxsat) in the specification 0
23Section 13 (Vector Floating-Point Instructions) in the specification 0
24Section 16.3.1 (Vector Slideup Instructions) in the specification 0
25Section 3.7 (Vector Start Index CSR vstart
) in the specification 0
27Section 6.3 (Constraints on Setting vl
) in the specification 0
28Section 6.4 (Example of stripmining and changes to SEW) in the specification 0
29Section 3.6 (Vector Byte Length vlenb
) in the specification 0
30Section 16.6 (Whole Vector Register Move) in the specification 0