The RISC-V vector C intrinsics provide an interface at the C language level to directly leverage the RISC-V "V" extension 0 (also abbreviated as "RVV"), with assistance from the compiler in handling instruction scheduling and register allocation. The intrinsics also aim to free users from the responsibility of maintaining the correct configuration settings 18 for vector instruction execution.
This document uses the term "RVV" as an abbreviation for the RISC-V "V" extension. This document uses the term "the specification" to indicate the RISC-V "V" extension specification.
The __riscv_v_intrinsic macro is the C macro to test the compiler's support for the RISC-V "V" extension intrinsics. The value of the test macro is its version, computed using the following formula, which is identical to the one defined in the RISC-V C API specification 1.
<MAJOR_VERSION> * 1,000,000 + <MINOR_VERSION> * 1,000 + <REVISION_VERSION>
For example, the v1.0 version should define the macro with value 1000000.
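As a sketch of the formula above, the following helper (the function name is illustrative, not part of the specification) computes the test-macro value from a version triple:

```c
/* Illustrative helper (not part of the specification): compute the
   __riscv_v_intrinsic test-macro value from a version triple using
   the formula above. */
static int rvv_intrinsic_version(int major, int minor, int revision) {
    return major * 1000000 + minor * 1000 + revision;
}
```

For v1.0 this yields 1000000, matching the example in the text.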
To leverage the intrinsics in the toolchain, the header <riscv_vector.h> needs to be included. We suggest guarding the inclusion with the test macro.
#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif /* __riscv_v_intrinsic */
With <riscv_vector.h> included, the availability of intrinsic variants depends on the architecture required by their corresponding vector instructions. The supported architecture is specified to the compiler using the -march option 2,3.
The standard vector extensions 19 provide a set of smaller extensions for embedded use. Please check out the Zve extensions 20 for the varying degrees of support. For example, the RVV type vint64m1_t and the intrinsic __riscv_vle64_v_i64m1 are not available under architecture rv64gc_zve32x.
The intrinsics allow users to control the fields in vtype, as well as the rounding modes for fixed-point (vxrm) and floating-point (frm) vector computations. In this chapter, we cover how the intrinsics embed the control of vtype fields in the function names. Please see Naming scheme for the rules described in this chapter.
The RISC-V vector intrinsics' data types are strongly-typed. The vector intrinsics encode the EEW (effective element width) 17 and EMUL (effective LMUL) 17 of the destination vector register in the suffix of the function name. Users can expect the results of the vector instruction intrinsics to be computed under the specified EEW and EMUL.
To see the full list of data types for the intrinsics, please see Type system.
In the following example, the intrinsic produces the semantics of a vadd.vv instruction, with source vector operands of EEW=32 and EMUL=1, which can be observed through the provided RVV data types, and produces results for the destination vector operand with EEW=32 and EMUL=1.
vint32m1_t __riscv_vadd_vv_i32m1(vint32m1_t vs2, vint32m1_t vs1, size_t vl);
In the following example, the intrinsic produces the semantics of a vwadd.vv instruction, with source vector operands of EEW=16 and EMUL=1/2, which can be observed through the provided RVV data types, and produces results for the destination vector operand with EEW=32 and EMUL=1.
vint32m1_t __riscv_vwadd_vv_i32m1(vint16mf2_t vs2, vint16mf2_t vs1, size_t vl);
The intrinsics do not directly expose the vector length control register (assembly mnemonic vl 11) to the intrinsics programmer. The intrinsics programmer specifies an "application vector length (AVL)" 18 using the argument size_t vl. The implementation is responsible for setting the correct value into the underlying vector length control register (vl) given the provided AVL.
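As a usage sketch of passing the AVL, the following strip-mining loop (a common idiom rather than anything mandated by the specification; the function name is illustrative, and a scalar fallback keeps the sketch portable) requests a vl for the remaining element count on each iteration:

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif

/* Illustrative strip-mining loop: each iteration passes the remaining
   element count as the AVL; the implementation decides the granted vl. */
static void add_i32(int32_t *dst, const int32_t *a, const int32_t *b,
                    size_t n) {
#ifdef __riscv_v_intrinsic
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);      /* AVL -> granted vl */
        vint32m1_t va = __riscv_vle32_v_i32m1(a + i, vl);
        vint32m1_t vb = __riscv_vle32_v_i32m1(b + i, vl);
        __riscv_vse32_v_i32m1(dst + i, __riscv_vadd_vv_i32m1(va, vb, vl), vl);
        i += vl;
    }
#else
    /* Scalar fallback so the sketch compiles and runs on any target. */
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
#endif
}
```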
Note
|
The intrinsics for instructions that behave the same under different vl settings (e.g. vmv.x.s) do not have a size_t vl argument.
|
Note
|
The actual value written to the vl control register is implementation-defined behavior and is typically not known until runtime. The actual setting of vl, given the AVL provided through the parameter, follows the rules 27 in the specification. The number of elements to be processed can be obtained through the __riscv_vsetvl* intrinsics.
|
Instructions that are available for masking 7 have masked variant intrinsics.
The intrinsics fuse the control of vector masking (vm) together with the control of policy behavior (vta, vma) in the same suffix. Please check out Control of behavior of destination tail elements and destination inactive masked-off elements and Policy and masked naming scheme for the exact suffix that specifies a masked/unmasked vector operation along with its policy behavior.
The behavior of destination tail elements and destination inactive masked-off elements is controlled by the vta and vma bits 6.
Given the general assumption that the target audience of the intrinsics is high-performance cores, and that an "undisturbed" policy will generally slow down an out-of-order core, the intrinsics have a default policy scheme of tail-agnostic and mask-agnostic (that is, vta=1 and vma=1).
The intrinsics fuse the control of vector masking (vm) together with the control of policy behavior (vta, vma) in the same suffix. Please check out Control of vector masking and Policy and masked naming scheme for the exact suffix that specifies a masked/unmasked vector operation along with its policy behavior.
For the fixed-point intrinsics, representing the fixed-point arithmetic instructions 21, the vxrm argument of the intrinsics indicates the rounding mode (vxrm) 8 control.
The vxrm argument is required to be a constant integer expression. The implementation should provide the following enum that maps to the rounding mode values defined under Table 4 8 of the specification.
enum __RISCV_VXRM {
__RISCV_VXRM_RNU = 0,
__RISCV_VXRM_RNE = 1,
__RISCV_VXRM_RDN = 2,
__RISCV_VXRM_ROD = 3,
};
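As a usage sketch (assuming the v1.0-style fixed-point intrinsics that take a constant vxrm parameter, such as the averaging add vaadd; the surrounding function name and scalar fallback are illustrative), the following averages two arrays with round-to-nearest-up:

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif

/* Illustrative averaging add: vaadd computes (a + b) >> 1 with the
   rounding behavior selected by the constant vxrm argument (RNU here). */
static void avg_rnu_i32(int32_t *dst, const int32_t *a, const int32_t *b,
                        size_t n) {
#ifdef __riscv_v_intrinsic
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);
        vint32m1_t va = __riscv_vle32_v_i32m1(a + i, vl);
        vint32m1_t vb = __riscv_vle32_v_i32m1(b + i, vl);
        vint32m1_t vr = __riscv_vaadd_vv_i32m1(va, vb, __RISCV_VXRM_RNU, vl);
        __riscv_vse32_v_i32m1(dst + i, vr, vl);
        i += vl;
    }
#else
    /* Scalar fallback: RNU adds the shifted-out bit back in, which for a
       1-bit shift equals (a + b + 1) >> 1 computed in wider arithmetic. */
    for (size_t i = 0; i < n; i++)
        dst[i] = (int32_t)(((int64_t)a[i] + b[i] + 1) >> 1);
#endif
}
```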
Note
|
Rounding mode does not affect the computations of vsadd, vsaddu, vssub, and vssubu; therefore, the intrinsics for these instructions do not have the vxrm argument.
|
Note
|
The RISC-V psABI 9 states that vxrm is not preserved across calls. Optimization for reducing the number of redundant writes to vxrm is a compiler- and system-specific issue.
|
Note
|
Control of the vector fixed-point saturation flag (vxsat) 22 is not yet covered in the vector intrinsics v1.0. We plan to support it in follow-up versions in a way compatible with the existing v1.0 intrinsics.
|
For the floating-point intrinsics, representing the floating-point arithmetic instructions 23, the intrinsics have two variants, called the implicit-frm and the explicit-frm intrinsics.
Note
|
Control of the floating-point accrued exception flags (fflags) 10 is not yet covered in the vector intrinsics v1.0. We plan to support it in follow-up versions in a way compatible with the existing v1.0 intrinsics.
|
The implicit-frm intrinsics behave like any C-language floating-point expression, using the default rounding mode when FENV_ACCESS is off, and the fenv dynamic rounding mode when FENV_ACCESS is on.
Note
|
Both GNU and LLVM compilers generate scalar floating-point instructions using dynamic rounding mode, relying on the environment initialization to set frm to RNE (specified as "roundTiesToEven" in IEEE-754 (a.k.a. IEC 60559)).
|
Note
|
The implicit-frm intrinsics are intended to be used regardless of FENV_ACCESS . They are provided when FENV_ACCESS is on for the (few) programmers who are already using fenv; and they are provided when FENV_ACCESS is off for the (vast majority of) programmers who prefer the default rounding mode.
|
The explicit-frm intrinsics contain the frm argument, which indicates the rounding mode (frm) 10 control. The floating-point intrinsics with the frm argument carry an _rm suffix in the function name.
The frm argument is required to be a constant integer expression. The implementation should provide the following enum that maps to the rounding mode values defined under RISC-V ISA Manual Table 8.1 9.
enum __RISCV_FRM {
__RISCV_FRM_RNE = 0,
__RISCV_FRM_RTZ = 1,
__RISCV_FRM_RDN = 2,
__RISCV_FRM_RUP = 3,
__RISCV_FRM_RMM = 4,
};
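As a usage sketch of the explicit-frm variants (the _rm-suffixed vfadd intrinsic shown below follows the naming scheme in this document; the surrounding function name and scalar fallback are illustrative), the following adds two float arrays under round-to-nearest-even:

```c
#include <stddef.h>
#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif

/* Illustrative explicit-frm usage: the _rm variant takes a constant frm
   argument (RNE here, the IEEE-754 default rounding mode). */
static void fadd_rne_f32(float *dst, const float *a, const float *b,
                         size_t n) {
#ifdef __riscv_v_intrinsic
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);
        vfloat32m1_t va = __riscv_vle32_v_f32m1(a + i, vl);
        vfloat32m1_t vb = __riscv_vle32_v_f32m1(b + i, vl);
        vfloat32m1_t vr = __riscv_vfadd_vv_f32m1_rm(va, vb, __RISCV_FRM_RNE, vl);
        __riscv_vse32_v_f32m1(dst + i, vr, vl);
        i += vl;
    }
#else
    /* Scalar fallback: C's default environment already rounds to
       nearest-even, so a plain add matches RNE. */
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
#endif
}
```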
Note
|
The explicit-frm intrinsics are intended to be used when FENV_ACCESS is off, enabling more aggressive optimization while still providing the programmer with control over the rounding mode. Using explicit-frm intrinsics when FENV_ACCESS is on will still work correctly, but is expected to lead to extra saving/restoring of frm, which could be avoided by using fenv functionality and implicit-frm intrinsics.
|
The naming scheme of the intrinsics expresses the user's control of the fields in vtype, vl, and the rounding modes for fixed-point and floating-point vector computations. For details of these CSR controls, please see [control-of-vector-programming-mode].
As mentioned in Control of vector masking and Control of behavior of destination tail elements and destination inactive masked-off elements, the intrinsics fuse the control of vm, vta, and vma into the same suffix. Policy and masked naming scheme enumerates the exact suffixes. You may find where these suffixes are appended in Explicit (Non-overloaded) naming scheme.
The intrinsics can be split into two major types, called "explicit (non-overloaded) intrinsics" and "implicit (overloaded) intrinsics".
The explicit (non-overloaded) intrinsics embed the control described in Control of the vector extension programming model in the function name. This scheme gives the intrinsic codebase more readability, as the execution states are explicitly specified in the code.
The implicit (overloaded) intrinsics, on the contrary, omit the explicit specification of vtype control. The implicit (overloaded) intrinsics aim to provide a generic interface that lets users pass values of different EEW 17 and EMUL 17 as input arguments.
This section covers the general naming rules of the two types of intrinsics. It then enumerates the exceptions and the rationales behind them in Exceptions in the explicit (non-overloaded) naming scheme and Exceptions in the implicit (overloaded) naming scheme.
With the default policy scheme mentioned under Control of behavior of destination tail elements and destination inactive masked-off elements, each intrinsic provides corresponding variants for its available control of vm, vta, and vma. The following list enumerates the possible suffixes.
- No suffix: Represents an unmasked (vm=1) vector operation with tail-agnostic (vta=1) policy
- _tu suffix: Represents an unmasked (vm=1) vector operation with tail-undisturbed (vta=0) policy
- _m suffix: Represents a masked (vm=0) vector operation with tail-agnostic (vta=1), mask-agnostic (vma=1) policy
- _tum suffix: Represents a masked (vm=0) vector operation with tail-undisturbed (vta=0), mask-agnostic (vma=1) policy
- _mu suffix: Represents a masked (vm=0) vector operation with tail-agnostic (vta=1), mask-undisturbed (vma=0) policy
- _tumu suffix: Represents a masked (vm=0) vector operation with tail-undisturbed (vta=0), mask-undisturbed (vma=0) policy
Using vadd with EEW=32 and EMUL=1 as an example, the variants are:
// vm=1, vta=1
vint32m1_t __riscv_vadd_vv_i32m1(vint32m1_t vs2, vint32m1_t vs1, size_t vl);
// vm=1, vta=0
vint32m1_t __riscv_vadd_vv_i32m1_tu(vint32m1_t vd, vint32m1_t vs2,
vint32m1_t vs1, size_t vl);
// vm=0, vta=1, vma=1
vint32m1_t __riscv_vadd_vv_i32m1_m(vbool32_t vm, vint32m1_t vs2, vint32m1_t vs1,
size_t vl);
// vm=0, vta=0, vma=1
vint32m1_t __riscv_vadd_vv_i32m1_tum(vbool32_t vm, vint32m1_t vd,
vint32m1_t vs2, vint32m1_t vs1, size_t vl);
// vm=0, vta=1, vma=0
vint32m1_t __riscv_vadd_vv_i32m1_mu(vbool32_t vm, vint32m1_t vd, vint32m1_t vs2,
vint32m1_t vs1, size_t vl);
// vm=0, vta=0, vma=0
vint32m1_t __riscv_vadd_vv_i32m1_tumu(vbool32_t vm, vint32m1_t vd,
vint32m1_t vs2, vint32m1_t vs1,
size_t vl);
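As a usage sketch of the fully deterministic _tumu variant (the surrounding function name is illustrative, and the scalar fallback mirrors the same semantics), the following adds b into dst only where a is positive, leaving inactive elements undisturbed:

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif

/* Illustrative _tumu usage: where a[i] > 0, dst[i] = a[i] + b[i];
   inactive elements keep their old dst value (mask-undisturbed). */
static void cond_add_i32(int32_t *dst, const int32_t *a, const int32_t *b,
                         size_t n) {
#ifdef __riscv_v_intrinsic
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);
        vint32m1_t va = __riscv_vle32_v_i32m1(a + i, vl);
        vint32m1_t vb = __riscv_vle32_v_i32m1(b + i, vl);
        vint32m1_t vd = __riscv_vle32_v_i32m1(dst + i, vl);
        vbool32_t vm = __riscv_vmsgt_vx_i32m1_b32(va, 0, vl);  /* mask: a > 0 */
        vd = __riscv_vadd_vv_i32m1_tumu(vm, vd, va, vb, vl);
        __riscv_vse32_v_i32m1(dst + i, vd, vl);
        i += vl;
    }
#else
    /* Scalar fallback with the same deterministic semantics. */
    for (size_t i = 0; i < n; i++)
        if (a[i] > 0)
            dst[i] = a[i] + b[i];
#endif
}
```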
Note
|
When the policy is set to "agnostic", there is no guarantee of what will be in the tail/masked-off elements. Under this policy, users should not assume these values are deterministic. |
Note
|
Pseudo intrinsics mentioned under Pseudo intrinsics do not map to real vector instructions. Therefore these intrinsics are not affected by the policy setting, nor do they have intrinsic variants of the suffixes listed above. |
In general, the intrinsics are encoded as follows. The intrinsics under this naming scheme are the "non-overloaded intrinsics"; in parallel, the "overloaded intrinsics" are defined under Implicit (Overloaded) naming scheme.
The naming rules are as follows.
__riscv_{V_INSTRUCTION_MNEMONIC}_{OPERAND_MNEMONIC}_{RETURN_TYPE}_{ROUND_MODE}_{POLICY}(...)
- OPERAND_MNEMONIC are like v, vv, vx, vs, vvm, vxm
- RETURN_TYPE depends on whether the return type of the vector instruction is a mask register:
  - For intrinsics that represent instructions with a non-mask destination register:
    - EEW is one of i8 | i16 | i32 | i64 | u8 | u16 | u32 | u64 | f16 | f32 | f64.
    - EMUL is one of m1 | m2 | m4 | m8 | mf2 | mf4 | mf8.
    - Type system explains the limited enumeration of EEW-EMUL pairs.
  - For intrinsics that represent instructions with a mask destination register:
    - RETURN_TYPE is one of b1 | b2 | b4 | b8 | b16 | b32 | b64, which is derived from the ratio EEW/EMUL.
- V_INSTRUCTION_MNEMONIC are like vadd, vfmacc, vsadd.
- ROUND_MODE is the _rm suffix mentioned in Explicit-frm intrinsics. Other intrinsics do not have this suffix.
- POLICY suffixes are enumerated under Policy and masked naming scheme.
The general naming scheme is not sufficient to express all intrinsics. The exceptions are enumerated in the following section, Exceptions in the explicit (non-overloaded) naming scheme.
This section enumerates the exceptions in the explicit (non-overloaded) naming scheme.
Only encoding the return type would cause naming collisions for the permutation instruction intrinsics. The intrinsics encode the input vector type and the output scalar type in the suffix.
int8_t __riscv_vmv_x_s_i8m1_i8 (vint8m1_t vs2);
int8_t __riscv_vmv_x_s_i8m2_i8 (vint8m2_t vs2);
int8_t __riscv_vmv_x_s_i8m4_i8 (vint8m4_t vs2);
int8_t __riscv_vmv_x_s_i8m8_i8 (vint8m8_t vs2);
Only encoding the return type would cause naming collisions for the reduction instruction intrinsics. The intrinsics encode the input vector type and the output vector type in the suffix.
vint8m1_t __riscv_vredsum_vs_i8m1_i8m1(vint8m1_t vs2, vint8m1_t vs1, size_t vl);
vint8m1_t __riscv_vredsum_vs_i8m2_i8m1(vint8m2_t vs2, vint8m1_t vs1, size_t vl);
vint8m1_t __riscv_vredsum_vs_i8m4_i8m1(vint8m4_t vs2, vint8m1_t vs1, size_t vl);
vint8m1_t __riscv_vredsum_vs_i8m8_i8m1(vint8m8_t vs2, vint8m1_t vs1, size_t vl);
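As a usage sketch of the reduction intrinsics (the function name is illustrative; element 0 of the reduction result is extracted with a unit-length store so the sketch stays within the prototypes shown here, and a scalar fallback keeps it portable), the following sums an int32 array:

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif

/* Illustrative strip-mined sum: vredsum produces, in element 0,
   the sum of the active elements of vs2 plus element 0 of vs1. */
static int32_t sum_i32(const int32_t *a, size_t n) {
    int32_t acc = 0;
#ifdef __riscv_v_intrinsic
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);
        vint32m1_t v  = __riscv_vle32_v_i32m1(a + i, vl);
        vint32m1_t vs = __riscv_vmv_v_x_i32m1(acc, vl);  /* seed element 0 */
        vint32m1_t vr = __riscv_vredsum_vs_i32m1_i32m1(v, vs, vl);
        __riscv_vse32_v_i32m1(&acc, vr, 1);              /* extract element 0 */
        i += vl;
    }
#else
    /* Scalar fallback with identical semantics. */
    for (size_t i = 0; i < n; i++)
        acc += a[i];
#endif
    return acc;
}
```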
Only encoding the return type would cause naming collisions for the carry/borrow instruction intrinsics (e.g. vmadc). The intrinsics encode the input vector type and the output mask type in the suffix.
vbool64_t __riscv_vmadc_vvm_i8mf8_b64(vint8mf8_t vs2, vint8mf8_t vs1,
vbool64_t v0, size_t vl);
vbool64_t __riscv_vmadc_vvm_i16mf4_b64(vint16mf4_t vs2, vint16mf4_t vs1,
vbool64_t v0, size_t vl);
vbool64_t __riscv_vmadc_vvm_i32mf2_b64(vint32mf2_t vs2, vint32mf2_t vs1,
vbool64_t v0, size_t vl);
vbool64_t __riscv_vmadc_vvm_i64m1_b64(vint64m1_t vs2, vint64m1_t vs1,
vbool64_t v0, size_t vl);
Only encoding the return type would cause naming collisions for these pseudo intrinsics. The intrinsics encode the input vector type before the return type in the suffix.
The following shows an example with __riscv_vreinterpret_v for the vint32m1_t input vector type.
vfloat32m1_t __riscv_vreinterpret_v_i32m1_f32m1 (vint32m1_t src);
vuint32m1_t __riscv_vreinterpret_v_i32m1_u32m1 (vint32m1_t src);
vint8m1_t __riscv_vreinterpret_v_i32m1_i8m1 (vint32m1_t src);
vint16m1_t __riscv_vreinterpret_v_i32m1_i16m1 (vint32m1_t src);
vint64m1_t __riscv_vreinterpret_v_i32m1_i64m1 (vint32m1_t src);
vbool64_t __riscv_vreinterpret_v_i32m1_b64 (vint32m1_t src);
vbool32_t __riscv_vreinterpret_v_i32m1_b32 (vint32m1_t src);
vbool16_t __riscv_vreinterpret_v_i32m1_b16 (vint32m1_t src);
vbool8_t __riscv_vreinterpret_v_i32m1_b8 (vint32m1_t src);
vbool4_t __riscv_vreinterpret_v_i32m1_b4 (vint32m1_t src);
The implicit (overloaded) interface aims to provide a generic interface that takes values of different EEW and EMUL as input. Therefore, the implicit intrinsics omit the EEW and EMUL encoding in the function name. The _rm suffix for the explicit-frm intrinsics (Control of floating-point rounding mode) is also omitted. The intrinsics under this scheme are the "overloaded intrinsics"; in parallel, the "non-overloaded intrinsics" are defined under Explicit (Non-overloaded) naming scheme.
Taking the vector addition (vadd) instruction intrinsics as an example: stripping off the operand mnemonics and the encoded EEW and EMUL information, the intrinsics provide the following overloaded interfaces.
vint32m1_t __riscv_vadd(vint32m1_t v0, vint32m1_t v1, size_t vl);
vint16m4_t __riscv_vadd(vint16m4_t v0, vint16m4_t v1, size_t vl);
Since the main intent is to let users pass values of different EEW and EMUL as input arguments, the overloaded intrinsics do not omit the policy suffix. That is, the suffixes listed under Control of behavior of destination tail elements and destination inactive masked-off elements are not omitted and are still encoded in the function name.
The masked variants with the default policy share the same interface as the unmasked variants with the default policy; they do not have any trailing suffix.
Taking the vector floating-point add (vfadd) as an example, the intrinsics provide the following overloaded interfaces.
vfloat32m1_t __riscv_vfadd(vfloat32m1_t vs2, vfloat32m1_t vs1, size_t vl);
vfloat32m1_t __riscv_vfadd(vbool32_t vm, vfloat32m1_t vs2, vfloat32m1_t vs1,
size_t vl);
vfloat32m1_t __riscv_vfadd(vfloat32m1_t vs2, vfloat32m1_t vs1, unsigned int frm,
size_t vl);
vfloat32m1_t __riscv_vfadd(vbool32_t vm, vfloat32m1_t vs2, vfloat32m1_t vs1,
unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_tu(vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, size_t vl);
vfloat32m1_t __riscv_vfadd_tum(vbool32_t vm, vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, size_t vl);
vfloat32m1_t __riscv_vfadd_tumu(vbool32_t vm, vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, size_t vl);
vfloat32m1_t __riscv_vfadd_mu(vbool32_t vm, vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, size_t vl);
vfloat32m1_t __riscv_vfadd_tu(vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_tum(vbool32_t vm, vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_tumu(vbool32_t vm, vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_mu(vbool32_t vm, vfloat32m1_t vd, vfloat32m1_t vs2,
vfloat32m1_t vs1, unsigned int frm, size_t vl);
The naming scheme that prunes everything except the instruction mnemonic is not available for all intrinsics. Please see Exceptions in the implicit (overloaded) naming scheme for overloaded intrinsics with irregular naming patterns.
Due to the limitations of the C language (without the aid of features like C++ templates), some intrinsics do not have an overloaded version and therefore do not possess a simplified, EEW/EMUL-omitted interface. Please see Unsupported intrinsics for implicit (overloaded) naming scheme for more detail.
The following intrinsics have an irregular naming pattern.
Widening instruction intrinsics (e.g. vwadd) have the same return type but different argument types. The operand mnemonics are encoded into their overloaded versions to help distinguish them.
vint32m1_t __riscv_vwadd_vv(vint16mf2_t vs2, vint16mf2_t vs1, size_t vl);
vint32m1_t __riscv_vwadd_vx(vint16mf2_t vs2, int16_t rs1, size_t vl);
vint32m1_t __riscv_vwadd_wv(vint32m1_t vs2, vint16mf2_t vs1, size_t vl);
vint32m1_t __riscv_vwadd_wx(vint32m1_t vs2, int16_t rs1, size_t vl);
Type-convert instruction intrinsics (e.g. vfcvt.x.f, vfcvt.xu.f, vfcvt.rtz.xu.f) encode the return type mnemonics into their overloaded variants to help distinguish them.
The following shows how _x, _rtz_x, _xu, and _rtz_xu are appended to the suffix for distinction.
vint32m1_t __riscv_vfcvt_x (vfloat32m1_t src, size_t vl);
vint32m1_t __riscv_vfcvt_rtz_x (vfloat32m1_t src, size_t vl);
vuint32m1_t __riscv_vfcvt_xu (vfloat32m1_t src, size_t vl);
vuint32m1_t __riscv_vfcvt_rtz_xu (vfloat32m1_t src, size_t vl);
These pseudo intrinsics encode the return type (e.g. __riscv_vreinterpret_b8) into their overloaded variants to help distinguish them.
The following shows how the return type is appended to the suffix for distinction.
vfloat32m1_t __riscv_vreinterpret_f32m1 (vint32m1_t src);
vuint32m1_t __riscv_vreinterpret_u32m1 (vint32m1_t src);
vint8m1_t __riscv_vreinterpret_i8m1 (vint32m1_t src);
vint16m1_t __riscv_vreinterpret_i16m1 (vint32m1_t src);
vint64m1_t __riscv_vreinterpret_i64m1 (vint32m1_t src);
vbool64_t __riscv_vreinterpret_b64 (vint32m1_t src);
vbool32_t __riscv_vreinterpret_b32 (vint32m1_t src);
vbool16_t __riscv_vreinterpret_b16 (vint32m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vint32m1_t src);
vbool4_t __riscv_vreinterpret_b4 (vint32m1_t src);
Due to the limitations of the C language (without the aid of features like C++ templates), some intrinsics do not have an overloaded version. Intrinsics with any of the following characteristics do not possess an overloaded version.
- Intrinsics whose input arguments are all scalar types (e.g. unmasked vector load instruction intrinsics, vmv.s.x)
- Intrinsics with vl as the only argument (e.g. vmclr, vmset, vid)
- Intrinsics with vector boolean input(s) returning a non-boolean vector type (e.g. viota)
The intrinsics are designed to be strongly-typed. The vreinterpret intrinsics are provided to help users cross the strongly-typed scheme when necessary.
Non-mask (integer and floating-point) data types have SEW and LMUL encoded.
Integer types have EEW and EMUL encoded into the type. The first row describes the EMUL and the first column describes the data type and element width of the scalar type. Types with subscript "*" are available when ELEN >= 64 (that is, unavailable under Zve32* and requiring at least Zve64x).
Types | EMUL=1/8 | EMUL=1/4 | EMUL=1/2 | EMUL=1 | EMUL=2 | EMUL=4 | EMUL=8 |
---|---|---|---|---|---|---|---|
int8_t | vint8mf8_t* | vint8mf4_t | vint8mf2_t | vint8m1_t | vint8m2_t | vint8m4_t | vint8m8_t |
int16_t | N/A | vint16mf4_t* | vint16mf2_t | vint16m1_t | vint16m2_t | vint16m4_t | vint16m8_t |
int32_t | N/A | N/A | vint32mf2_t* | vint32m1_t | vint32m2_t | vint32m4_t | vint32m8_t |
int64_t | N/A | N/A | N/A | vint64m1_t* | vint64m2_t* | vint64m4_t* | vint64m8_t* |
uint8_t | vuint8mf8_t* | vuint8mf4_t | vuint8mf2_t | vuint8m1_t | vuint8m2_t | vuint8m4_t | vuint8m8_t |
uint16_t | N/A | vuint16mf4_t* | vuint16mf2_t | vuint16m1_t | vuint16m2_t | vuint16m4_t | vuint16m8_t |
uint32_t | N/A | N/A | vuint32mf2_t* | vuint32m1_t | vuint32m2_t | vuint32m4_t | vuint32m8_t |
uint64_t | N/A | N/A | N/A | vuint64m1_t* | vuint64m2_t* | vuint64m4_t* | vuint64m8_t* |
Floating-point types have EEW and EMUL encoded into the type. The first row describes the EMUL and the first column describes the data type and element width of the scalar type.
Floating-point types with an element width of 16 (Types=_Float16) require the zvfh or zvfhmin extension to be specified in the architecture.
Floating-point types with an element width of 32 (Types=float) require the zve32f extension to be specified in the architecture.
Floating-point types with an element width of 64 (Types=double) require the zve64d extension to be specified in the architecture.
Note
|
Although C++23 introduces <stdfloat> for fixed-width floating-point types, this latest standard is not yet supported in the upstream RISC-V compilers. The specification (along with the prototype lists in the appendix) uses _Float16/float/double to represent the floating-point types with element widths of 16/32/64.
|
Types with subscript "*" are available when ELEN >= 64 (that is, unavailable under Zve32f and requiring at least Zve64f).
Types | EMUL=1/8 | EMUL=1/4 | EMUL=1/2 | EMUL=1 | EMUL=2 | EMUL=4 | EMUL=8 |
---|---|---|---|---|---|---|---|
_Float16 | N/A | vfloat16mf4_t* | vfloat16mf2_t | vfloat16m1_t | vfloat16m2_t | vfloat16m4_t | vfloat16m8_t |
float | N/A | N/A | vfloat32mf2_t* | vfloat32m1_t | vfloat32m2_t | vfloat32m4_t | vfloat32m8_t |
double | N/A | N/A | N/A | vfloat64m1_t | vfloat64m2_t | vfloat64m4_t | vfloat64m8_t |
Mask types have the ratio derived from EEW/EMUL encoded into the type. The mask types represent mask register values that follow the Mask Register Layout 14.
Types with subscript "*" are available when ELEN >= 64 (that is, unavailable under Zve32x and requiring at least Zve64x).
Types | n = 1 | n = 2 | n = 4 | n = 8 | n = 16 | n = 32 | n = 64 |
---|---|---|---|---|---|---|---|
bool | vbool1_t | vbool2_t | vbool4_t | vbool8_t | vbool16_t | vbool32_t | vbool64_t* |
Tuple types encode SEW, LMUL, and NFIELD 16 into the data type.
These types are utilized through the segment load/store instruction intrinsics, along with the getter (vget) and setter (vset) intrinsics to extract/combine them. The types listed in Integer types and Floating-point types all have tuple counterparts. The available combinations of LMUL and NFIELD follow the restriction of the specification, EMUL * NFIELDS ≤ 8.
Availability of the tuple types follows the availability of their corresponding non-tuple (NFIELD=1) types.
Non-tuple Types (NFIELD=1) | NFIELD=2 | NFIELD=3 | NFIELD=4 | NFIELD=5 | NFIELD=6 | NFIELD=7 | NFIELD=8 |
---|---|---|---|---|---|---|---|
vint8mf8_t | vint8mf8x2_t | vint8mf8x3_t | vint8mf8x4_t | vint8mf8x5_t | vint8mf8x6_t | vint8mf8x7_t | vint8mf8x8_t |
vuint8mf8_t | vuint8mf8x2_t | vuint8mf8x3_t | vuint8mf8x4_t | vuint8mf8x5_t | vuint8mf8x6_t | vuint8mf8x7_t | vuint8mf8x8_t |
Non-tuple Types (NFIELD=1) | NFIELD=2 | NFIELD=3 | NFIELD=4 | NFIELD=5 | NFIELD=6 | NFIELD=7 | NFIELD=8 |
---|---|---|---|---|---|---|---|
vint8mf4_t | vint8mf4x2_t | vint8mf4x3_t | vint8mf4x4_t | vint8mf4x5_t | vint8mf4x6_t | vint8mf4x7_t | vint8mf4x8_t |
vuint8mf4_t | vuint8mf4x2_t | vuint8mf4x3_t | vuint8mf4x4_t | vuint8mf4x5_t | vuint8mf4x6_t | vuint8mf4x7_t | vuint8mf4x8_t |
vint16mf4_t | vint16mf4x2_t | vint16mf4x3_t | vint16mf4x4_t | vint16mf4x5_t | vint16mf4x6_t | vint16mf4x7_t | vint16mf4x8_t |
vuint16mf4_t | vuint16mf4x2_t | vuint16mf4x3_t | vuint16mf4x4_t | vuint16mf4x5_t | vuint16mf4x6_t | vuint16mf4x7_t | vuint16mf4x8_t |
vfloat16mf4_t | vfloat16mf4x2_t | vfloat16mf4x3_t | vfloat16mf4x4_t | vfloat16mf4x5_t | vfloat16mf4x6_t | vfloat16mf4x7_t | vfloat16mf4x8_t |
Non-tuple Types (NFIELD=1) | NFIELD=2 | NFIELD=3 | NFIELD=4 | NFIELD=5 | NFIELD=6 | NFIELD=7 | NFIELD=8 |
---|---|---|---|---|---|---|---|
vint8mf2_t | vint8mf2x2_t | vint8mf2x3_t | vint8mf2x4_t | vint8mf2x5_t | vint8mf2x6_t | vint8mf2x7_t | vint8mf2x8_t |
vuint8mf2_t | vuint8mf2x2_t | vuint8mf2x3_t | vuint8mf2x4_t | vuint8mf2x5_t | vuint8mf2x6_t | vuint8mf2x7_t | vuint8mf2x8_t |
vint16mf2_t | vint16mf2x2_t | vint16mf2x3_t | vint16mf2x4_t | vint16mf2x5_t | vint16mf2x6_t | vint16mf2x7_t | vint16mf2x8_t |
vuint16mf2_t | vuint16mf2x2_t | vuint16mf2x3_t | vuint16mf2x4_t | vuint16mf2x5_t | vuint16mf2x6_t | vuint16mf2x7_t | vuint16mf2x8_t |
vint32mf2_t | vint32mf2x2_t | vint32mf2x3_t | vint32mf2x4_t | vint32mf2x5_t | vint32mf2x6_t | vint32mf2x7_t | vint32mf2x8_t |
vuint32mf2_t | vuint32mf2x2_t | vuint32mf2x3_t | vuint32mf2x4_t | vuint32mf2x5_t | vuint32mf2x6_t | vuint32mf2x7_t | vuint32mf2x8_t |
vfloat16mf2_t | vfloat16mf2x2_t | vfloat16mf2x3_t | vfloat16mf2x4_t | vfloat16mf2x5_t | vfloat16mf2x6_t | vfloat16mf2x7_t | vfloat16mf2x8_t |
vfloat32mf2_t | vfloat32mf2x2_t | vfloat32mf2x3_t | vfloat32mf2x4_t | vfloat32mf2x5_t | vfloat32mf2x6_t | vfloat32mf2x7_t | vfloat32mf2x8_t |
Non-tuple Types (NFIELD=1) | NFIELD=2 | NFIELD=3 | NFIELD=4 | NFIELD=5 | NFIELD=6 | NFIELD=7 | NFIELD=8 |
---|---|---|---|---|---|---|---|
vint8m1_t | vint8m1x2_t | vint8m1x3_t | vint8m1x4_t | vint8m1x5_t | vint8m1x6_t | vint8m1x7_t | vint8m1x8_t |
vuint8m1_t | vuint8m1x2_t | vuint8m1x3_t | vuint8m1x4_t | vuint8m1x5_t | vuint8m1x6_t | vuint8m1x7_t | vuint8m1x8_t |
vint16m1_t | vint16m1x2_t | vint16m1x3_t | vint16m1x4_t | vint16m1x5_t | vint16m1x6_t | vint16m1x7_t | vint16m1x8_t |
vuint16m1_t | vuint16m1x2_t | vuint16m1x3_t | vuint16m1x4_t | vuint16m1x5_t | vuint16m1x6_t | vuint16m1x7_t | vuint16m1x8_t |
vint32m1_t | vint32m1x2_t | vint32m1x3_t | vint32m1x4_t | vint32m1x5_t | vint32m1x6_t | vint32m1x7_t | vint32m1x8_t |
vuint32m1_t | vuint32m1x2_t | vuint32m1x3_t | vuint32m1x4_t | vuint32m1x5_t | vuint32m1x6_t | vuint32m1x7_t | vuint32m1x8_t |
vint64m1_t | vint64m1x2_t | vint64m1x3_t | vint64m1x4_t | vint64m1x5_t | vint64m1x6_t | vint64m1x7_t | vint64m1x8_t |
vuint64m1_t | vuint64m1x2_t | vuint64m1x3_t | vuint64m1x4_t | vuint64m1x5_t | vuint64m1x6_t | vuint64m1x7_t | vuint64m1x8_t |
vfloat16m1_t | vfloat16m1x2_t | vfloat16m1x3_t | vfloat16m1x4_t | vfloat16m1x5_t | vfloat16m1x6_t | vfloat16m1x7_t | vfloat16m1x8_t |
vfloat32m1_t | vfloat32m1x2_t | vfloat32m1x3_t | vfloat32m1x4_t | vfloat32m1x5_t | vfloat32m1x6_t | vfloat32m1x7_t | vfloat32m1x8_t |
vfloat64m1_t | vfloat64m1x2_t | vfloat64m1x3_t | vfloat64m1x4_t | vfloat64m1x5_t | vfloat64m1x6_t | vfloat64m1x7_t | vfloat64m1x8_t |
Non-tuple Types (NFIELD=1) | NFIELD=2 | NFIELD=3 | NFIELD=4 | NFIELD=5 | NFIELD=6 | NFIELD=7 | NFIELD=8 |
---|---|---|---|---|---|---|---|
vint8m2_t | vint8m2x2_t | vint8m2x3_t | vint8m2x4_t | N/A | N/A | N/A | N/A |
vuint8m2_t | vuint8m2x2_t | vuint8m2x3_t | vuint8m2x4_t | N/A | N/A | N/A | N/A |
vint16m2_t | vint16m2x2_t | vint16m2x3_t | vint16m2x4_t | N/A | N/A | N/A | N/A |
vuint16m2_t | vuint16m2x2_t | vuint16m2x3_t | vuint16m2x4_t | N/A | N/A | N/A | N/A |
vint32m2_t | vint32m2x2_t | vint32m2x3_t | vint32m2x4_t | N/A | N/A | N/A | N/A |
vuint32m2_t | vuint32m2x2_t | vuint32m2x3_t | vuint32m2x4_t | N/A | N/A | N/A | N/A |
vint64m2_t | vint64m2x2_t | vint64m2x3_t | vint64m2x4_t | N/A | N/A | N/A | N/A |
vuint64m2_t | vuint64m2x2_t | vuint64m2x3_t | vuint64m2x4_t | N/A | N/A | N/A | N/A |
vfloat16m2_t | vfloat16m2x2_t | vfloat16m2x3_t | vfloat16m2x4_t | N/A | N/A | N/A | N/A |
vfloat32m2_t | vfloat32m2x2_t | vfloat32m2x3_t | vfloat32m2x4_t | N/A | N/A | N/A | N/A |
vfloat64m2_t | vfloat64m2x2_t | vfloat64m2x3_t | vfloat64m2x4_t | N/A | N/A | N/A | N/A |
Non-tuple Types (NFIELD=1) | NFIELD=2 | NFIELD=3 | NFIELD=4 | NFIELD=5 | NFIELD=6 | NFIELD=7 | NFIELD=8 |
---|---|---|---|---|---|---|---|
vint8m4_t | vint8m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vuint8m4_t | vuint8m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vint16m4_t | vint16m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vuint16m4_t | vuint16m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vint32m4_t | vint32m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vuint32m4_t | vuint32m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vint64m4_t | vint64m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vuint64m4_t | vuint64m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vfloat16m4_t | vfloat16m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vfloat32m4_t | vfloat32m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
vfloat64m4_t | vfloat64m4x2_t | N/A | N/A | N/A | N/A | N/A | N/A |
The intrinsics provide additional utility functions to assist users in manipulating across intrinsic types. These functions are called "pseudo intrinsics" and do not represent any real instructions.
The vsetvl intrinsics return the number of elements to be processed in a strip-mining loop, given the element width and LMUL in the intrinsic suffix.
Note
|
The implementation must respect the ratio between SEW and LMUL given to the intrinsic. On the other hand, as mentioned in Control of number of elements to be processed, the vsetvl intrinsics do not necessarily map to the emission of a vsetvl instruction with the exact SEW and LMUL provided. The actual value written to the vl control register is implementation-defined behavior and typically not known until runtime.
|
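As a sketch (assuming an RVV-enabled cross toolchain; the function name and arrays here are hypothetical), a typical stripmining loop queries vsetvl once per iteration:

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

// Stripmining: each iteration asks how many elements (vl) will be
// processed for SEW=32, LMUL=2, then advances the pointers by that amount.
void vec_add(int32_t *dst, const int32_t *a, const int32_t *b, size_t n) {
  for (size_t vl; n > 0; n -= vl, a += vl, b += vl, dst += vl) {
    vl = __riscv_vsetvl_e32m2(n);  // elements processed this iteration
    vint32m2_t va = __riscv_vle32_v_i32m2(a, vl);
    vint32m2_t vb = __riscv_vle32_v_i32m2(b, vl);
    __riscv_vse32_v_i32m2(dst, __riscv_vadd_vv_i32m2(va, vb, vl), vl);
  }
}
```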
The vsetvlmax
intrinsics return VLMAX
5 when provided with the element width and LMUL in the intrinsic suffix.
Note
|
As mentioned in Control of number of elements to be processed, the vsetvlmax intrinsics do not necessarily map to the emission of a vsetvl instruction with the exact SEW and LMUL provided. The actual value written to the vl control register is implementation-defined behavior and typically not known until runtime.
|
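For example, vsetvlmax can be used to obtain VLMAX once, such as when initializing an accumulator outside a loop. A minimal sketch, assuming an RVV-enabled toolchain (the function name is hypothetical):

```c
#include <riscv_vector.h>
#include <stddef.h>

// VLMAX for SEW=32, LMUL=1; the concrete value depends on VLEN at runtime.
vfloat32m1_t zero_acc(void) {
  size_t vlmax = __riscv_vsetvlmax_e32m1();
  return __riscv_vfmv_v_f_f32m1(0.0f, vlmax);  // splat 0.0f across all elements
}
```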
The vreinterpret
intrinsics are provided for users to convert across the strongly-typed scheme. They are limited to conversions between types operating upon the same number of registers.
The intrinsics do not alter the values held within. Please use the vfcvt/v(f)wcvt/v(f)ncvt
intrinsics if you seek to widen, narrow, or perform real floating-point/integer type conversions on the values.
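A sketch (RVV toolchain assumed; function names are hypothetical): reinterpreting keeps the register bits intact and only changes the static type.

```c
#include <riscv_vector.h>

// Same register contents viewed under a different type; no instruction needed.
vuint32m1_t bits_as_unsigned(vint32m1_t v) {
  return __riscv_vreinterpret_v_i32m1_u32m1(v);  // i32 -> u32, same SEW and LMUL
}
vint16m1_t bits_as_i16(vint32m1_t v) {
  return __riscv_vreinterpret_v_i32m1_i16m1(v);  // same LMUL, different SEW view
}
```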
The vundefined
intrinsics are placeholders to represent unspecified values in variable initialization, or as arguments of vset
and vcreate
.
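A minimal sketch (RVV toolchain assumed; the function name is hypothetical) of using vundefined as the starting value of a register group that is then filled via vset:

```c
#include <riscv_vector.h>

// The initial value is unspecified; it must be fully written before any use.
vint32m2_t make_pair(vint32m1_t lo, vint32m1_t hi) {
  vint32m2_t v = __riscv_vundefined_i32m2();  // placeholder, indeterminate value
  v = __riscv_vset_v_i32m1_i32m2(v, 0, lo);   // fill the lower m1 half
  v = __riscv_vset_v_i32m1_i32m2(v, 1, hi);   // fill the upper m1 half
  return v;
}
```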
The vget
intrinsics allow users to obtain small LMUL values from larger LMUL ones. The vget
intrinsics also allow users to extract non-tuple (NFIELD=1
) types from tuple (NFIELD>1
) types after segment load intrinsics. The index provided must be a constant known at compile time.
The intrinsics do not map to any real instruction. Whether the implementation generates vector move instructions is an optimization decision.
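A sketch (RVV toolchain assumed; function names are hypothetical) extracting the fields of a 2-field segment load, and a smaller LMUL part of a register group:

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

void split_pairs(const int32_t *src, int32_t *even, int32_t *odd, size_t vl) {
  vint32m1x2_t t = __riscv_vlseg2e32_v_i32m1x2(src, vl);  // segment load into a tuple
  __riscv_vse32_v_i32m1(even, __riscv_vget_v_i32m1x2_i32m1(t, 0), vl);
  __riscv_vse32_v_i32m1(odd,  __riscv_vget_v_i32m1x2_i32m1(t, 1), vl);
}

vint32m1_t upper_half(vint32m2_t v) {
  return __riscv_vget_v_i32m2_i32m1(v, 1);  // second m1 register of the m2 group
}
```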
The vset
intrinsics allow users to combine small LMUL values into larger LMUL ones. The vset
intrinsics also allow users to combine non-tuple (NFIELD=1
) types into tuple (NFIELD>1
) types for segment store intrinsics. The index provided must be a constant known at compile time.
The intrinsics do not map to any real instruction. Whether the implementation generates vector move instructions is an optimization decision.
The vlmul_trunc
intrinsics are syntactic sugar with the same semantics as vget
with idx=0
.
The vlmul_ext
intrinsics are syntactic sugar with the same semantics as vset
with idx=0
.
The vcreate
intrinsics are syntactic sugar for creating RVV types. They have the same semantics as multiple vset
-s filling in the values accordingly.
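A sketch (RVV toolchain assumed; the function name is hypothetical): vcreate builds a larger LMUL value in a single call, with the same effect as a vundefined start followed by vset-s:

```c
#include <riscv_vector.h>

vint32m2_t combine(vint32m1_t lo, vint32m1_t hi) {
  // One call filling both m1 halves of the m2 result.
  return __riscv_vcreate_v_i32m1_i32m2(lo, hi);
}
```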
An agnostic value is in the granularity of an "element" contained in an RVV type. Agnostic values are either:
- Tail element(s) of an RVV value produced through a ta instruction
- Masked-off element(s) of an RVV value produced through a ma instruction
- All elements in an uninitialized value
- A value assigned with the vundefined intrinsics
An agnostic value is an indeterminate value, and evaluation of an agnostic value is undefined behavior. Users should not rely on any evaluation of an agnostic value.
There is no intrinsic that directly maps to the whole vector register move instructions (vmv<nr>r.v
) 30.
For copying vector contents in whole, we encourage users to use the assignment operator (=
).
The assignment operator (=
) represents the semantics of a whole vector register (group) copy from the expression on the right-hand side to the RVV type object on the left-hand side. The semantics still maintain a whole vector register content copy for fractional LMUL types. This enables the compiler to coalesce register usage when possible.
Users may leverage the vector move intrinsics (vmv_v_v
) if they wish to copy vector register groups with vl != VLMAX
.
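A sketch (RVV toolchain assumed; the function name is hypothetical) contrasting the two copy forms:

```c
#include <riscv_vector.h>
#include <stddef.h>

void copies(vint32m1_t a, size_t vl) {
  vint32m1_t whole = a;                                // whole register content copy
  vint32m1_t first_vl = __riscv_vmv_v_v_i32m1(a, vl);  // copies only vl elements
  (void)whole;
  (void)first_vl;
}
```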
Intrinsics whose computation depends on the value held in the destination register (assembly mnemonic vd
) have a vd
argument. The following list enumerates the intrinsics that have a vd
argument. Please see the appendix for the exact prototypes of these intrinsics.
- Intrinsics with tail-undisturbed (vta=0)
- Intrinsics with mask-undisturbed (vma=0)
- Intrinsics representing Vector Multiply-Add Operations 13
- Intrinsics representing Vector Slideup Instructions 24
For intrinsics with no vd
argument, the implementation is free to pick any register as the destination register.
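For illustration (RVV toolchain assumed; the function name is hypothetical), a tail-undisturbed variant carries the destination value as its first argument:

```c
#include <riscv_vector.h>
#include <stddef.h>

// Elements past vl keep the values they had in vd (tail-undisturbed).
vint32m1_t add_tu(vint32m1_t vd, vint32m1_t a, vint32m1_t b, size_t vl) {
  return __riscv_vadd_vv_i32m1_tu(vd, a, b, vl);
}
```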
The vstart
CSR is currently not exposed to the intrinsics programmer, and the intrinsics have the semantics of vstart = 0
. Support for positive vstart
values is implementation-defined; thus, portable application software should not set vstart > 0
.
Some users may expect the intrinsics to translate directly into instructions that appear in the assembly; however, the intrinsics are interfaces that expose the vector instruction semantics. The implementation is free to optimize them out if there is an opportunity.
Control of vl
, vtype
, vxrm
, and frm
is not directly exposed to the user. The implementation is responsible for setting the correct values into these CSRs to achieve the expected semantics of the intrinsic functions with respect to the conventions defined in the ISA specification 0 and ABI specification 9.
The specification mentions 15 that the strided load/store instruction with a stride of 0 could have different behaviors, performing all memory accesses or fewer memory operations. Since needing all memory accesses isn’t likely to be common, the implementation is allowed to generate fewer memory operations with strided load/store intrinsics.
In other words, the compiler does not guarantee generating the instruction for all memory accesses in strided load/store intrinsics with a stride of 0. If the user needs all memory accesses to be performed, they should use indexed load/store intrinsics with all-zero indices.
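A sketch (RVV toolchain assumed; function names are hypothetical) of the two forms; only the indexed version is guaranteed to perform one memory access per element:

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

vint32m1_t read_same_addr(const int32_t *p, size_t vl) {
  // Stride of 0: the implementation may perform fewer memory accesses.
  return __riscv_vlse32_v_i32m1(p, 0, vl);
}

vint32m1_t read_same_addr_all(const int32_t *p, size_t vl) {
  // Indexed load with all-zero indices: every element access is performed.
  vuint32m1_t zero_idx = __riscv_vmv_v_x_u32m1(0, vl);
  return __riscv_vluxei32_v_i32m1(p, zero_idx, vl);
}
```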
The intrinsics provide variants with operand mnemonics of vv
and vx
, but not vi
. This was an intentional design decision to reduce the total number of intrinsics.
It is an optimization issue for the implementation to emit instructions with the operand mnemonic vi
when an immediate that can be expressed within 5 bits is provided to the intrinsics.
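For example (RVV toolchain assumed; the function name is hypothetical), passing a small constant to a vx intrinsic allows the compiler to select the vi encoding:

```c
#include <riscv_vector.h>
#include <stddef.h>

vint32m1_t add_five(vint32m1_t v, size_t vl) {
  // 5 fits in the 5-bit immediate field, so the compiler may emit vadd.vi.
  return __riscv_vadd_vx_i32m1(v, 5, vl);
}
```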
The compiler will be conservative about the registers (vtype
, vxrm
, frm
) when encountering inline assembly. Users should be aware that mixing intrinsics and inline assembly will result in extra saves and restores.
Note
|
Standard extensions are merged into riscv/riscv-isa-manual after ratification. There is an ongoing pull request 26 for the "V" extension to be merged. At this moment this intrinsics specification still references the frozen draft 0. This reference will be updated in the future once the pull request has been merged.
|
4Section 3.4.1 (Vector selected element width vsew[2:0]
) in the specification 0
5Section 3.4.2 (Vector Register Grouping (vlmul[2:0]`
)) in the specification 0
6Section 3.4.3 (Vector Tail Agnostic and Vector Mask Agnostic vta
and vma
) in the specification 0
7Section 5.3 (Vector Masking) in the specification 0
8Section 3.8 (Vector Fixed-Point Rounding Mode Register vxrm
) in the specification 0
11Section 3.5 (Vector Length Register) in the specification 0
12Section 3.4.2 in the specification 0
13Section 11.13, 11.14, 13.6, 13.7 in the specification 0
14Section 4.5 (Mask Register Layout) in the specification 0
15Section 7.5 in the specification 0
16Section 7.8 in the specification 0
17Section 5.2 (Vector Operands) in the specification 0
18Section 6 (Configuration-Setting Instructions) in the specification 0
19Section 18 (Standard Vector Extensions) in the specification 0
20Section 18.2 (Zve*: Vector Extensions for Embedded Processors) in the specification 0
21Section 12 (Vector Fixed-Point Arithmetic Instructions) in the specification 0
22Section 3.9 (Vector Fixed-Point Saturation Flag vxsat) in the specification 0
23Section 13 (Vector Floating-Point Instructions) in the specification 0
24Section 16.3.1 (Vector Slideup Instructions) in the specification 0
25Section 3.7 (Vector Start Index CSR vstart
) in the specification 0
27Section 6.3 (Constraints on Setting vl
) in the specification 0
28Section 6.4 (Example of stripmining and changes to SEW) in the specification 0
29Section 3.6 (Vector Byte Length vlenb
) in the specification 0
30Section 16.6 (Whole Vector Register Move) in the specification 0