An instruction is encoded as a 32-bit word. There are four different formats: A, B, C and D. The format of the instruction is determined by the six most significant bits of the instruction word, and another five bits (bits 2 to 6) are used to distinguish between format A and B.
Format | # Instr. | Operands |
---|---|---|
A | 124 | Reg, Reg, Reg |
B | 256 | Reg, Reg |
C | 47 | Reg, Reg, 14-bit immediate |
D | 15 | Reg, 21-bit immediate |
3 2 1
|1| | | | | | |4| | | | | | | |6| | | | | | | |8| | | | | | | |0|
+-----------+---------+---------+---+---------+---+-------------+
A: |0 0 0 0 0 0|REG1 |REG2 |VM |REG3 |F |OP (7b) |
+-----------+---------+---------+-+-+---------+---+---------+---+
B: |0 0 0 0 0 0|REG1 |REG2 |V|FUNC (6b) |F |1 1 1 1 1|OP |
+-----------+---------+---------+-+-+---------+---+---------+---+
C: |OP (6b) |REG1 |REG2 |V|H|IMM (14b) |
+---+-------+---------+---------+-+-+---------------------------+
D: |1 1|OP (4b)|REG1 |IMM (21b) |
+---+-------+---------+-----------------------------------------+
The fields of the instruction word are interpreted as follows:
Field | Description |
---|---|
OP | Operation |
FUNC | A 6-bit function identifier |
REGn | Register (5 bit identifier) |
IMM | Immediate value |
H | High (1) or low (0) immediate value |
VM | Vector mode (2-bit): 00: scalar <= op(scalar,scalar) 10: vector <= op(vector,scalar) 11: vector <= op(vector,vector) 01: vector <= op(vector,fold(vector)) |
V | Vector mode (1-bit): 0: scalar <= op(scalar[,scalar]) 1: vector <= op(vector[,scalar]) |
F | Flavor (see below) |
The immediate value for format C instructions is interpreted as follows:
- If
H=0
, the value is the 32-bit sign extend version ofIMM
. - If
H=1
, the value isIMM << 18
, with the 18 lowest bits equal to the least significant bit ofIMM
.
The interpretation of the F field depends on the instruction type:
- For load/store instrcutions it is interpreted as an Index scale (a multiplication factor for the 3rd operand).
- For the SEL instrcution it is interpreted as an operand order modifier.
- For all other instructions it is interpreted as a Packed mode descriptor.
F | Packed mode | Index scale | SEL mode |
---|---|---|---|
00 | None (1 x 32 bits) | *1 | c <= (a & c) | (b & ~c) |
01 | Byte (4 x 8 bits) | *2 | c <= (b & c) | (a & ~c) |
10 | Half-word (2 x 16 bits) | *4 | c <= (c & a) | (b & ~a) |
11 | (reserved) | *8 | c <= (b & a) | (c & ~a) |
Note that most C type instructions are immediate operand versions of corresponding A type instructions. For instance the A type instruction ADD reg1, reg2, reg3
has a corresponding C type instruction ADD reg1, reg2, #imm
.
Name | Description |
---|---|
dst | Destination register |
src1 | Source operand 1 |
src2 | Source operand 2 |
src3 | Source operand 3 |
i21 | 21-bit immediate value |
I | Supports immediate operand |
V | Supports vector operation |
P | Supports packed operation |
Mnemonic | I | V | P | Operands | Operation | Description |
---|---|---|---|---|---|---|
LDB | x | (1) | dst, src1, src2 | dst <= [src1 + src2] (byte) | Load signed byte | |
LDUB | x | (1) | dst, src1, src2 | dst <= [src1 + src2] (byte) | Load unsigned byte | |
LDH | x | (1) | dst, src1, src2 | dst <= [src1 + src2] (halfword) | Load signed halfword | |
LDUH | x | (1) | dst, src1, src2 | dst <= [src1 + src2] (halfword) | Load unsigned halfword | |
LDW | x | (1) | dst, src1, src2 | dst <= [src1 + src2] (word) | Load word | |
LDEA | x | (1) | dst, src1, src2 | dst <= src1 + src2 | Load effective address | |
STB | x | (1) | src1, src2, src3 | [src2 + src3] <= src1 (byte) | Store byte | |
STH | x | (1) | src1, src2, src3 | [src2 + src3] <= src1 (halfword) | Store halfowrd | |
STW | x | (1) | src1, src2, src3 | [src2 + src3] <= src1 (word) | Store word | |
LDLI | x | dst, #i21 | dst <= signextend(i21) | Alt. 1: Load immediate (low 21 bits) | ||
LDHI | x | dst, #i21 | dst <= i21 << 11 | Alt. 2: Load immediate (high 21 bits) | ||
LDHIO | x | dst, #i21 | dst <= (i21 << 11) | 0x7ff | Alt. 3: Load immediate with low ones (high 21 bits) |
See addressing modes for more details.
(1): The third operand in vector loads/stores is used as a stride or offset parameter.
Mnemonic | I | V | P | Operands | Operation | Description |
---|---|---|---|---|---|---|
J | x | src1, #i21 | pc <= src1+signextend(i21)*4 | Jump to register address Note: src1 can be PC |
||
JL | x | src1, #i21 | lr <= pc+4, pc <= src1+signextend(i21)*4 | Jump to register address and link Note: src1 can be PC |
||
BZ | x | src1, #i21 | pc <= pc+signextend(i21)*4 if src1 == 0 | Conditionally branch if equal to zero | ||
BNZ | x | src1, #i21 | pc <= pc+signextend(i21)*4 if src1 != 0 | Conditionally branch if not equal to zero | ||
BS | x | src1, #i21 | pc <= pc+signextend(i21)*4 if src1 == 0xffffffff | Conditionally branch if set (all bits = 1) | ||
BNS | x | src1, #i21 | pc <= pc+signextend(i21)*4 if src1 != 0xffffffff | Conditionally branch if not set (at least one bit = 0) | ||
BLT | x | src1, #i21 | pc <= pc+signextend(i21)*4 if src1 < 0 | Conditionally branch if less than zero | ||
BGE | x | src1, #i21 | pc <= pc+signextend(i21)*4 if src1 >= 0 | Conditionally branch if greater than or equal to zero | ||
BLE | x | src1, #i21 | pc <= pc+signextend(i21)*4 if src1 <= 0 | Conditionally branch if less than or equal to zero | ||
BGT | x | src1, #i21 | pc <= pc+signextend(i21)*4 if src1 > 0 | Conditionally branch if greater than zero |
Mnemonic | I | V | P | Operands | Operation | Description |
---|---|---|---|---|---|---|
ADDPCHI | x | dst, #i21 | dst <= pc + (i21 << 11) | Add high immediate to PC |
Note: ADDPCHI
can be used together with load/store instructions to perform 32-bit PC-relative addressing in just two instructions, or together with jump instructions to perform 32-bit PC-relative jumps in just two instructions.
Mnemonic | I | V | P | Operands | Operation | Description |
---|---|---|---|---|---|---|
CPUID | x | dst, src1, src2 | dst <= cpuid(src1, src2) | Get CPU information based on src1, src2 (see CPUID) | ||
OR | x | x | dst, src1, src2 | dst <= src1 | src2 | Bitwise or | |
NOR | x | x | dst, src1, src2 | dst <= ~(src1 | src2) | Bitwise nor | |
AND | x | x | dst, src1, src2 | dst <= src1 & src2 | Bitwise and | |
BIC | x | x | dst, src1, src2 | dst <= src1 & ~src2 | Bitwise clear | |
XOR | x | x | dst, src1, src2 | dst <= src1 ^ src2 | Bitwise exclusive or | |
ADD | x | x | x | dst, src1, src2 | dst <= src1 + src2 | Addition |
SUB | x | x | x | dst, src1, src2 | dst <= src1 - src2 | Subtraction (note: src1 can be an immediate value) |
SEQ | x | x | x | dst, src1, src2 | dst <= (src1 == src2) ? 0xffffffff : 0 | Set if equal |
SNE | x | x | x | dst, src1, src2 | dst <= (src1 != src2) ? 0xffffffff : 0 | Set if not equal |
SLT | x | x | x | dst, src1, src2 | dst <= (src1 < src2) ? 0xffffffff : 0 | Set if less than (signed) |
SLTU | x | x | x | dst, src1, src2 | dst <= (src1 < src2) ? 0xffffffff : 0 | Set if less than (unsigned) |
SLE | x | x | x | dst, src1, src2 | dst <= (src1 <= src2) ? 0xffffffff : 0 | Set if less than or equal (signed) |
SLEU | x | x | x | dst, src1, src2 | dst <= (src1 <= src2) ? 0xffffffff : 0 | Set if less than or equal (unsigned) |
MIN | x | x | x | dst, src1, src2 | dst <= min(src1, src2) (signed) | Minimum value |
MAX | x | x | x | dst, src1, src2 | dst <= max(src1, src2) (signed) | Maximum value |
MINU | x | x | x | dst, src1, src2 | dst <= min(src1, src2) (unsigned) | Minimum value |
MAXU | x | x | x | dst, src1, src2 | dst <= max(src1, src2) (unsigned) | Maximum value |
ASR | x | x | x | dst, src1, src2 | dst <= src1 >> src2 (signed) | Arithmetic shift right |
LSL | x | x | x | dst, src1, src2 | dst <= src1 << src2 | Logic shift left |
LSR | x | x | x | dst, src1, src2 | dst <= src1 >> src2 (unsigned) | Logic shift right |
SHUF | x | x | dst, src1, src2 | dst <= shuffle(src1, src2) | Shuffle bytes according to the shuffle descriptor in src2 (see SHUF) | |
SEL | x | x | dst, src1, src2 | dst <= (src1 & dst) | (src2 & ~dst) | Bitwise select (there are three other variants for handling different operand orders) | |
CLZ | x | x | dst, src1 | dst <= clz(src1) | Count leading zeros | |
POPCNT | x | x | dst, src1 | dst <= popcnt(src1) | Count number of bits set to 1 in src1 | |
REV | x | x | dst, src1 | dst <= rev(src1) | Reverse bit order | |
PACK | x | x | dst, src1, src2 | dst <= ((src1 & 0x0000ffff) << 16) | (src2 & 0x0000ffff) |
Pack two half-words into a word. Use PACK.H to pack four bytes into a word. | |
PACKS | x | x | dst, src1, src2 | dst <= (saturate(src1) << 16) | saturate(src2) |
Pack two half-words into a word, with signed saturation. Use PACKS.H to pack four bytes into a word. | |
PACKSU | x | x | dst, src1, src2 | dst <= (saturate(src1) << 16) | saturate(src2) |
Pack two half-words into a word, with unsigned saturation. Use PACKSU.H to pack four bytes into a word. |
Mnemonic | I | V | P | Operands | Operation | Description |
---|---|---|---|---|---|---|
ADDS | x | x | dst, src1, src2 | dst <= saturate(src1 + src2) | Saturating addition (signed) | |
ADDSU | x | x | dst, src1, src2 | dst <= saturate(src1 + src2) | Saturating addition (unsigned) | |
ADDH | x | x | dst, src1, src2 | dst <= (src1 + src2) / 2 | Halving addition (signed) | |
ADDHU | x | x | dst, src1, src2 | dst <= (src1 + src2) / 2 | Halving addition (unsigned) | |
SUBS | x | x | dst, src1, src2 | dst <= saturate(src1 - src2) | Saturating subtraction (signed) | |
SUBSU | x | x | dst, src1, src2 | dst <= saturate(src1 - src2) | Saturating subtraction (unsigned) | |
SUBH | x | x | dst, src1, src2 | dst <= (src1 - src2) / 2 | Halving subtraction (signed) | |
SUBHU | x | x | dst, src1, src2 | dst <= (src1 - src2) / 2 | Halving subtraction (unsigned) |
Mnemonic | I | V | P | Operands | Operation | Description |
---|---|---|---|---|---|---|
MULQ | x | x | dst, src1, src2 | dst <= (src1 * src2) >> 31 | Fixed point multiplication (signed Q31 format, or Q15/Q7 for packed operations) | |
MUL | x | x | dst, src1, src2 | dst <= src1 * src2 | Multiplication (signed or unsigned, low 32 bits) | |
MULHI | x | x | dst, src1, src2 | dst <= (src1 * src2) >> 32 | Multiplication (signed, high 32 bits) | |
MULHIU | x | x | dst, src1, src2 | dst <= (src1 * src2) >> 32 | Multiplication (unsigned, high 32 bits) | |
DIV | x | x | dst, src1, src2 | dst <= src1 / src2 | Division (signed, integer part) | |
DIVU | x | x | dst, src1, src2 | dst <= src1 / src2 | Division (unsigned, integer part) | |
REM | x | x | dst, src1, src2 | dst <= src1 % src2 | Remainder (signed) | |
REMU | x | x | dst, src1, src2 | dst <= src1 % src2 | Remainder (unsigned) |
Mnemonic | I | V | P | Operands | Operation | Description |
---|---|---|---|---|---|---|
FMIN | x | x | dst, src1, src2 | dst <= min(src1, src2) | Floating-point minimum value | |
FMAX | x | x | dst, src1, src2 | dst <= max(src1, src2) | Floating-point maximum value | |
FSEQ | x | x | dst, src1, src2 | dst <= (src1 == src2) ? 0xffffffff : 0 | Set if equal (floating-point) | |
FSNE | x | x | dst, src1, src2 | dst <= (src1 != src2) ? 0xffffffff : 0 | Set if not equal (floating-point) | |
FSLT | x | x | dst, src1, src2 | dst <= (src1 < src2) ? 0xffffffff : 0 | Set if less than (floating-point) | |
FSLE | x | x | dst, src1, src2 | dst <= (src1 <= src2) ? 0xffffffff : 0 | Set if less than or equal (floating-point) | |
FSUNORD | x | x | dst, src1, src2 | dst <= (isNaN(src1) || isNaN(src2)) ? 0xffffffff : 0 | Set if unordered (NaN) | |
FSORD | x | x | dst, src1, src2 | dst <= (!isNaN(src1) && !isNaN(src2)) ? 0xffffffff : 0 | Set if ordered | |
ITOF | x | x | dst, src1, src2 | dst <= ((float)src1) * 2^-src2 | Cast signed integer to float with exponent offset | |
UTOF | x | x | dst, src1, src2 | dst <= ((float)src1) * 2^-src2 | Cast unsigned integer to float with exponent offset | |
FTOI | x | x | dst, src1, src2 | dst <= (int)(src1 * 2^src2) | Cast float to signed integer with exponent offset | |
FTOU | x | x | dst, src1, src2 | dst <= (unsigned)(src1 * 2^src2) | Cast float to unsigned integer with exponent offset | |
FTOIR | x | x | dst, src1, src2 | dst <= (int)round(src1 * 2^src2) | Round float to signed integer with exponent offset | |
FTOUR | x | x | dst, src1, src2 | dst <= (unsigned)round(src1 * 2^src2) | Round float to unsigned integer with exponent offset | |
FPACK | x | x | dst, src1, src2 | dst[1] <= (float16)src1 dst[0] <= (float16)src2 |
Reduce precision and pack two floating-point values into a word. Use FPACK.H to pack four floating-point values. | |
FADD | x | x | dst, src1, src2 | dst <= src1 + src2 | Floating-point addition | |
FSUB | x | x | dst, src1, src2 | dst <= src1 - src2 | Floating-point subtraction | |
FMUL | x | x | dst, src1, src2 | dst <= src1 * src2 | Floating-point multiplication | |
FDIV | x | x | dst, src1, src2 | dst <= src1 / src2 | Floating-point division | |
FUNPL | x | x | dst, src1 | dst <= (float)src1[0] | Unpack low half-precision floating-point value. Use FUNPL.H to unpack two quarter-precision floating-point values. | |
FUNPH | x | x | dst, src1 | dst <= (float)src1[1] | Unpack high half-precision floating-point value. Use FUNPH.H to unpack two quarter-precision floating-point values. | |
FSQRT | x | x | dst, src1 | dst <= sqrt(src1) | Floating-point square root |
Most instructions (excluding branch instructions) can be executed in both scalar and vector mode.
For instance the integer instruction ADD
has the following operation modes:
ADD Sd,Sa,Sb
- scalar <= scalar + scalarADD Sd,Sa,#IMM
- scalar <= scalar + scalarADD Vd,Va,Sb
- vector <= vector + scalarADD Vd,Va,#IMM
- vector <= vector + scalarADD Vd,Va,Vb
- vector <= vector + vector
See Vector Design for more information.
Many instructions support packed operation. Suffix the instruction with .B
for packed byte operation (each word is treated as four bytes, and four operations are performed in parallel), or .H
for packed half-word operation (each word is treated as two half-words, and two operations are performed in parallel).
For instance, the following packed operations are possible for the MAX
instruction:
MAX.B Sd,Sa,Sb
- 4x packed byte MAX, scalar <= max(scalar, scalar)MAX.B Vd,Va,Sb
- 4x packed byte MAX, vector <= max(vector, scalar)MAX.B Vd,Va,Vb
- 4x packed byte MAX, vector <= max(vector, vector)MAX.H Sd,Sa,Sb
- 2x packed half-word MAX, scalar <= max(scalar, scalar)MAX.H Vd,Va,Sb
- 2x packed half-word MAX, vector <= max(vector, scalar)MAX.H Vd,Va,Vb
- 2x packed half-word MAX, vector <= max(vector, vector)
Note that immediate operands are not supported for packed operations.
See Packed Operations for more information.
- Move scalar registers to/from vector register elements.
- More floating-point instructions (round, ...?).
- Interrupt / supervisor mode instructions (move to/from user registers, return from interrupt, ...).
- Control instructions/registers (cache control, interrupt masks, status flags, ...).
- Load Linked (ll) and Store Conditional (sc) for atomic operations.