Skip to content

How a load instruction makes its way through the LLVM pipeline

Jake Goulding edited this page Jul 9, 2017 · 1 revision

Loads at the IR level

Loading in IR is done through the load instruction. This represents any load operation, with flags specifying address space (for placing in flash memory or RAM memory), the volatileness, atomicity, etc

In the LLVM middle end, transformation passes may, and likely will, insert extra load instructions on top of the ones the file already contains

Loads at the SelectionDAG

The name SelectionDAG is derived from the name of the link library which contains all of the code for building a directed acyclic graph (DAG) for instruction selection. The SelectionDAG itself is an in-memory representation derived from LLVM IR, but it serves a different purpose.

At the SelectionDAG level, everything is represented in SSA form (i.e. every automatic variable is used exactly once). Utilising the fact that in SSA form every local variable is used exactly once, we can "treeify" operations so that we build a nested tree of instructions and their operands

Imagine LLVM IR like the following

%a = i8 5
%b = i8 20
%c = add i8 %a, $b
%d = mul i8 %c, i8 2

Because each variable is used only once, we can expand this entire block into a nested tree of instructions

%d = mul i8 (add i8 i8 5, i8 20), i8 2

Now, the entire point of the SelectionDAG module is to allow pattern-matching on specific DAGs so that we can lower specific sequences to specific target-specific instructions.

These patterns essentially look like this (using an example that would match %d:

(set i8:$d, mul (add i8:$a, i8:$b), i8:$other)

Now, when we write these pattern matchers in TableGen, we have to bind names to the values within so that we can specify where the values should appear in the target-specific instruction.

Here is an example for the AVR target-specific add instruction

def ADDRdRr : FRdRr<0b0000,
                           0b11,
                           (outs GPR8:$rd),
                           (ins GPR8:$src, GPR8:$rr),
                           "add\t$rd, $rr",
                           [(set i8:$rd, (add i8:$src, i8:$rr)),
                            (implicit SREG)]>;

The instruction is derived from the FRdRr format. The first two template parameters are opcode bits, the third argument is the list of output values for the instruction and their types, the fourth argument is the input values and their types, the fifth argument is the mnemonic, and the sixth argument is the SelectionDAG pattern itself.

Notice how the pattern binds all of the values mentioned in the inputs and outputs? That's how we tell LLVM that "this add instruction can be used in place of (set i8:$rd, (add i8:$src, i8:$rr)).

The selection process will convert all instructions into their SelectionDAG form and attempt to use the TableGen instruction definitions to lower every DAG into AVR instructions. If for some reason we are missing a pattern, the DAG will not be lowered fully and an assertion error will be hit.

N.B. Each target can also specify a bunch of custom lowering hooks that manually lower operations which are hard to use patterns with into target instructions (see AVRISelDAGToDAG::Select).

Extra loads inserted pre-emit

There are a bunch of target-specific hooks which run that insert things like function prelude/epilogue code, and stack spilling/unspilling. These can also generate loads, but because they are done after SelectionDAG, we cannot use IR and must instead directly insert MachineInstr values into MachineBasicBlock objects.

In these cases, you will see stuff like

buildMI(AVR::LDI).addReg(AVR::R18).addImm(50);

For these pre-emit loads, the address is very obvious because we directly build the target-specific load instructions.

N.B. The prologue/epilogue insertion code lives in AVRFrameLowering::emitPrologue, etc

SelectionDAG of load immediate instructions

In order to lower load instructions with intermediate values, this pattern is matched

       def LDIRdK : FRdK<0b1110,
                         (outs LD8:$rd),
                         (ins imm_ldi8:$k),
                         "ldi\t$rd, $k",
                         [(set i8:$rd, imm:$k)]>;

imm_ldi8 is defined earlier in the file as an i8, but with some extra stuff so that we insert the R_AVR_LDI_* relocation if a symbol is used, but that stuff isn't relevant for us here.

Basically, the selection process will pick up all matches of (set i8:$rd, imm:$k) and replace it with "ldi\t$rd, $k".

Loading pointers as immediates with multiple patterns

For loads from pointers like this

; Load an uninitialised 32-bit int
%ptr = alloca i32
%val = load i32, i32* %ptr

We actually have to match multiple patterns in order to select.

Initially, the DAG will look something like this

(set i32:$val, (load i32, i32*:$ptr))

Now, we have instructions to load through pointers (ld), but none of the instructions support loading from pointers as immediates - they all require the pointer to be loaded into a register first.

Because of that, we cannot match the load-from-pointer pattern in one step.

Instead, the LLVM instruction selector will partially-match the closest instruction (ld):

       def LDRdPtr : FSTLD<0,
                           0b00,
                           (outs GPR8:$reg),
                           (ins LDSTPtrReg:$ptrreg),
                           "ld\t$reg, $ptrreg",
                           [(set GPR8:$reg, (load i16:$ptrreg))]>,
                     Requires<[HasSRAM]>;

But it cannot match fully because it requires that we're loading from LDSTPtrReg:$ptrreg, which is one of the X, Y, or Z registers defined in AVRRegisterInfo.td.

The selector then checks for patterns it can use in order to fully match, so it basically splits the pattern into two

(set PTRREGS:$reg, i16:$value) // copied almost-verbatim from 'LDIW'
(set GPR8:$reg, (load i16:$ptrreg)) // copied from LD

Thus, the final load instruction is selected from two different patterns, leading to

ldiw $ptrreg, $address
ld $loaded_value, $ptrreg

N.B. ldiw is a pseudo instruction which we expand into two ldi instructions right before emitting assembly