FAQ For Computer Organisation's Assembly
- Registers
- Opcode Table
- Stackframes
- Addressing Modes
- Assembler Directives
- sections
- X86 Calling Convention
- Handy Links
- GDB
- command line arguments
64-bit register | Lower 32 bits | Lower 16 bits | Lower 8 bits |
---|---|---|---|
rax | eax | ax | al |
rbx | ebx | bx | bl |
rcx | ecx | cx | cl |
rdx | edx | dx | dl |
rsi | esi | si | sil |
rdi | edi | di | dil |
rbp | ebp | bp | bpl |
rsp | esp | sp | spl |
r8 | r8d | r8w | r8b |
r9 | r9d | r9w | r9b |
r10 | r10d | r10w | r10b |
r11 | r11d | r11w | r11b |
r12 | r12d | r12w | r12b |
r13 | r13d | r13w | r13b |
r14 | r14d | r14w | r14b |
r15 | r15d | r15w | r15b |
Other important registers:
RIP = instruction pointer; points to the next instruction to be executed. Changing this register is the same as a jump.
RFLAGS = register that stores information about the result of the last calculation (flags), used by conditional jumps.
Every row in the above table is actually one and the same register; the shorter names refer to parts of it. For rax the parts are:
name | size | bits of %rax |
---|---|---|
%ah | 8 bits | 15..8 |
%al | 8 bits | 7..0 |
%ax | 16 bits | 15..0 |
%eax | 32 bits | 31..0 |
%rax | 64 bits | 63..0 |
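
To see the overlap in practice, here is a minimal sketch, assuming the GNU assembler (AT&T syntax) on x86-64 Linux and a build with gcc. The constants are arbitrary illustration values. Each write touches only the named part of rax, except the 32-bit write, which also clears the upper half.

```asm
.text
.globl main
main:
    movabsq $0x1122334455667788, %rax   # fill all 64 bits
    movb $0xFF, %al                     # bits 7..0 change:  rax = 0x11223344556677FF
    movb $0x00, %ah                     # bits 15..8 change: rax = 0x11223344556600FF
    movw $0xABCD, %ax                   # bits 15..0 change: rax = 0x112233445566ABCD
    movl $0, %eax                       # 32-bit write also clears bits 63..32: rax = 0
    ret                                 # main returns 0
```

Stepping through this in GDB and running print/x $rax after every instruction shows the intermediate values from the comments.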
opcode | operands | function | description |
---|---|---|---|
mov | src,dst | dst = src | copy |
push | src | %rsp -= 8, (%rsp) = src | pushes a value onto the stack |
pop | dst | dst = (%rsp), %rsp += 8 | pops a value off the stack |
xchg | A,B | A,B = B,A | switches the contents of A and B |
--- | --- | --- | --- |
addq | src,dst | dst = dst + src | adds src to dst |
subq | src,dst | dst = dst - src | subtracts src from dst |
inc | dst | dst = dst + 1 | adds 1 to dst |
dec | dst | dst = dst - 1 | subtracts 1 from dst |
mulq | src | rdx:rax = rax * src | multiplies rax by src (UNSIGNED) |
imulq | src | rdx:rax = rax * src | multiplies rax by src (SIGNED) |
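divq | src | rax = rdx:rax / src, rdx = rdx:rax % src | divides rdx:rax by src (UNSIGNED); quotient in rax, remainder in rdx |
idivq | src | rax = rdx:rax / src, rdx = rdx:rax % src | divides rdx:rax by src (SIGNED); quotient in rax, remainder in rdx |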
--- | --- | --- | --- |
jmp | label | jumps to label (unconditional) | |
je | label | jumps to label (if equal) | |
jne | label | jumps to label (if not equal) | |
jg | label | jumps to label (if greater than) | |
jl | label | jumps to label (if less than) | |
jle | label | jumps to label (if less than or equal) | |
jge | label | jumps to label (if greater than or equal) | |
call | label | push <address of next instruction>, jmp label | calls a function |
ret | | pop the return address, jump to it | returns to caller |
loop | label | dec %rcx, jnz label | decrements rcx and jumps to label while rcx is not 0 |
--- | --- | --- | --- |
cmp | A,B | B - A (answer not stored but flags set) | compares 2 numbers; a conditional jump usually follows. After cmp A,B, jg jumps if B > A |
xorq | src,dst | dst = dst xor src | bitwise xor |
orq | src,dst | dst = dst or src | bitwise or |
andq | src,dst | dst = dst and src | bitwise and |
shlq | A,dst | dst = dst << A | shift left |
shrq | A,dst | dst = dst >> A | shift right (logical, fills with zeros) |
not | dst | dst = ~dst | bitwise inversion of dst |
neg | dst | dst = 0 - dst | 2's complement negation, same result as not followed by adding 1 |
leaq | A, dst | dst = &A | load effective address (& means address of) |
int | int_no | software interrupt | triggers interrupt int_no; int $0x80 is used for Linux system calls |
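
The operand order of cmp and the implicit registers of divq are easy to get wrong, so here is a minimal sketch that uses both. It assumes the GNU assembler (AT&T syntax) on x86-64 Linux, built with gcc; the label sum_loop and the numbers are made up for illustration. It sums 1 to 10 and then divides the sum by 7.

```asm
.text
.globl main
main:
    movq $0, %rax          # rax = running sum
    movq $1, %rcx          # rcx = counter i
sum_loop:
    addq %rcx, %rax        # sum = sum + i
    incq %rcx              # i = i + 1
    cmpq $10, %rcx         # computes rcx - 10 and only sets the flags
    jle sum_loop           # repeat while rcx <= 10
                           # here rax = 55
    movq $0, %rdx          # divq divides rdx:rax, so clear the upper half first
    movq $7, %rcx
    divq %rcx              # rax = 55 / 7 = 7, rdx = 55 % 7 = 6
    ret                    # main returns the quotient in rax
```

Since main returns 7, running the program and then echo $? in the shell prints 7.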
Generally, to initialize a stackframe use:
push %rbx #save necessary registers
push %r12
push %r13
push %r14
push %r15
push %rbp #generate stackframe
movq %rsp, %rbp
And to destroy it again use:
movq %rbp, %rsp #restore last stackframe
pop %rbp
pop %r15 #restore necessary registers
pop %r14
pop %r13
pop %r12
pop %rbx
ret
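
As a smaller case, a function that uses no callee-saved registers only needs the rbp part of the frame. The following is a minimal sketch, assuming the GNU assembler (AT&T syntax) on x86-64 Linux with gcc; square_plus_one is a hypothetical function invented for this example. It keeps its argument in a local variable inside the stackframe.

```asm
.text
.globl main

# square_plus_one(x): returns x * x + 1 (hypothetical example function)
square_plus_one:
    pushq %rbp              # save the caller's base pointer
    movq %rsp, %rbp         # start a new stackframe
    subq $16, %rsp          # reserve 16 bytes for local variables
    movq %rdi, -8(%rbp)     # store the argument in a local variable
    movq -8(%rbp), %rax     # rax = x
    imulq -8(%rbp), %rax    # rax = x * x
    incq %rax               # rax = x * x + 1
    movq %rbp, %rsp         # free the local variables
    popq %rbp               # restore the caller's base pointer
    ret

main:
    pushq %rbp
    movq %rsp, %rbp
    movq $6, %rdi           # first argument
    call square_plus_one    # rax = 6 * 6 + 1 = 37
    movq %rbp, %rsp
    popq %rbp
    ret                     # main returns 37
```

Local variables live at negative offsets from rbp, which is why the frame is released by copying rbp back into rsp.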
example | name | description |
---|---|---|
movq $label,%rax | immediate (pointer) | loads the location of the label into rax |
movq label,%rax | direct (absolute) | loads the quadword at the location of the label into rax |
movq (%rbx),%rax | indirect | loads the quadword at the location pointed to by rbx into rax |
movq 8(%rbx),%rax | indirect offset (positive) | loads the quadword 8 bytes after the location pointed to by rbx into rax |
movq -8(%rbx),%rax | indirect offset (negative) | loads the quadword 8 bytes before the location pointed to by rbx into rax |
movq (%rbx,%rcx),%rax | indirect variable offset | loads the quadword %rcx bytes after the location pointed to by rbx into rax |
movq (%rbx,%rcx,8),%rax | indirect variable scaled offset | loads the quadword %rcx*8 bytes after the location pointed to by rbx into rax |
movq 8(%rbx,%rcx,8),%rax | indirect variable scaled offset + constant | loads the quadword 8 + %rcx*8 bytes after the location pointed to by rbx into rax |
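
The scaled form is what you normally use to index an array of quadwords. Here is a minimal sketch, assuming the GNU assembler (AT&T syntax) on x86-64 Linux with gcc; the array numbers and the label next are made up for this example. The address is loaded relative to rip so that it also links as a position independent executable; with gcc -no-pie, movq $numbers, %rsi would work as well.

```asm
.data
numbers: .quad 10, 20, 30, 40      # array of 4 quadwords

.text
.globl main
main:
    leaq numbers(%rip), %rsi       # rsi = address of the array
    movq $0, %rcx                  # rcx = index
    movq $0, %rax                  # rax = sum
next:
    addq (%rsi,%rcx,8), %rax       # sum += quadword at rsi + rcx * 8
    incq %rcx
    cmpq $4, %rcx                  # computes rcx - 4, sets the flags
    jl next                        # repeat while rcx < 4
    ret                            # main returns 10 + 20 + 30 + 40 = 100
```

The scale factor 8 matches the size of one .quad element, so rcx counts elements instead of bytes.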
Assembler directives are notes for the assembler which tell it how to do the assembling.
directive | explanation |
---|---|
.quad | stores a 64-bit number (8 bytes) |
.long | stores a 32-bit number (4 bytes) |
.word | stores a 16-bit number (2 bytes) |
.byte | stores an 8-bit number (1 byte) |
.asciz | stores a string of text, automatically terminated by a 0 (NUL) byte |
.ascii | stores a string of text, not automatically terminated by a 0 (NUL) byte |
.skip n | skips n bytes. Useful for defining arrays of data. This should normally only be used in the .bss section |
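
The sketch below defines one value with each size directive and prints them, assuming the GNU assembler (AT&T syntax) on x86-64 Linux, linked against the C library by gcc. The labels big, medium, small, tiny and format are invented for this example. Note how the size of the load matches the size of the directive.

```asm
.data
big:    .quad 1234567890123        # 8 bytes
medium: .long 100000               # 4 bytes
small:  .word 1000                 # 2 bytes
tiny:   .byte 10                   # 1 byte
format: .asciz "%ld %d %d %d\n"    # string with a terminating 0 byte

.text
.globl main
main:
    pushq %rbp
    movq %rsp, %rbp
    leaq format(%rip), %rdi        # 1st argument: the format string
    movq big(%rip), %rsi           # load 8 bytes
    movl medium(%rip), %edx        # load 4 bytes
    movzwl small(%rip), %ecx       # load 2 bytes, zero-extend to 32 bits
    movzbl tiny(%rip), %r8d        # load 1 byte, zero-extend to 32 bits
    movl $0, %eax                  # variadic call: no vector registers used
    call printf@PLT
    movl $0, %eax                  # return 0 from main
    movq %rbp, %rsp
    popq %rbp
    ret
```

This prints 1234567890123 100000 1000 10. Loading small with movq instead would also pull in the bytes of tiny and format that follow it in memory.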
The 4 sections of an assembly program are .text, .data, .bss and .rodata.
Using linker scripts (google if you want to know more), more sections can be added. This is done in the gamelib for assignment 7.
Note that the assembler does not check what you put in most sections; sections mainly exist for organization and optimization. This means you can, for example, put data in .text. The restrictive sections are .rodata, which can only hold read-only data, and .bss, which can only hold uninitialized (zeroed) data. Note: using GDB works only if code is in .text.
Defining a section is easy. Just put a . plus the name of the section (like .bss or .text), and everything after that in the file is part of that section. For .rodata, write .section .rodata, because the GNU assembler has no .rodata shorthand. You can make multiple instances of the same section in different parts of your program (for example two .data sections), and the assembler (gcc) will make sure everything is combined into one.
In .text code is stored. You write your program in this section. Make sure you do this for GDB to work.
In .data small variables (integers, text) are stored to be used in your program.
In .bss data can also be stored. The difference is that bss data must be uninitialized. This is the case because all of the other sections actually become a part of the executable file, while the bss section is only a 'promise' for the OS: when the program runs, the space is created in RAM by the OS. If you define large arrays of data, this should be done in bss to keep the executable small.
.rodata should be used (and is optimized) for storing constant data. This section can only be read from.
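
A minimal sketch that uses all four sections in one file, assuming the GNU assembler (AT&T syntax) on x86-64 Linux, linked against the C library by gcc. The labels greeting, counter and scratch are made up for this example.

```asm
.section .rodata
greeting: .asciz "counter = %ld\n" # constant, read-only

.data
counter: .quad 41                  # initialized, stored in the executable

.bss
scratch: .skip 8000                # 1000 quadwords, not stored in the executable

.text
.globl main
main:
    pushq %rbp
    movq %rsp, %rbp
    incq counter(%rip)             # .data is writable: counter becomes 42
    movq counter(%rip), %rax
    movq %rax, scratch(%rip)       # .bss is writable too and starts out as zeros
    leaq greeting(%rip), %rdi      # 1st argument: format string from .rodata
    movq scratch(%rip), %rsi       # 2nd argument: the copied value
    movl $0, %eax                  # variadic call: no vector registers used
    call printf@PLT
    movl $0, %eax                  # return 0 from main
    movq %rbp, %rsp
    popq %rbp
    ret
```

This prints counter = 42. Writing to greeting instead would normally crash with a segfault, because .rodata is mapped read-only.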
The calling convention (System V AMD64 ABI) that is used on *nix systems is as follows (for 64-bit programs only):
- The first six integer or pointer arguments are passed in the registers RDI, RSI, RDX, RCX, R8, R9, in this order (with sometimes R10 as a static chain pointer in case of nested functions)
- Additional arguments are passed on the stack
- The return value is stored in RAX (in case of a 64-bit number) and in RDX:RAX (MSB:LSB) in case of 128-bit numbers
Source (x86 Calling Conventions Wikipedia)
An illustration of how C functions are called in respect to the x86_64 SysV calling convention:
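
As a concrete case, the C call printf("%ld + %ld = %ld\n", 2, 3, 5) can be written by hand roughly as follows. This is a minimal sketch, assuming the GNU assembler (AT&T syntax) on x86-64 Linux, linked against the C library by gcc; the label fmt is made up for this example.

```asm
.section .rodata
fmt: .asciz "%ld + %ld = %ld\n"

.text
.globl main
main:
    pushq %rbp                 # also aligns the stack to 16 bytes for the call
    movq %rsp, %rbp
    leaq fmt(%rip), %rdi       # 1st argument: pointer to the format string
    movq $2, %rsi              # 2nd argument
    movq $3, %rdx              # 3rd argument
    movq $5, %rcx              # 4th argument
    movl $0, %eax              # variadic call: no vector registers used
    call printf@PLT            # printf leaves its return value in rax
    movl $0, %eax              # return 0 from main
    movq %rbp, %rsp
    popq %rbp
    ret
```

This prints 2 + 3 = 5. A seventh integer argument would no longer fit in a register and would have to be pushed onto the stack before the call.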
GDB is a debugger which can help find segfaults or other mistakes in your program. To use it, compile your program using the -g option (put it directly after "gcc"). Then, instead of running it like ./program, run it as gdb ./program (with program replaced by the name of your executable). This should launch you into a GDB environment, in which you can use the following commands:
- b n (or break n): sets a breakpoint on line n
- print code: prints whatever you specify in code. This can be a full C expression, or a register name (e.g. $rdi or $rax)
- x/nx p: prints n 32-bit words starting at p, in hexadecimal. p can be an address or a register. This is useful for reading what's on the stack (e.g. x/10x $rbp)
- n (or next): steps ahead one instruction. When it finds a function call, it will not step into the instructions inside this function. Useful to skip large functions, like C stdlib functions such as printf
- s (or step): steps ahead one instruction. This one does go into large functions
- r (or run): runs the program until the next breakpoint or the end
- c (or continue): after a breakpoint, continue resumes execution like run did, until it encounters another breakpoint or the program ends. Useful if a breakpoint is in a loop and you want to go to the next iteration
- start: starts the program and stops at the beginning of main, so you can immediately start using s and n
When using GDB, your program must be compiled with -g and your code must be in a .text section.
Getting command line arguments is easy in assembly. Basically, it works the same as in C. The main function/label is called with 2 arguments, in rdi and rsi. rdi is the number of arguments (argc), and rsi is a pointer to an array of strings which holds the arguments (argv). The first entry of the array is the name of the program itself. argc/rdi tells you where the array ends.
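
A minimal sketch that prints every argument on its own line, assuming the GNU assembler (AT&T syntax) on x86-64 Linux, linked against the C library by gcc. The label print_next is made up for this example. argc and argv are moved to callee-saved registers because the call to puts is allowed to overwrite rdi and rsi.

```asm
.text
.globl main
main:
    pushq %rbp
    movq %rsp, %rbp
    pushq %r12              # callee-saved registers survive the call to puts
    pushq %r13
    movq %rdi, %r12         # r12 = argc (number of arguments left to print)
    movq %rsi, %r13         # r13 = argv (pointer to the current array entry)
print_next:
    movq (%r13), %rdi       # rdi = pointer to the current argument string
    call puts@PLT           # print it, followed by a newline
    addq $8, %r13           # move to the next pointer in the array
    decq %r12               # one argument fewer to go
    jnz print_next          # repeat until the count reaches 0
    popq %r13
    popq %r12
    movl $0, %eax           # return 0 from main
    movq %rbp, %rsp
    popq %rbp
    ret
```

If the executable is called args, running ./args hello world prints ./args, hello and world on three separate lines.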