`symbolic`: Modeling the stack #430

langston-barrett · 2024-08-27T19:58:54Z

In a downstream project, we're using macaw-symbolic to symbolically execute binaries. macaw-symbolic encodes functions in the binary as Crucible CFGs that take and return a struct that represents the values of all the registers. To execute such CFGs, we must set initial values for all registers. Most registers can be set to zeroes or symbolic values and plenty of code will run just fine. However, machine code has a lot of expectations about one particular register: the stack pointer.

Each ABI¹ specifies a stack layout, and our project attempts to model this layout at a low level of abstraction. In particular, we create a large-ish (1MiB) Crucible-LLVM allocation, and:

On AArch32, the very end is initialized with a number of pointer-sized stack slocks, which are reserved for stack-spilled arguments.²
On PPC, the very end is initialized with (in order):
- A pointer-sized chunk of bytes representing the return address.
- A pointer-sized zero value, representing the PowerPC back chain for the entrypoint function.
- A number of pointer-sized stack slocks, which are reserved for stack-spilled arguments.
On x86_64, the very end is initialized with (in order):
- A pointer-sized chunk of bytes representing the return address.
- A number of pointer-sized stack slocks, which are reserved for stack-spilled arguments.

The contents of the stack pointer register are then initialized to be a pointer to this allocation, offset to where the above data begin.

I'd wager that since a lot of code uses the stack, most clients of macaw-symbolic want to model the stack somehow. Furthermore, since macaw-symbolic fairly directly translates manipulations of the stack pointer register in binaries into the Crucible CFGs, I'd argue that this is likely the only strategy that works "out of the box". Therefore, I think we should upstream this code (or something like it) into macaw-*-symbolic.

One notable upside of this kind of model of the stack is that we can examine what happens in the case of certain kinds of common stack-based exploits, such as stack buffer overruns that clobber adjacent variables, as in this C program:

#include "stddef.h"
#include "stdio.h"

__attribute__((noinline)) void zero(char *buf, size_t len) {
  for (int i = 0; i < len; i++) {
    buf[i] = 0;
  }
}

__attribute__((noinline)) void print(char val) {
  printf("val = %i\n", val);
}

void caller(void) {
  char val = 1;
  char buf[8] = {1};
  zero(buf, 9);
  print(val);
}

int main() {
  caller();
}

One notable downside is that such exploits are not automatically detected by the Crucible-LLVM memory model, and so care would be required by users of such an API if they intend to perform safety-oriented analysis.

at a first approximation, each (OS, architecture) pair ↩
One difficulty is that it's hard to tell in advance of executing a function how many stack slots should be reserved for spilled arguments. DWARF information can serve as a source of truth here, when it's available. ↩

The text was updated successfully, but these errors were encountered:

sauclovian-g · 2024-08-28T21:41:10Z

Does Crucible's memory model have the ability to segment a chunk of memory? (As in, distinguish regions that are adjacent to each other and addressable from the same base pointer?) Then you could prepare a stack frame that will be resistant to such attacks... given some assumptions about when you should specialize a pointer to a segmented chunk to a pointer to a segment, which may turn out to be not so easy.

(just speculating, feel free to ignore)

langston-barrett · 2024-08-29T18:23:08Z

Does Crucible's memory model have the ability to segment a chunk of memory? (As in, distinguish regions that are adjacent to each other and addressable from the same base pointer?)

I don't believe so.

Another idea, not sure how feasible this one is... We could create an ExecutionFeature that, before entering a new Macaw CFG:

Allocates a fresh Crucible-LLVM allocation of type StackAlloc
Writes the return address and/or other ABI-specific information to the end of the allocation
Puts a pointer to (almost the end of) this fresh allocation into the stack pointer register

That would at least prevent inter-frame memory corruption, though it wouldn't do anything to stop code from overwriting adjacent stack variables nor the return address. Perhaps for the return address, the same execution feature could check that it is unmodified at function returns.

RyanGlScott · 2024-08-29T18:28:17Z

An addendum: I believe that RISC-V also initializes the end of its stack with slots for stack-spilled arguments, just as AArch32 does (although you should make sure that you adjust the offset on the stack pointer s0 depending on whether you are using 32-bit or 64-bit RISC-V).

langston-barrett · 2024-08-29T19:27:43Z

Prior art:

`Data.Macaw.Symbolic.Testing`

Uses a symbolic, SMT-array-backed stack. As an aside, it's kind of odd to me that this stack setup is spread across so many different lines.

macaw/symbolic/src/Data/Macaw/Symbolic/Testing.hs

Lines 501 to 506 in 429df8e

    
           -- Allocate a stack and insert it into memory 
        
           -- 
        
           -- The stack allocation uses the SMT array backed memory model (rather than 
        
           -- the Haskell-level memory model) 
        
           stackArrayStorage <- WI.freshConstant sym (WSym.safeSymbol "stack_array") WI.knownRepr 
        
           stackSize <- WI.bvLit sym WI.knownRepr (BVS.mkBV WI.knownRepr (2 * 1024 * 1024))

macaw/symbolic/src/Data/Macaw/Symbolic/Testing.hs

Lines 530 to 531 in 429df8e

    
           (stackBasePtr, mem1) <- CLM.doMalloc bak CLM.StackAlloc CLM.Mutable "stack_alloc" initMem stackSize CLD.noAlignment 
        
           mem2 <- CLM.doArrayStore bak mem1 stackBasePtr CLD.noAlignment stackArrayStorage stackSize

macaw/symbolic/src/Data/Macaw/Symbolic/Testing.hs

Lines 533 to 538 in 429df8e

    
           -- Make initial registers, including setting up a stack pointer (which points 
        
           -- into the middle of the stack allocation, to allow accesses into the caller 
        
           -- frame if needed (though it generally should not be) 
        
           initialRegs <- MS.macawAssignToCrucM (mkInitialRegVal symArchFns sym) (MS.crucGenRegAssignment symArchFns) 
        
           stackInitialOffset <- WI.bvLit sym WI.knownRepr (BVS.mkBV WI.knownRepr (1024 * 1024)) 
        
           sp <- CLM.ptrAdd sym WI.knownRepr stackBasePtr stackInitialOffset

`pate`

Also seems array-backed

https://github.com/GaloisInc/pate/blob/c43542818be7780264d8a2552bfeba1cffba2902/src/Pate/Verification/InlineCallee.hs#L135-L143

`saw-script`

Seems like it's just a fresh, symbolic pointer?

https://github.com/GaloisInc/saw-script/blob/47a7632c973df3fdb2d32268edc92948fc150f2d/src/SAWScript/X86Spec.hs#L446

Potentially with negative consequences?

https://github.com/GaloisInc/saw-script/blob/47a7632c973df3fdb2d32268edc92948fc150f2d/src/SAWScript/X86Spec.hs#L1285-L1288

`ambient-verifier`

Also does an SMT array:

https://github.com/GaloisInc/ambient-verifier/blob/open-source/src/Ambient/Verifier/SymbolicExecution.hs#L1272-L1280

Conclusions

Indeed, virtually every client of Macaw wants to model the stack, and most of them do
Whatever we implement in Macaw should at least have the option to use an SMT array allocation

langston-barrett added the symbolic-execution Issues relating to macaw-symbolic and symbolic execution label Aug 27, 2024

langston-barrett self-assigned this Aug 29, 2024

This was referenced Aug 29, 2024

symbolic: Factor stack setup code into its own module #431

Merged

symbolic: Deduplicate mkInitialRegVal across projects #432

Open

x86-symbolic: Setting up a SysV-compatible stack #433

Merged

symbolic: Refactor Testing library #434

Open

langston-barrett mentioned this issue Sep 20, 2024

aarch32-symbolic: Setting up an ABI-compatible stack #439

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`symbolic`: Modeling the stack #430

`symbolic`: Modeling the stack #430

langston-barrett commented Aug 27, 2024 •

edited

Loading

sauclovian-g commented Aug 28, 2024

langston-barrett commented Aug 29, 2024 •

edited

Loading

RyanGlScott commented Aug 29, 2024

langston-barrett commented Aug 29, 2024

symbolic: Modeling the stack #430

symbolic: Modeling the stack #430

Comments

langston-barrett commented Aug 27, 2024 • edited Loading

Footnotes

sauclovian-g commented Aug 28, 2024

langston-barrett commented Aug 29, 2024 • edited Loading

RyanGlScott commented Aug 29, 2024

langston-barrett commented Aug 29, 2024

Data.Macaw.Symbolic.Testing

pate

saw-script

ambient-verifier

Conclusions

`symbolic`: Modeling the stack #430

`symbolic`: Modeling the stack #430

langston-barrett commented Aug 27, 2024 •

edited

Loading

langston-barrett commented Aug 29, 2024 •

edited

Loading

`Data.Macaw.Symbolic.Testing`

`pate`

`saw-script`

`ambient-verifier`