
OOM in Linux Mode #218

Open
standard3 opened this issue Nov 12, 2024 · 5 comments

@standard3

Hello, thanks for the great tool!

I am trying to reproduce a bug in libtiff 4.0.4 with Linux mode, but I can't manage to take a proper snapshot of my VM because of an out-of-memory (OOM) error.

Target

I downloaded and compiled libtiff with the following commands:

wget https://download.osgeo.org/libtiff/tiff-4.0.4.tar.gz && \
  tar -xzvf tiff-4.0.4.tar.gz && \
  rm tiff-4.0.4.tar.gz && \
  cd tiff-4.0.4 && \
  CC=clang \
    CXX=clang++ \
    CFLAGS='-ggdb -fsanitize=address' \
    CXXFLAGS='-ggdb -fsanitize=address' \
    ./configure --disable-shared --prefix=$PWD/build && \
  make -j $(nproc) && \
  make install

I then created the following GDB QEMU script:

import sys, os

# import fuzzing breakpoint
from gdb_fuzzbkpt import *

target_dir = "libtiff"

# address to break on, found using gdb
# break_address = "snapshot_here"
break_address = "TIFFClientOpen"

# name of the file in which to break
file_name = "tiffinfo"

# create the breakpoint for the executable specified
FuzzBkpt(target_dir, break_address, file_name, sym_path=file_name)
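
Regarding the "found using gdb" comment above: one hypothetical way to locate such a symbol is a quick gdb session against the binary, e.g.:

(gdb) info address TIFFClientOpen

which prints where the function lives (for a PIE binary, the unrelocated address until the process is actually running).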

Environment

Tested on the main branch, version 0.5.5:

 ➜ git log --name-status HEAD^..HEAD
commit a231e0a26cee29b0abc51466934f8796f89d2892 (HEAD -> main, tag: v0.5.5, origin/main, origin/HEAD)
Author: Axel Souchet <[email protected]>
Date:   Sat May 25 21:26:29 2024 -0700

    Update README.md

M       README.md

I created two scripts to simplify the snapshotting process:

linux_mode/libtiff/snapshot_client.sh:

#!/usr/bin/env bash

set -euo pipefail

QEMU_SNAPSHOT="../qemu_snapshot"
TARGET_VM="$QEMU_SNAPSHOT/target_vm"

TIFF_DIR="tiff-4.0.4"

# Compile tiffinfo
make -C $TIFF_DIR -j "$(nproc)"
make -C $TIFF_DIR install

TIFFINFO="$TIFF_DIR/build/bin/tiffinfo"

# Copy binary to pwd so GDB can read symbols from it
cp "$TIFFINFO" .
TIFFINFO="$PWD/tiffinfo"

# Copy binary and inputs to target_vm
pushd $TARGET_VM || exit
./scp.sh "$TIFFINFO"
popd || exit

# Run WTF client
$QEMU_SNAPSHOT/gdb_client.sh

linux_mode/libtiff/snapshot_server.sh:

#!/usr/bin/env bash

set -euo pipefail

QEMU_SNAPSHOT="../qemu_snapshot"

# Compile WTF
pushd ../../src/build/ || exit
./build-release.sh
popd || exit

# Run WTF server
$QEMU_SNAPSHOT/gdb_server.sh

Taking the snapshot

Launching the server first and then running our program, we can see the breakpoint being hit:

root@linux:~# ./tiffinfo -D -j -c -r -s -w logluv-3c-16b.tiff
[ 1618.924402] traps: tiffinfo[246] trap int3 ip:5555556c30b9 sp:7fffffffe940 error                                                                                      
Trace/breakpoint trap

But then, repeating the operation with the client launched:

root@linux:~# ./tiffinfo -D -j -c -r -s -w logluv-3c-16b.tiff
[  111.043808] tiffinfo invoked oom-killer: gfp_mask=0x140dca(GFP_HIGHUSER_MOVABLE|              
[  111.044902] CPU: 0 UID: 0 PID: 222 Comm: tiffinfo Not tainted 6.12.0-rc1 #1                   
[  111.045527] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16              
[  111.046235] Call Trace:                                                                       
[  111.046470]  <TASK>                                                                           
[  111.046601]  dump_stack_lvl+0x53/0x70                                                         
[  111.046866]  dump_header+0x4b/0x3a0                                                           
[  111.047157]  ? do_try_to_free_pages+0x2aa/0x460                                               
[  111.047532]  ? ___ratelimit+0xa7/0x110                                                        
[  111.047881]  oom_kill_process+0x2ee/0x4b0                                                     
[  111.048289]  out_of_memory+0xec/0x700                                                         
[  111.048528]  __alloc_pages_noprof+0xdfc/0xfb0                                                 
[  111.048786]  alloc_pages_mpol_noprof+0x47/0xf0                                                
[  111.049047]  vma_alloc_folio_noprof+0x6c/0xc0                                                 
[  111.049301]  __handle_mm_fault+0x75f/0xce0                                                    
[  111.049695]  handle_mm_fault+0xc7/0x1f0                                                       
[  111.050023]  __get_user_pages+0x20f/0x1010                                                    
[  111.050346]  populate_vma_page_range+0x77/0xc0                                                
[  111.050725]  __mm_populate+0xfc/0x190                                                         
[  111.051020]  __do_sys_mlockall+0x199/0x1e0                                                    
[  111.051359]  do_syscall_64+0x9e/0x1a0                                                         
[  111.051590]  entry_SYSCALL_64_after_hwframe+0x77/0x7f                                         
[  111.051883] RIP: 0033:0x5555556c30b4                                                          
[  111.052093] Code: 8b 45 ec 50 53 51 52 55 57 56 41 50 41 51 41 52 41 53 41 54 41              
[  111.053341] RSP: 002b:00007fffffffe890 EFLAGS: 00000202 ORIG_RAX: 00000000000000              
[  111.053926] RAX: ffffffffffffffda RBX: 00005555556e42f0 RCX: 00005555556c30b4                 
[  111.054331] RDX: 0000000000000003 RSI: 00005555557688c0 RDI: 0000000000000003                 
[  111.054945] RBP: 00007fffffffe980 R08: 00005555556e4170 R09: 00005555556e4230                 
[  111.055561] R10: 00005555556e4540 R11: 0000000000000202 R12: 0000000000000000                 
[  111.056095] R13: 00007fffffffec80 R14: 00005555557c3170 R15: 00007ffff7ffd020                 
[  111.056591]  </TASK>                                                                          
[  111.056773] Mem-Info:                                                                         
[  111.056913] active_anon:44 inactive_anon:7900 isolated_anon:0
[  111.056913]  active_file:13 inactive_file:11 isolated_file:0                                  
[  111.056913]  unevictable:489214 dirty:8 writeback:0                                           
[  111.056913]  slab_reclaimable:1172 slab_unreclaimable:3721                                    
[  111.056913]  mapped:743 shmem:86 pagetables:1261                                              
[  111.056913]  sec_pagetables:0 bounce:0                                                        
[  111.056913]  kernel_misc_reclaimable:0                                                        
[  111.056913]  free:1193 free_pcp:583 free_cma:0                                                
[  111.059726] Node 0 active_anon:176kB inactive_anon:31600kB active_file:52kB inac              
[  111.061417] Node 0 DMA free:0kB boost:0kB min:40kB low:52kB high:64kB reserved_h              
[  111.063324] lowmem_reserve[]: 0 1958 0 0                                                      
[  111.063710] Node 0 DMA32 free:4772kB boost:0kB min:5640kB low:7644kB high:9648kB              
[  111.065838] lowmem_reserve[]: 0 0 0 0                                                         
[  111.066119] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB              
[  111.066729] Node 0 DMA32: 80*4kB (UME) 98*8kB (UE) 27*16kB (UME) 15*32kB (UE) 3*              
[  111.067832] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages              
[  111.068520] 837 total pagecache pages                                                         
[  111.068738] 0 pages in swap cache                                                             
[  111.068934] Free swap  = 0kB                                                                  
[  111.069119] Total swap = 0kB                                                                  
[  111.069294] 524158 pages RAM                                                                  
[  111.069549] 0 pages HighMem/MovableOnly                                                       
[  111.069906] 17929 pages reserved                                                              
[  111.070142] Tasks state (memory values in pages):                                             
[  111.070429] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem               
[  111.071062] [     80]     0    80     8234      640      256      383         1               
[  111.071964] [    103]     0   103    10134     2997     2690      307         0               
[  111.072754] [    139]     0   139     1005      325       32      293         0               
[  111.073534] [    160]     0   160     1435      464      192      272         0               
[  111.074374] [    189]     0   189      723      291        0      291         0               
[  111.075200] [    190]     0   190      723      291        0      291         0               
[  111.076269] [    191]     0   191      723      291        0      291         0               
[  111.077214] [    192]     0   192      723      291        0      291         0               
[  111.077934] [    193]     0   193      723      291        0      291         0               
[  111.078545] [    194]     0   194      723      291        0      291         0               
[  111.079183] [    195]     0   195     1194      407       96      311         0
[  111.080069] [    196]     0   196     3857      656      320      336         0               
[  111.080872] [    212]     0   212     1152      417      128      289         0               
[  111.081682] [    222]     0   222 5368723193   489212   488480      732                       
[  111.082382] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_al              
[  111.083392] Out of memory: Killed process 222 (tiffinfo) total-vm:21474892772kB,              
[  111.088055] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc'              
[  111.089002] clocksource:                       'kvm-clock' wd_nsec: 503935302 wd              
[  111.089844] clocksource:                       'tsc' cs_nsec: 699662209 cs_now:               
[  111.090637] clocksource:                       Clocksource 'tsc' skewed 19572690              
[  111.091889] clocksource:                       'kvm-clock' (not 'tsc') is curren              
[  111.092586] tsc: Marking TSC unstable due to clocksource watchdog                             
Killed

I tried increasing the VM's allocated memory size, but the OOM just takes longer to appear.
The client just hangs:

➜ ./snapshot_client.sh
...
~/dev/wtf/linux_mode/qemu_snapshot/target_vm ~/dev/wtf/linux_mode/libtiff
tiffinfo                                          100% 3599KB  69.1MB/s   00:00
~/dev/wtf/linux_mode/libtiff                                 
Reading symbols from ../qemu_snapshot/target_vm/linux/vmlinux...
Remote debugging using localhost:1234                        
native_irq_disable () at ./arch/x86/include/asm/irqflags.h:37
37              asm volatile("cli": : :"memory");            
add symbol table from file "tiffinfo" at                     
        .text_addr = 0x5555555974d0                          
Removing 'regs.json' file if it exists...                    
Hardware assisted breakpoint 1 at 0x5555556c30d0: file tif_open.c, line 93.
Using '/home/abel/dev/wtf/targets/libtiff' as target directory  
mkdir '/home/abel/dev/wtf/targets/libtiff'                   
mkdir '/home/abel/dev/wtf/targets/libtiff/crashes'           
mkdir '/home/abel/dev/wtf/targets/libtiff/inputs'            
mkdir '/home/abel/dev/wtf/targets/libtiff/outputs'           
mkdir '/home/abel/dev/wtf/targets/libtiff/state'             
Continuing.                                                  
In right process? True                                       
Calling mlockall                                             
Saving 67 bytes at 0x5555556c308d

This may be related to my binary, but on my host I don't have this problem at all with the same command. I have no idea how to debug this issue; could you maybe guide me? Let me know if I forgot any context/details.

Thanks!

@0vercl0k
Owner

First of all, thank you for trying out the tool and for filing a very detailed issue 🥳

Tagging @jasocrow (one of the co-authors of the Linux mode) in case you've seen this before / know what's going on.

I'll take a look in the next few days - hopefully we can figure out what's going on :)

Cheers

@0vercl0k
Owner

In the case where you don't increase the VM memory, the OOM gets triggered before the breakpoint gets hit? Also, this bit of output seems potentially interesting; do you know where @rip is pointing to?

[  111.051883] RIP: 0033:0x5555556c30b4                                                          
[  111.052093] Code: 8b 45 ec 50 53 51 52 55 57 56 41 50 41 51 41 52 41 53 41 54 41              

In the next log, it looks like you do hit the breakpoint, but it seems to hang after the mlockall, which is done by https://github.com/0vercl0k/wtf/blob/main/linux_mode/qemu_snapshot/gdb_fuzzbkpt.py#L436. What happens here is that code is injected to call mlockall right before where your breakpoint is set; when this executes, you land on your breakpoint a second time, at which point the shellcode is removed and the original bytes are restored.
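
For readers following along, here is a minimal sketch of that save/inject/restore primitive using the gdb Python API; the stub bytes and function names are illustrative assumptions, not the actual gdb_fuzzbkpt.py code:

import gdb

# Illustrative x86-64 stub: mov eax, 151 (SYS_mlockall);
# mov edi, 3 (MCL_CURRENT | MCL_FUTURE); syscall.
MLOCKALL_STUB = b"\xb8\x97\x00\x00\x00\xbf\x03\x00\x00\x00\x0f\x05"

def inject_mlockall(addr):
    # Save the original instructions, then overwrite them with the stub.
    inferior = gdb.selected_inferior()
    saved = bytes(inferior.read_memory(addr, len(MLOCKALL_STUB)))
    inferior.write_memory(addr, MLOCKALL_STUB)
    return saved

def restore_original(addr, saved):
    # After the stub runs and the breakpoint is hit a second time,
    # put the original bytes back.
    gdb.selected_inferior().write_memory(addr, saved)

The real script also has to redirect execution through the stub and handle the second breakpoint hit; the sketch only shows the byte save/restore part.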

It looks like mlockall doesn't complete or something weird is happening 🤔

I guess one thing you can try is to manually add a mlockall call in your C++ target right before your breakpoint is executed and see what happens when you launch it in the VM. Maybe you can even place your breakpoint on a function that will NOT be executed, just so you can check whether the syscall is also hanging in that context.

Does this make sense?

Cheers

@standard3
Author

Thank you for your time. It seems that the OOM gets triggered before the breakpoint:

➜ ./snapshot_client.sh
...
Hardware assisted breakpoint 1 at 0x5555556c30d0: file tif_open.c, line 93.
...

So our breakpoint is at 0x5555556c30d0 and the crashed RIP is at 0x5555556c30b4:
0x5555556c30b4 - 0x5555556c30d0 = -0x1c = -28

Looking at the process mappings and the symbol mapping:

(gdb) info proc mappings
process 219
Mapped address spaces:

          Start Addr           End Addr       Size     Offset  Perms  objfile
      0x555555554000     0x555555597000    0x43000        0x0  r--p   /root/tiffinfo
      0x555555597000     0x555555752000   0x1bb000    0x43000  r-xp   /root/tiffinfo <---- RIP is in this segment
      0x555555752000     0x5555557c3000    0x71000   0x1fe000  r--p   /root/tiffinfo
      0x5555557c3000     0x5555557dd000    0x1a000   0x26f000  rw-p   /root/tiffinfo
      0x5555557dd000     0x555556132000   0x955000        0x0  rw-p
      0x7ffff7fc5000     0x7ffff7fc9000     0x4000        0x0  r--p   [vvar]
      0x7ffff7fc9000     0x7ffff7fcb000     0x2000        0x0  r-xp   [vdso]
      0x7ffff7fcb000     0x7ffff7fcc000     0x1000        0x0  r--p   /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
      0x7ffff7fcc000     0x7ffff7ff1000    0x25000     0x1000  r-xp   /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
      0x7ffff7ff1000     0x7ffff7ffb000     0xa000    0x26000  r--p   /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
      0x7ffff7ffb000     0x7ffff7fff000     0x4000    0x30000  rw-p   /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0  rw-p   [stack]
      0xffffffffff600000 0xffffffffff601000     0x1000        0x0  --xp   [vsyscall]

(gdb) info symbol 0x5555556c30b4
_TIFFgetMode + 356 in section .text of /root/tiffinfo

This function is called from TIFFClientOpen:

TIFF *
TIFFClientOpen(
        const char *name, const char *mode,
        thandle_t clientdata,
        TIFFReadWriteProc readproc,
        TIFFReadWriteProc writeproc,
        TIFFSeekProc seekproc,
        TIFFCloseProc closeproc,
        TIFFSizeProc sizeproc,
        TIFFMapFileProc mapproc,
        TIFFUnmapFileProc unmapproc)
{
        static const char module[] = "TIFFClientOpen";
        TIFF *tif;
        int m;
        const char *cp;

        ...

        m = _TIFFgetMode(mode, module); <----- HERE
        if (m == -1)
                goto bad2;
        tif = (TIFF *)_TIFFmalloc((tmsize_t)(sizeof(TIFF) + strlen(name) + 1));
        if (tif == NULL)
        {
                TIFFErrorExt(clientdata, module, "%s: Out of memory (TIFF structure)", name);
                goto bad2;
        }
        ...

The _TIFFgetMode function looks like the following; it performs no memory allocation:

int _TIFFgetMode(const char *mode, const char *module)
{
        int m = -1;

        switch (mode[0])
        {
        case 'r':
                m = O_RDONLY;
                if (mode[1] == '+')
                        m = O_RDWR;
                break;
        case 'w':
        case 'a':
                m = O_RDWR | O_CREAT;
                if (mode[0] == 'w')
                        m |= O_TRUNC;
                break;
        default:
                TIFFErrorExt(0, module, "\"%s\": Bad mode", mode);
                break;
        }
        return (m);
}

Calling mlockall

I added the following to my target and changed the breakpoint to a function that will not run:

printf("Calling mlockall\n");
mlockall(3); /* 3 == MCL_CURRENT | MCL_FUTURE, the same flags gdb_fuzzbkpt.py uses */

With both the server and the client running, I ran:

root@linux:~# ./tiffinfo -D -j -c -r -s -w logluv-3c-16b.tiff
Calling mlockall
TIFF Directory at offset 0x10 (16)
...

Nothing happened on either end. I used the same arguments as in qemu_snapshot/gdb_fuzzbkpt.py.

@0vercl0k
Owner

All right - it'll probably be easier if I make a repro environment to experiment a bit and see if I run into the same issue. Will follow your instructions.

Cheers

@0vercl0k
Owner

Okay, I've successfully set up an environment and I'm able to see what you're seeing - thanks again for the detailed instructions!

Will update this issue once I know more.

Cheers
