Fall through edge of call instructions #7

Open
bin2415 opened this issue Mar 11, 2020 · 12 comments

bin2415 commented Mar 11, 2020

Hi, I am trying to identify non-returning functions from the CFG output of bap. This is the cfg plugin that I used: here.

Here is my method:
If a basic block's terminator is a call instruction and the block has no fall-through edge, I deem that it calls a non-returning function. But when I checked the CFG output, it seems that no call instruction has a fall-through edge.

Is there a way to output the fall-through edges of call instructions?

Thanks!

Member

ivg commented Mar 11, 2020

It is actually the other way around :) This graph output includes only intraprocedural edges; interprocedural edges are pruned, so a call that doesn't return won't have any outgoing edges at all. Well, unless it is a conditional jump that doesn't return, e.g., on ARM you can write bne abort, which will be lifted to the following IR:

when not ZF call @abort with noreturn
goto fallthrough

With that said, there is no need to do any graph analysis if all you want is to collect calls that do not return. You can do this programmatically, or by just grepping the output of bap, e.g.,

bap /bin/true --no-ida  --print-bir-attr=address -d | grep -B1 -e 'call @.* with noreturn'

It will output something like:

--
.address 0x403933
000031d6: call @memcpy with noreturn
--
.address 0x403951
0000321e: call @sub_403910 with noreturn
--
000027f6: 
0000a7b5: call @sub_4039a0 with noreturn
--
.address 0x403E9B
00000d14: call @fclose with noreturn
--
.address 0x403EEB
00001352: call @fflush with noreturn
--
.address 0x403ECE
00000e4c: call @fflush with noreturn
--
.address 0x403EFA
00000ef8: call @fseeko with noreturn
--
.address 0x403FF4
0000122a: call @__cxa_atexit with noreturn
--
.address 0x403FE1
00001257: call @__cxa_atexit with noreturn
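
For the programmatic route, here is a minimal Python sketch that parses output of the shape shown above. It assumes the text format seen in this excerpt (an optional `.address` line followed by a `call @... with noreturn` line, with `--` as the grep group separator). The names `SAMPLE` and `collect_noreturn_calls` are made up for illustration and are not part of bap.

```python
import re

# Sample copied from the excerpt above; in practice this text would come
# from the standard output of the bap | grep pipeline.
SAMPLE = """\
--
.address 0x403933
000031d6: call @memcpy with noreturn
--
.address 0x403951
0000321e: call @sub_403910 with noreturn
--
000027f6:
0000a7b5: call @sub_4039a0 with noreturn
"""

ADDRESS = re.compile(r"^\.address (0x[0-9A-Fa-f]+)$")
NORETURN = re.compile(r"^([0-9a-f]{8}): call @(\S+) with noreturn$")


def collect_noreturn_calls(text):
    """Return a list of (address or None, tid, callee) for each noreturn call."""
    calls = []
    address = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "--":
            address = None  # grep group separator: forget the previous address
            continue
        m = ADDRESS.match(line)
        if m:
            address = int(m.group(1), 16)
            continue
        m = NORETURN.match(line)
        if m:
            calls.append((address, m.group(1), m.group(2)))
    return calls


if __name__ == "__main__":
    for address, tid, callee in collect_noreturn_calls(SAMPLE):
        where = hex(address) if address is not None else "unknown"
        print(f"{where}: {callee} (tid {tid})")
```

When the line preceding a match is a block label rather than an `.address` attribute, as in the third group, the address is reported as unknown.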

And you might ask: "Why is it calling memset on the assumption that it won't return? It is not an abort or anything like that."

This is a valid question :) It is due to a compiler optimization called tail-call optimization. For example, here is a piece of code from /bin/true, as disassembled with bap:

4038d0: 53                                        pushq %rbx
4038d1: 48 89 fb                                  movq %rdi, %rbx
4038d4: e8 87 fe ff ff                            callq -0x179
4038d9:
4038d9: 48 89 da                                  movq %rbx, %rdx
4038dc: 31 f6                                     xorl %esi, %esi
4038de: 48 89 c7                                  movq %rax, %rdi
4038e1: 5b                                        popq %rbx
4038e2: e9 69 d8 ff ff                            jmp -0x2797

and the same code in IDA Pro:

push    rbx
mov     rbx, rdi
call    sub_403760
mov     rdx, rbx        ; n
xor     esi, esi        ; c
mov     rdi, rax        ; s
pop     rbx
jmp     _memset

and here is the IR:

00002e80: 
00002e8a: #406 := RBX
00002e8d: RSP := RSP - 8
00002e90: mem := mem with [RSP, el]:u64 <- #406
00002e97: RBX := RDI
00002ea0: RSP := RSP - 8
00002ea3: mem := mem with [RSP, el]:u64 <- 0x4038D9
00002ea6: call @sub_403760 with return %00002ea8

00002ea8: 
00002ead: RDX := RBX
00002eba: RSI := 0
00002ebd: AF := unknown[bits]:u1
00002ec0: ZF := 1
00002ec3: PF := 1
00002ec6: OF := 0
00002ec9: CF := 0
00002ecc: SF := 0
00002ed3: RDI := RAX
00002edb: RBX := mem[RSP, el]:u64
00002ede: RSP := RSP + 8
00002ee5: call @memset with noreturn

and here is how it is decompiled:

void *__fastcall sub_4038D0(size_t n)
{
  void *v1; // rax

  v1 = (void *)sub_403760();
  return memset(v1, 0, n);
}

As you can see, this is a call that doesn't have a fall-through. Note also that not all calls are made via call instructions; compilers sometimes emit them as jmps.

So depending on what you need this is fine or not :)
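
If you need to tell the two cases apart, one rough heuristic is to check whether a return address is pushed right before the call: an x86 call instruction stores the return address on the stack (as in the `mem := mem with [RSP, el]:u64 <- 0x4038D9` line above), while a jmp-based tail call does not. The Python sketch below applies this to BIR text of the shape shown above. It is only an illustration: the names `classify_calls` and `BLOCK` are invented, the regular expressions assume the exact textual format from this excerpt, and a jmp preceded by an unrelated push of a constant would be misclassified.

```python
import re

# A store of a constant to the top of the stack, which is how the excerpt
# above shows the return address being pushed by a call instruction.
PUSH_CONST = re.compile(
    r"^[0-9a-f]{8}: mem := mem with \[[ER]SP, el\]:u(?:32|64) <- 0x[0-9A-Fa-f]+$"
)
CALL = re.compile(
    r"^[0-9a-f]{8}: call @(\S+) with (noreturn|return %[0-9a-f]{8})$"
)

# Lines copied from the IR above (both blocks, abbreviated).
BLOCK = """\
00002ea0: RSP := RSP - 8
00002ea3: mem := mem with [RSP, el]:u64 <- 0x4038D9
00002ea6: call @sub_403760 with return %00002ea8
00002edb: RBX := mem[RSP, el]:u64
00002ede: RSP := RSP + 8
00002ee5: call @memset with noreturn
"""


def classify_calls(bir_lines):
    """Yield (callee, kind); kind is 'call', 'noreturn-call' or 'tail-jump'."""
    previous = ""
    for raw in bir_lines:
        line = raw.strip()
        if not line or line.startswith(".address"):
            continue  # skip blank lines and address attributes
        m = CALL.match(line)
        if m:
            callee, target = m.groups()
            pushed = bool(PUSH_CONST.match(previous))
            if target != "noreturn":
                kind = "call"           # ordinary call with a fall-through
            elif pushed:
                kind = "noreturn-call"  # call instruction, callee never returns
            else:
                kind = "tail-jump"      # no return address pushed: likely a jmp
            yield callee, kind
        previous = line


if __name__ == "__main__":
    for callee, kind in classify_calls(BLOCK.splitlines()):
        print(callee, kind)
```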

Author

bin2415 commented Mar 11, 2020

Great! 👍
Thanks for your detailed description! :)

@bin2415 bin2415 closed this as completed Mar 11, 2020
Author

bin2415 commented Mar 19, 2020

Hi!
I found that this pass can detect non-returning functions in a program. How can I enable this pass in bap? Or is it enabled by default?

@bin2415 bin2415 reopened this Mar 19, 2020
Member

ivg commented Mar 19, 2020

Which this? ;)

Are you talking about this one?

To get it, you need to install the toolkit (make && make install should work; in general you need the latest version of BAP to build the toolkit, though for this particular pass it is not required). Then you can enable it with --passes=with-no-return.

Author

bin2415 commented Mar 19, 2020

Sorry, I forgot to paste the link. Yes, correct! Thank you!

@bin2415 bin2415 closed this as completed Mar 19, 2020
Author

bin2415 commented Apr 14, 2020

Hi, I want to build this with-no-return pass. However, I got this error:

File "with_no_return.ml", line 85, characters 48-54:
Error: Unbound value G.exit
Command exited with code 2.

I am using opam 2.0.6 and bap 2.0.0, and the build command is make TARGET=with-no-return.

Member

ivg commented Apr 14, 2020

You need to install the development version of bap for that, e.g., using opam

opam switch create bap-testing 4.09.0
opam remote add bap git+https://github.com/BinaryAnalysisPlatform/opam-repository#testing
opam update
opam install bap
eval $(opam env)

Or, if you prefer Docker, just build it in the bap-toolkit folder with docker build .

Author

bin2415 commented Apr 17, 2020

Hi, I ran the with-no-return pass to find non-returning functions, but it is not clear to me why bap deems this function (0x8163b50) non-returning:

8163b50: <sub_8163b50>
8163b50:
8163b50: 83 ec 0c                                 subl $0xc, %esp
8163b53: 8b 44 24 10                              movl 0x10(%esp), %eax
8163b57: 85 c0                                    testl %eax, %eax
8163b59: 74 1f                                    je 0x1f => here, a tail call to 0x8163b5b
8163b7a:
8163b7a: 83 ec 04                                 subl $0x4, %esp
8163b7d: 6a 00                                    pushl $0x0
8163b7f: 68 94 6b 1e 08                           pushl $0x81e6b94
8163b84: ff 35 78 98 24 08                        pushl 0x8249878
8163b8a: e8 61 55 ee ff                           calll -0x11aa9f
8163b8f:
8163b8f: 83 c4 04                                 addl $0x4, %esp
8163b92: 6a ff                                    pushl $-0x1
8163b94: e8 07 58 ee ff                           calll -0x11a7f9
8163b99:
8163b99: 83 c4 0c                                 addl $0xc, %esp
8163b9c: 0f 1f 40 00                              nopl (%eax)

8163b5b: <sub_8163b5b>
8163b5b:
8163b5b: c7 40 04 00 00 00 00                     movl $0x0, 0x4(%eax)
8163b62: c7 00 00 00 00 00                        movl $0x0, (%eax)
8163b68: c7 40 0c 00 00 00 00                     movl $0x0, 0xc(%eax)
8163b6f: c7 40 08 00 00 00 00                     movl $0x0, 0x8(%eax)
8163b76: 83 c4 0c                                 addl $0xc, %esp
8163b79: c3                                       retl

Although bap deems 8163b5b to be a function start, shouldn't it follow the edge (0x8163b59 -> 0x8163b5b) to determine the return status of 8163b5b?

By the way, I concluded that 8163b50 is non-returning from these statements:

.address 0x81675C1
00274240: mem := mem with [ESP, el]:u32 <- 0x81675C6
.address 0x81675C1
00274243: call @sub_8163b50 with noreturn

81675c1:       e8 8a c5 ff ff          call   8163b50 <IV_setDefaultFields>

And this is not a tail call.


The IR of the related basic blocks is shown below:

.address 0x8163B50
004e286b: sub sub_8163b50()
.address 0x8163B50
00022ef0:
.address 0x8163B50
00022efc: #7776 := low:32[ESP]
.address 0x8163B50
00022eff: ESP := low:32[ESP] - 0xC
.address 0x8163B50
00022f02: CF := #7776 < 0xC
.address 0x8163B50
00022f05: OF := high:1[(#7776 ^ 0xC) & (#7776 ^ low:32[ESP])]
.address 0x8163B50
00022f08: AF := 0x10 = (0x10 & (low:32[ESP] ^ #7776 ^ 0xC))
.address 0x8163B50
00022f0b: PF := ~low:1[let $1 = low:32[ESP] >> 4 ^ low:32[ESP] in
let $2 = $1 >> 2 ^ $1 in $2 >> 1 ^ $2]
.address 0x8163B50
00022f0e: SF := high:1[low:32[ESP]]
.address 0x8163B50
00022f11: ZF := 0 = low:32[ESP]
.address 0x8163B53
00022f18: EAX := mem[ESP + 0x10, el]:u32
.address 0x8163B57
00022f25: #7779 := low:32[EAX]
.address 0x8163B57
00022f28: OF := 0
.address 0x8163B57
00022f2b: CF := 0
.address 0x8163B57
00022f2e: AF := unknown[bits]:u1
.address 0x8163B57
00022f31: PF := ~low:1[let $1 = #7779 >> 4 ^ #7779 in
let $2 = $1 >> 2 ^ $1 in $2 >> 1 ^ $2]
.address 0x8163B57
00022f34: SF := high:1[#7779]
.address 0x8163B57
00022f37: ZF := 0 = #7779
.address 0x8163B59
00022f41: when ZF goto %00022f3b
004e286c: call @sub_8163b5b with noreturn

...
.address 0x8163B8F
00022f9d:
.address 0x8163B8F
00022fa9: #7787 := low:32[ESP]
.address 0x8163B8F
00022fac: ESP := low:32[ESP] + 4
.address 0x8163B8F
00022faf: CF := low:32[ESP] < #7787
.address 0x8163B8F
00022fb2: OF := ~high:1[#7787] & (high:1[#7787] | high:1[low:32[ESP]]) & ~(
high:1[#7787] & high:1[low:32[ESP]])
.address 0x8163B8F
00022fb5: AF := 0x10 = (0x10 & (low:32[ESP] ^ #7787 ^ 4))
.address 0x8163B8F
00022fb8: PF := ~low:1[let $1 = low:32[ESP] >> 4 ^ low:32[ESP] in
let $2 = $1 >> 2 ^ $1 in $2 >> 1 ^ $2]
.address 0x8163B8F
00022fbb: SF := high:1[low:32[ESP]]
.address 0x8163B8F
00022fbe: ZF := 0 = low:32[ESP]
.address 0x8163B92
00022fc6: ESP := ESP - 4
.address 0x8163B92
00022fc9: mem := mem with [ESP, el]:u32 <- 0xFFFFFFFF
.address 0x8163B94
00022fd2: ESP := ESP - 4
.address 0x8163B94
00022fd5: mem := mem with [ESP, el]:u32 <- 0x8163B99
.address 0x8163B94
00022fd8: call @exit with noreturn

Member

ivg commented Apr 18, 2020

@gitoleg, can you please take a look?
@bin2415, can you please upload the binary?

Author

bin2415 commented Apr 18, 2020

calculix_base.amd64-m32-ccr-Ofast.zip
The binary with symbol information is attached. I ran bap on the stripped binary.

Thanks!

@ivg ivg reopened this Apr 20, 2020
Member

ivg commented Apr 20, 2020

A status update: after a thorough discussion with @gitoleg, we can confirm that this is a bug in the algorithm. We implicitly assumed that if a tail call doesn't return to the caller, it won't return to the caller's caller either, which is of course not true, and we now have a good counter-example (thanks for it). We will update the algorithm and push the updated version.
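
To illustrate the corrected assumption, here is a toy Python sketch. It is not the algorithm used in the toolkit. It assumes a hypothetical summary in which each function lists the ways control can leave it: `("ret",)` for a return instruction, or `("tail", callee)` for a tail call. A function may return to its caller if it has a ret, or if it tail-calls a function that may itself return, because the tail callee returns directly to the caller's caller. The name `may_return` and the summary format are invented for this example.

```python
def may_return(functions):
    """Compute the set of functions that may return to their caller.

    `functions` maps a function name to a list of exits, where each exit is
    ("ret",) or ("tail", callee). The result is a least fixed point: we start
    by assuming nothing returns and add functions until nothing changes.
    """
    returning = set()
    changed = True
    while changed:
        changed = False
        for name, exits in functions.items():
            if name in returning:
                continue
            for kind, *rest in exits:
                if kind == "ret":
                    ok = True
                else:
                    callee = rest[0]
                    # Unknown callees are conservatively assumed to return.
                    ok = callee in returning or callee not in functions
                if ok:
                    returning.add(name)
                    changed = True
                    break
    return returning


# Summary of the counter-example from this thread: sub_8163b50 either
# tail-calls sub_8163b5b (which ends in retl) or reaches call @exit.
EXAMPLE = {
    "sub_8163b50": [("tail", "sub_8163b5b")],
    "sub_8163b5b": [("ret",)],
    "exit": [],  # no way out: never returns
}

if __name__ == "__main__":
    print(sorted(may_return(EXAMPLE)))
```

Under this model sub_8163b50 is classified as possibly returning, so a call to it should keep its fall-through.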

Thanks for tracking it!

Member

ivg commented Aug 21, 2020

@gitoleg, can you provide a status update on this issue?

@ivg ivg transferred this issue from BinaryAnalysisPlatform/bap Mar 1, 2022