Simple instruction decompiles to incorrect and overly verbose code #7

flarn2006 · 2020-08-04T22:17:51Z

Here's an example:

                             //
                             // ram 
                             // fileOffset=0, length=30
                             // ram: 000000-00001d
                             //
          000000 e6 f4 00 01     mov        r4,0x100
                             LAB_000004                                      XREF[1]:     00000a(j)  
          000004 d7 00 09 00     exts       #0x9,#0x1
          000008 98 84           mov        r8,[r4+]
          00000a 2d fc           jmpr       cc_EQ,LAB_000004
          00000c 26 f4 02 01     sub        r4,#0x102
          000010 5c 44           shl        r4,#0x4
          000012 2b 58           prior      r5,r8
          000014 5c 15           shl        r5,#0x1
          000016 00 45           add        r4,r5
          000018 06 f4 00 10     add        r4,#0x1000
          00001c 9c 04           jmpi       cc_UC,[r4]

The lines

exts #0x9,#0x1
mov r8,[r4+]

should decompile to something like:

short foo, bar;  // r8, r4
foo = *(short* far)(0x90000 + bar);
bar += 2;

Instead, they decompile to:

uVar2 = uVar2 + 2;
uVar2 = *(ushort *)
         ((uint3)((ushort)((uVar2 & 0xc000) == 0) * unaff_DPP0 |
                  (ushort)((uVar2 & 0xc000) == 1) * unaff_DPP1 |
                  (ushort)((uVar2 & 0xc000) == 2) * unaff_DPP2 |
                 (ushort)((uVar2 & 0xc000) == 3) * unaff_DPP3) << 0xe | (uint3)(uVar2 & 0x3fff))
;

First of all, it's incrementing the variable representing r4 before accessing it, but the instruction set manual says [r4+] means to access the address in r4 and then increment it.

Second, it appears to be ignoring the exts line entirely, instead interpreting r4 as a 16-bit address. exts s, n means that the next n instructions should use 32-bit addressing, using s as the upper word for all addresses. Instead, it's using the DPP registers, which define the 16-bit address space.

Less importantly, even if the decompilation were correct, it seems far too verbose. Code using a lot of 16-bit addressing (which is extremely common) becomes very hard to read due to all the repetitive lines of code. Wouldn't it make more sense to use near/far pointer syntax, and just treat all the DPP stuff as implied whenever a near pointer is dereferenced? Or if Ghidra doesn't support that, you could probably define pointers as 24-bit (or 32-bit if it must be a power of two) and define an intrinsic for converting 16-bit addresses, like *__dpp(address) = value;

The text was updated successfully, but these errors were encountered:

esaulenka · 2020-08-05T13:41:39Z

Hello Flarn!

incrementing the variable representing r4 before accessing it

It is very strange thing. Slaspec code written in correct order:

:mov Rwn1215, [Rwm0811+] is op0007=0x98 & Rwm0811 & Rwn1215 {
	local tmp:2;
	read_w(tmp, Rwm0811);
	mov_w(Rwn1215, tmp);
	Rwm0811 = Rwm0811 + 2;
}

it appears to be ignoring the exts line entirely

Yes, you are right, exts extp opcodes not supported now.
Also, I have no ideas, how to do it only with SLEIGH. Looks like there is need special analyzer plugin, written in java. I have no expirence with it...

define an intrinsic for converting 16-bit addresses, like *__dpp(address) = value

It is not good idea, because it breaks all cross-references, even for variables with constant addresses. I think, it should be done in that plugin too.

Right now I dont have enought time to investigate it.
I can just add some README, that describes flaws in current implementation.

flarn2006 · 2020-09-07T21:10:33Z

About the first thing, it's possible that a later decompilation step changes the order, and it only leaves it like that because it knows the order doesn't matter in this case. Like how an optimizing compiler may write instructions in a different order than a literal start-to-finish reading of the source code would indicate, if it knows it won't make a difference. I haven't tested, but maybe it was an instruction where the order mattered (e.g. mov r4, [r4+]) it would generate the correct code.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simple instruction decompiles to incorrect and overly verbose code #7

Simple instruction decompiles to incorrect and overly verbose code #7

flarn2006 commented Aug 4, 2020 •

edited

Loading

esaulenka commented Aug 5, 2020

flarn2006 commented Sep 7, 2020

Simple instruction decompiles to incorrect and overly verbose code #7

Simple instruction decompiles to incorrect and overly verbose code #7

Comments

flarn2006 commented Aug 4, 2020 • edited Loading

esaulenka commented Aug 5, 2020

flarn2006 commented Sep 7, 2020

flarn2006 commented Aug 4, 2020 •

edited

Loading