Decompiler Core Improvements #42
Discovery

Before attempting to add additional disassembly passes, we need to do some discovery and documentation first. Mainly, we want documentation that explains exactly what coverage we expect, how the basic blocks are separated into a control flow graph (CFG), and how the wildcarding is applied. Changes to the coverage, the CFG, or the wildcarding will make all future generated traits incompatible with past traits, so it is important that these not be changed often and that they be well documented.

General Questions To Answer
Coverage
Test Methodology

In addition to understanding what coverage we expect, we will also want to generate some test data so we can compare before and after. One possible approach is to add code that gives us a map (JSON dictionary) of the identified functions and the basic blocks contained within each. This could be used to first generate a "current coverage" data set that can be compared both against other disassembly tools and against our final changes.

Research

The following are some open source implementations of disassemblers with decent coverage.
Q: What sections of a file are submitted for disassembly? Does this depend on the execution permissions? (Line 63 in 0f84757)

Q: Should the disassembler be responsible for detecting non-native code, or should this be handled elsewhere? How do we determine if this is VB6?

Q: How is wildcarding applied? Can an argument be added to control this? (Lines 565 to 583 in 0f84757)

Q: Is it possible to specify new disassembly start points? Possibly extend this to allow for PDB data later? (Lines 59 to 84 in 0f84757)

Q: What sort of coverage is provided for compiled languages that result in non-referenced functions (i.e. vtables)?
Basic blocks are not normalized (they overlap). Simple example: the BB at offset 121979 overlaps with the BB at offset 120533. We don't need to normalize this, but it might be nice to have a flag that makes it optional, as it will be a large performance hit.
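As a rough illustration of what an optional normalization pass could do, here is a minimal sketch that splits overlapping address ranges at every block boundary so that the resulting pieces are either identical or disjoint. The `Block` struct and `NormalizeBlocks` function are hypothetical names for this example and are not part of the current code. The sketch only reasons about address ranges; it assumes overlapping blocks decode the shared bytes the same way, which may not hold on x86 if two blocks reach the same bytes at different instruction alignments.

```cpp
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical block representation: half-open address range [start, end).
struct Block {
    std::uint64_t start;
    std::uint64_t end;
};

// Split every block at each boundary (start or end of any block) that falls
// strictly inside it, then drop duplicate pieces. Output is sorted by start.
std::vector<Block> NormalizeBlocks(const std::vector<Block> &blocks) {
    std::set<std::uint64_t> boundaries;
    for (const Block &b : blocks) {
        if (b.start >= b.end) continue;  // ignore empty or malformed ranges
        boundaries.insert(b.start);
        boundaries.insert(b.end);
    }

    std::set<std::pair<std::uint64_t, std::uint64_t>> pieces;
    for (const Block &b : blocks) {
        if (b.start >= b.end) continue;
        std::uint64_t cursor = b.start;
        // Walk the boundaries that lie strictly inside (b.start, b.end).
        for (auto it = boundaries.upper_bound(b.start);
             it != boundaries.end() && *it < b.end; ++it) {
            pieces.insert({cursor, *it});
            cursor = *it;
        }
        pieces.insert({cursor, b.end});  // remaining tail of the block
    }

    std::vector<Block> normalized;
    for (const auto &piece : pieces) {
        normalized.push_back(Block{piece.first, piece.second});
    }
    return normalized;
}
```

Sorting and re-splitting every block is extra work on top of discovery, which is consistent with keeping this behind a flag rather than enabling it by default.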
Added the PE and ELF entry points to the functions to be processed, instead of just pushing the address of the start of the function (b86a709). Improved coverage for pe.x86: function coverage 1.24% (missing 476 of 482 total) -> function coverage 54.98% (missing 217 of 482 total).
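For reference, the coverage figures above are consistent with the ratio (total - missing) / total. The sketch below shows one way such a number could be computed from two sets of function start addresses. It assumes a ground-truth set of expected addresses is available (for example from another disassembly tool); `CoverageReport` and `FunctionCoverage` are hypothetical names for this example, not the code that produced the numbers in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical summary of a coverage comparison.
struct CoverageReport {
    std::size_t total;    // number of expected functions
    std::size_t missing;  // expected functions the disassembler did not find
    double percent;       // 100 * (total - missing) / total
};

// Compare the functions we found against an assumed ground-truth set.
CoverageReport FunctionCoverage(const std::set<std::uint64_t> &expected,
                                const std::set<std::uint64_t> &found) {
    CoverageReport report{expected.size(), 0, 0.0};
    for (std::uint64_t address : expected) {
        if (found.count(address) == 0) {
            ++report.missing;
        }
    }
    if (report.total > 0) {  // avoid dividing by zero on an empty data set
        report.percent = 100.0 *
                         static_cast<double>(report.total - report.missing) /
                         static_cast<double>(report.total);
    }
    return report;
}
```

With 482 expected functions, finding 6 of them gives roughly 1.24% and finding 265 gives roughly 54.98%, matching the before and after figures reported for pe.x86.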
Leaving this one open; just changing the title to Improvements so we can track it.
Having a look into this now, hoping to make some improvements to the linear disassembler pass. I'm reading the paper; it may be worth noting that the most useful reading is in the implementation section :)
I created I've performed some improvements where now the method
Still having a decent amount of false positives. The linear pass |
Added the method:

```cpp
bool Disassembler::IsInvalidNopInsn(cs_insn *insn) {
    // Plain NOPs are always rejected. Semantic NOPs are rejected only for
    // non-PE binaries, and trap instructions only for PE binaries.
    return IsNopInsn(insn) ||
           (IsSemanticNopInsn(insn) && (file_reference.binary_type != BINARY_TYPE_PE)) ||
           (IsTrapInsn(insn) && (file_reference.binary_type == BINARY_TYPE_PE));
}
```

We still get false positives with this, so I'm continuing to investigate.
Well we are here now:
False positives down a lot now, after removing the extra else statement from |
I added a check for

```cpp
void Disassembler::AddDiscoveredBlock(uint64_t address, struct Section *sections, uint index) {
    // Queue the address only if it has not been visited, lies inside the
    // section data, and is not already known as a function start.
    if (!IsVisited(sections[index].visited, address) &&
        address < sections[index].data_size &&
        !IsFunction(sections[index].functions, address)) {
        // insert() reports whether the block address is new, so each block
        // is queued at most once.
        if (sections[index].blocks.insert(address).second) {
            sections[index].visited[address] = DISASSEMBLER_VISITED_QUEUED;
            sections[index].discovered.push(address);
        }
    }
}
```
Apply the wildcard of an already-known function to other blocks: