Remove duplicate conditional branches after ICF #186
Comments
Hi Marius, thanks for bringing up this opportunity! BOLT currently lacks the VN and DF framework needed to apply this optimization in this manner. Implementing these is in my personal plans for after we upstream BOLT to LLVM, so I'll keep that in mind. Stay tuned :)
I agree with Amir. Good catch! Thanks for reporting. Your code is pretty good. I have a few comments on the code itself:
I don't think we have that ready, so you would need to implement a new method in X86MCPlusBuilder.cpp that walks backwards from the conditional jump until it finds the first instruction that defines FLAGS, and returns that instruction.
What you want to do here is count dynamic occurrences of this event instead of static ones, so you get a much clearer understanding of the impact of this optimization on your workload. Suppose you found ~200 instances where this optimization would trigger: we would still have little idea whether the code hits these 200 instances often or not. For that, you will want to use an up-to-date profile, query how hot each basic block is, and tally that: DynamicNumDuplicateCondBranches += BB.getKnownExecutionCount();
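To make the static versus dynamic distinction concrete, here is a minimal standalone sketch. It does not use BOLT's classes: `BlockInfo`, `DuplicateBranchStats` and `tallyDuplicateBranches` are hypothetical names, and `ExecutionCount` stands in for whatever `BB.getKnownExecutionCount()` would return on a profiled block.

```cpp
#include <cstdint>
#include <vector>

// Toy stand-in for a basic block (hypothetical, not a BOLT type).
struct BlockInfo {
  bool HasDuplicateCondBranch; // outcome of the static pattern check
  uint64_t ExecutionCount;     // profile count, as getKnownExecutionCount() would give
};

struct DuplicateBranchStats {
  uint64_t StaticCount = 0;  // how many blocks match the pattern
  uint64_t DynamicCount = 0; // how often matching blocks actually execute
};

// Tallies both counters in one pass over the blocks.
DuplicateBranchStats tallyDuplicateBranches(const std::vector<BlockInfo> &Blocks) {
  DuplicateBranchStats Stats;
  for (const BlockInfo &BB : Blocks) {
    if (!BB.HasDuplicateCondBranch)
      continue;
    ++Stats.StaticCount;
    Stats.DynamicCount += BB.ExecutionCount;
  }
  return Stats;
}
```

A large static count paired with a small dynamic count would suggest the matches sit in cold code, so the optimization would matter little for that workload.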
I wrote an instruction pattern matching mechanism that you can use to walk through simple, local (bb-only) data dependencies. https://github.com/facebookincubator/BOLT/blob/rebased/bolt/src/MCPlusBuilder.h#L561 But I admit it got more complicated than I intended, and it is probably too awkward to use at this stage; sometimes it's just easier to write the code yourself by looking at the register defs/uses of each instruction in the BB. Take a look at the code for the matcher (walk through it with a debugger) and I think it will be easier to get the idea. The key for your specific case is just to query MCInstrDesc::getImplicitDefs() to determine whether an instruction defines EFLAGS, and to return it if it is the last instruction that defined EFLAGS before your conditional jump. You need to put that code in X86MCPlusBuilder.cpp because it will depend on X86-specific LLVM enums.
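The backward walk can be sketched on a toy model as follows. This is an illustration only, not code for X86MCPlusBuilder.cpp: `ToyInst` and `findFlagsDefBefore` are hypothetical, and the `DefinesFlags` field stands in for the answer a real implementation would get by checking whether EFLAGS appears in the instruction's implicit defs.

```cpp
#include <string>
#include <vector>

// Toy instruction (hypothetical). A real version would inspect an MCInst
// and its MCInstrDesc instead of reading precomputed booleans.
struct ToyInst {
  std::string Mnemonic;
  bool DefinesFlags;      // stands in for "EFLAGS is an implicit def"
  bool IsConditionalJump;
};

// Returns the index of the last instruction that defines the flags before
// the conditional jump at JumpIdx, or -1 if JumpIdx is not a conditional
// jump or no flags definition exists earlier in the block.
int findFlagsDefBefore(const std::vector<ToyInst> &BB, int JumpIdx) {
  if (JumpIdx < 0 || JumpIdx >= static_cast<int>(BB.size()) ||
      !BB[JumpIdx].IsConditionalJump)
    return -1;
  // Walk backwards so the first hit is the closest definition.
  for (int I = JumpIdx - 1; I >= 0; --I)
    if (BB[I].DefinesFlags)
      return I;
  return -1; // flags were set in a predecessor block, outside this search
}
```

Returning -1 when nothing is found keeps the search strictly local to the block, which matches the bb-only scope described above.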
I have run into a case where our BOLT-optimized executable contains code like this[1]:
We generate some code which checks where a function pointer on an object is pointing to, in order to use an inlined fast version.
ICF inside BOLT figures out that list_length(), set_len(), ... are all identical to tuplelength() and merges them. What is left over are the now-duplicate branches, which all checked for the old symbol names that are now the same merged symbol, tuplelength.
I tried prototyping an optimization for these cases, but because of my lack of knowledge of BOLT and the LLVM MC framework I could not get it working :(.
Instead, I tried to count how often these branches occur inside our binary (hopefully this code does what I want :-P):
It triggered 248 times, even though it only checks whether the direct predecessor has the same jmp condition.
Here is a simple example which, when compiled, shows the optimization opportunity:
Do you have any plans to add such an optimization?
Could you point me to how to implement it correctly? Is there already an optimization doing something similar that just needs to be extended?
Is there an analysis which can find the whole chain of comparisons? Right now it only checks the one before, which will cause issues if the merged symbol names are not all right after each other.
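As a rough illustration of what a whole-chain check could look like, here is a sketch on a toy model. `ToyBlock` and `findRedundantCompares` are hypothetical names, not BOLT classes, and the sketch assumes every block on the chain compares the same unmodified value and has no side effects, which a real pass would have to verify.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy block (hypothetical): compares a value against one symbol and, on a
// mismatch, continues to the next test in the chain.
struct ToyBlock {
  std::string ComparedSymbol; // symbol compared against, after ICF merging
  int FallThrough;            // index of the block taken on mismatch, -1 if none
};

// Follows the mismatch chain from Entry and returns the indices of blocks
// whose comparison repeats one already seen earlier on the chain, even if
// the repeats are not adjacent.
std::vector<int> findRedundantCompares(const std::vector<ToyBlock> &Blocks,
                                       int Entry) {
  std::vector<int> Redundant;
  std::set<std::string> SeenSymbols;
  std::set<int> Visited; // guards against cycles in the chain
  const int N = static_cast<int>(Blocks.size());
  for (int I = Entry; I >= 0 && I < N; I = Blocks[I].FallThrough) {
    if (!Visited.insert(I).second)
      break;
    if (!SeenSymbols.insert(Blocks[I].ComparedSymbol).second)
      Redundant.push_back(I);
  }
  return Redundant;
}
```

Tracking all symbols seen so far, rather than only the direct predecessor's, is what lets this catch duplicates separated by an unrelated comparison.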
[1] pyston/pyston#65