Sections 3.2 and 3.3 present a small experimental example in which, to stop the compiler from optimising the code, the summing variable is declared volatile. This has the side effect of forcing a memory-targeted add operation. Whilst it is true that in a case like this, where there is memory latency in the result chain, a 75% prediction rate is sufficient to make branching better if the branch can bypass the memory-chained part, once there is no falsely imposed need for the memory chain, branchless suddenly becomes better in all cases. It is also the case that modern compilers (some of them, anyway), without the imposed volatile restriction, will mostly convert the branching code to branchless, then realise they can vectorise it, and produce something 100 times faster than the branching option, whether the code as written implies branching or branchless.
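To make the volatile point concrete, here is a minimal sketch of the two accumulators being compared. The function names, the `threshold` parameter, and the loop shape are my own assumptions for illustration, not the article's exact code. The only difference between the two functions is the `volatile` qualifier, which obliges the compiler to perform every update of `sum` through memory.

```c
#include <stddef.h>

/* Hypothetical reconstruction: the accumulator is volatile, so each
 * conditional add must be a load-add-store through memory and the
 * compiler may not keep the running total in a register. */
int sum_below_volatile(const int *arr, size_t n, int threshold) {
    volatile int sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (arr[i] < threshold)
            sum += arr[i];
    }
    return sum;
}

/* Same loop with a plain accumulator: the compiler is free to keep
 * sum in a register, if-convert the branch, and vectorise the loop. */
int sum_below_plain(const int *arr, size_t n, int threshold) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (arr[i] < threshold)
            sum += arr[i];
    }
    return sum;
}
```

Both functions return the same result; any difference lies in the generated code, which can be inspected by compiling with `-O2 -S` and comparing the two loop bodies.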
I can also falsify the later statement:
> We can rewrite branchy code using the ternary operator or various arithmetic tricks, which acts as sort of an implicit contract between programmers and compilers: if the programmer wrote the code this way, then it was probably meant to be branchless.
In my tests, both gcc and clang ignore how the code was written. gcc produces only branching code (in my experiments; this apparently varies by version, but I did not observe that), and clang produces only branchless code, whether the loop body is written with a ternary `?:`, as `if(cond) sum = res;`, or as `if(cond) sum+=arr[i];`.
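For reference, the three spellings I compared look roughly like the sketch below. The function names, the `threshold` parameter, and the non-volatile `int` accumulator are assumptions made for illustration. All three compute the same sum, so any difference in the emitted assembly comes from the compiler's choices, not from the semantics.

```c
#include <stddef.h>

/* Variant 1: ternary operator, the supposedly "branchless" spelling. */
int sum_ternary(const int *arr, size_t n, int threshold) {
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (arr[i] < threshold) ? arr[i] : 0;
    return sum;
}

/* Variant 2: compute the candidate result first, then assign it
 * conditionally, i.e. if(cond) sum = res;
 * Note that res is computed unconditionally, so this assumes the
 * intermediate sum does not overflow. */
int sum_if_assign(const int *arr, size_t n, int threshold) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        int res = sum + arr[i];
        if (arr[i] < threshold)
            sum = res;
    }
    return sum;
}

/* Variant 3: plain conditional add, i.e. if(cond) sum += arr[i]; */
int sum_if_add(const int *arr, size_t n, int threshold) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (arr[i] < threshold)
            sum += arr[i];
    }
    return sum;
}
```

Compiling this file with `gcc -O2 -S` and `clang -O2 -S` and comparing the three loop bodies is one way to check whether a given compiler version treats the spellings differently.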
Otherwise, it was a nice little jaunt through branch prediction. I found it while looking for any available figures on the actual penalty for a branch mispredict: this was the only thing that came up...