Insert freeze between masked loads and sdiv/srem instructions #2775

alexbaden · 2024-11-20T18:12:02Z

From the code comments:

The Triton masked load pattern can generate instances where the
mask value causes undefined behavior in sdiv/srem instructions. The
language allows this UB because the results of those arithmetic
instructions are never used, and adding control flow to avoid computing
these instructions would negatively affect performance. But,
LLVM SimplifyCFG aggressively marks code paths with undefined
behavior as dead. This can result in removal of the mask path and
incorrect results from legal Triton kernels due to masked elements
being used in computation. Run a pass to add a freeze instruction
between masked loads and sdiv/srem to signal to LLVM we consider
the sdiv/srem operands to be well defined.

The strategy here is to invalidate the assumptions under which SimplifyCFG removes UB-introducing paths for sdiv/srem. The rationale is that, unlike C/C++, Triton explicitly allows UB in sdiv/srem instructions whose results are masked off (likely because the hardware Triton targets tolerates it). Inserting a freeze instruction both signals that we expect the behavior of sdiv/srem to be well defined and hides the constant 0 in the phi from SimplifyCFG's UB optimizations.
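To make the SimplifyCFG interaction concrete, here is a toy model of the rule described above. It uses plain C++ with hypothetical `Value`/`Op` types and a hypothetical `edgeWouldBePruned` helper. It is not LLVM's data structures or its actual SimplifyCFG code. A phi that receives constant 0 from one predecessor, and is used directly as the divisor of an sdiv/srem in its own block, makes that incoming edge look UB-introducing. Once the divisor is routed through a freeze, the phi's user is no longer a divide, so the toy rule stops firing.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy IR (hypothetical, not LLVM's API): just enough to model the pattern.
enum class Op { Const, Phi, SDiv, SRem, Freeze };

struct Value {
  Op op;
  int block = -1;                      // parent block id; -1 for constants
  std::int64_t constVal = 0;           // meaningful only for Op::Const
  std::vector<const Value *> operands; // for sdiv/srem, operands[1] is the divisor
};

// Simplified model (an assumption, not LLVM's code): if `incoming` is the
// constant 0 flowing into `phi`, and some sdiv/srem in the phi's own block
// divides by the phi directly, then taking that edge would be UB, so this toy
// rule reports the edge as removable.
bool edgeWouldBePruned(const Value &phi, const Value &incoming,
                       const std::vector<const Value *> &phiUsers) {
  if (incoming.op != Op::Const || incoming.constVal != 0)
    return false;
  for (const Value *u : phiUsers) {
    const bool isDivRem = u->op == Op::SDiv || u->op == Op::SRem;
    // Only users in the same block as the phi are considered.
    if (isDivRem && u->block == phi.block && u->operands.size() == 2 &&
        u->operands[1] == &phi)
      return true;
  }
  return false;
}
```

After the freeze is inserted, the only user of the phi is the freeze, so passing `{&frozen}` as the phi's users makes the rule return false. A divide that lives in a different block is also ignored.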

The pass needs to run after every instance of InstCombine because the LLVM optimization that removes UB only fires when the sdiv/srem is in the same BB as the phi, a situation that can arise after any InstCombine run.
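One way to get "after every InstCombine" behavior in LLVM's new pass manager is the peephole extension point, `PassBuilder::registerPeepholeEPCallback`. Its callbacks are inserted after each instance of the instruction combiner in the default pipelines. The PR text does not say which hook it uses. The toy below only models the resulting pass order. It uses plain C++, a hypothetical `insertAfterEach` helper, and a hypothetical pass name `freeze-masked-divrem`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy pipeline model (hypothetical): a pipeline is an ordered list of pass
// names. Returns a copy with `extra` placed immediately after every
// occurrence of `anchor`, i.e. "run this pass after every InstCombine".
std::vector<std::string> insertAfterEach(const std::vector<std::string> &pipeline,
                                         const std::string &anchor,
                                         const std::string &extra) {
  std::vector<std::string> out;
  out.reserve(pipeline.size() * 2);
  for (const std::string &pass : pipeline) {
    out.push_back(pass);
    if (pass == anchor)
      out.push_back(extra); // every later SimplifyCFG now sees frozen divisors
  }
  return out;
}
```

For a pipeline `instcombine, simplifycfg, gvn, instcombine, simplifycfg`, the result places `freeze-masked-divrem` between each `instcombine` and the pass that follows it. Freezing once at the start would not be enough, because a later InstCombine can move a divide into the phi's block again.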

Note that the directory structure for this pass is a little different from BreakStructPhiNodesPass because we already use those directories in third_party for MLIR code. If we want to change that, I can open an issue, but let's do it separately from this PR.

third_party/intel/lib/LLVMIR/LLVMIRFreezeMaskedDivRem.cpp

alexbaden · 2024-11-21T13:19:00Z

@arunjose696 The idea of the algorithm is as follows:

Iterate over each basic block in the function, looking for blocks that start with a PhiNode.

When we find such a block, process it by first checking whether any of the PhiNode's incoming values are null/zero constants.

If no incoming value is null/zero, no further action is needed. Otherwise, we iterate over the instructions in the BB to see whether any sdiv/srem instruction uses the PhiNode as its divisor (operand 1). If so, we freeze the output of the PhiNode and replace that sdiv/srem operand with the frozen value.

The first loop only needs to look at the first instruction, but iterating over all the instructions and breaking early is a relatively easy way to do this (and is done in many other LLVM passes). The second loop has to look at every instruction in the BB.
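The steps above can be sketched on a toy IR. This is standard C++ with hypothetical `Inst`/`BasicBlock` types, not LLVM's API. Only the function name `processBasicBlock` mirrors the PR's commit history. The sketch walks all leading phis and treats undef incoming values like zero, following the later "support multiple phis and undef" commit. As in the quoted diff, it creates one freeze per matching sdiv/srem and places it immediately before that instruction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Toy IR, hypothetical and for illustration only (not LLVM's classes).
enum class Op { Const, Undef, Phi, SDiv, SRem, Add, Freeze };

struct Inst {
  Op op;
  std::string name;
  std::vector<Inst *> operands; // phi: incoming values; sdiv/srem: {lhs, divisor}
  std::int64_t constVal = 0;    // meaningful only for Op::Const
};

struct BasicBlock {
  std::vector<std::unique_ptr<Inst>> insts; // phis come first, as in LLVM
};

static bool isZeroOrUndef(const Inst *v) {
  return (v->op == Op::Const && v->constVal == 0) || v->op == Op::Undef;
}

// For each leading phi with a zero/undef incoming value, give every sdiv/srem
// in this block that divides by the phi its own freeze of the phi (inserted
// right before the divide) and make the divide use the frozen value instead.
bool processBasicBlock(BasicBlock &bb) {
  bool changed = false;
  for (std::size_t p = 0; p < bb.insts.size() && bb.insts[p]->op == Op::Phi; ++p) {
    Inst *phi = bb.insts[p].get();
    if (std::none_of(phi->operands.begin(), phi->operands.end(), isZeroOrUndef))
      continue; // no null/zero incoming value: nothing to protect
    for (std::size_t i = 0; i < bb.insts.size(); ++i) {
      Inst *I = bb.insts[i].get();
      const bool isDivRem = I->op == Op::SDiv || I->op == Op::SRem;
      if (isDivRem && I->operands[1] == phi) {
        auto frozen = std::make_unique<Inst>(
            Inst{Op::Freeze, phi->name + ".frozen", {phi}});
        I->operands[1] = frozen.get();
        bb.insts.insert(bb.insts.begin() + static_cast<std::ptrdiff_t>(i),
                        std::move(frozen));
        ++i; // the divide moved to i + 1; resume scanning after it
        changed = true;
      }
    }
  }
  return changed;
}
```

Given a block `d = phi [0, 7]; q = sdiv 7, d; s = add d, d`, the pass inserts `d.frozen = freeze d` before `q` and rewires only `q`'s divisor. A second run reports no change, because `q` no longer divides by the phi directly.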

victor-eds

Can we get a lit test under test/LLVMIR?

third_party/intel/lib/LLVMIR/LLVMIRFreezeMaskedDivRem.cpp

victor-eds · 2024-11-22T14:51:11Z

third_party/intel/lib/LLVMIR/LLVMIRFreezeMaskedDivRem.cpp

+  for (Instruction &I : BB) {
+    if (I.getOpcode() == Instruction::SDiv ||
+        I.getOpcode() == Instruction::SRem) {
+      const size_t OpIdx = 1;
+      if (I.getOperand(OpIdx) == PhiNode) {
+        auto *freezePhi = new FreezeInst(
+            PhiNode, PhiNode->getName() + ".frozen", I.getIterator());
+        I.setOperand(OpIdx, freezePhi);
+        Changed = true;
+      }
+    }
+  }


Wouldn't it be better to iterate on PhiNode's uses?

In that case, we don't need to pass in BB, and we can rename the function to processPHINode.

We want to stay in the same basic block as the Phi node, so using users() is not entirely straightforward. I think iterating over the basic block's instructions and checking for an operand match is clearer.
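The two approaches discussed here can be compared in a small toy sketch. It uses plain C++ with hypothetical types and function names. It finds the sdiv/srem instructions that divide by a given phi in two ways: by scanning the phi's block, as the PR does, and by walking the phi's users with an explicit same-block filter, as suggested. In LLVM, a value's users can live in other blocks and the use list is not kept in program order. That is why the filter is needed, and why the two variants may return matches in different orders.

```cpp
#include <cassert>
#include <vector>

// Toy IR (hypothetical, not LLVM's API).
enum class Op { Const, Phi, SDiv, SRem, Add };

struct Inst {
  Op op;
  int block;                          // parent block id; -1 for constants
  std::vector<const Inst *> operands; // for sdiv/srem, operands[1] is the divisor
};

static bool dividesBy(const Inst &I, const Inst *phi) {
  return (I.op == Op::SDiv || I.op == Op::SRem) && I.operands.size() == 2 &&
         I.operands[1] == phi;
}

// Variant 1 (as in the PR): scan every instruction of the phi's own block.
std::vector<const Inst *> viaBlockScan(const Inst *phi,
                                       const std::vector<const Inst *> &blockInsts) {
  std::vector<const Inst *> out;
  for (const Inst *I : blockInsts)
    if (dividesBy(*I, phi))
      out.push_back(I);
  return out;
}

// Variant 2 (reviewer's suggestion): walk the phi's users. They may live in
// any block, so an explicit same-block filter is required.
std::vector<const Inst *> viaUsers(const Inst *phi,
                                   const std::vector<const Inst *> &users) {
  std::vector<const Inst *> out;
  for (const Inst *U : users)
    if (U->block == phi->block && dividesBy(*U, phi))
      out.push_back(U);
  return out;
}
```

Both variants select only the same-block sdiv. Without the `U->block == phi->block` check, `viaUsers` would also pick up an srem by the same phi in another block, which the PR intentionally leaves alone.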

python/test/regression/test_divide.py

victor-eds · 2024-11-22T14:56:34Z

I'd rather have a lit test than the current test. But I'm open to having both.

alexbaden · 2024-11-22T15:11:13Z

I can work on a lit test, but the regression test is far more important, as the concern is keeping the mask-false path intact throughout the LLVM optimization pipeline.

whitneywhtsang · 2024-11-22T15:46:16Z

third_party/intel/lib/CMakeLists.txt

Does it make sense to add to PostProcess folder that contains other LLVM passes we have?

I did not think we had any other LLVM passes. Did I miss some? Because this pass operates on LLVM IR and not on the MLIR LLVM dialect, it needs to be separate from the MLIR LLVM dialect passes.

We have some under third_party/intel/lib/Target/LLVMIR/

Right, I tried integrating with those, but they seem to run as part of the MLIR -> LLVM IR lowering rather than as part of the LLVM IR optimization pipeline, so the compiler target needs to be different. Let's open an issue to follow up if we feel strongly, and leave the current directory structure as is for this PR.

Sure, we don't have to address that in this PR. Those passes are pure LLVM passes and should have no relation to MLIR. @etiotto, correct?

Yes, that is correct. They post-process the optimized LLVM IR produced by LLVM's opt.

whitneywhtsang · 2024-11-22T15:51:56Z

third_party/intel/lib/LLVMIR/LLVMIRFreezeMaskedDivRem.cpp

In that case, we don't need to pass in BB, and we can rename the function to processPHINode.

third_party/intel/lib/LLVMIR/LLVMIRFreezeMaskedDivRem.cpp

alexbaden · 2024-11-28T19:18:27Z

Lit test added; all comments have been addressed in code or with a reply above.

victor-eds

LGTM

third_party/intel/lib/LLVMIR/CMakeLists.txt

third_party/intel/lib/LLVMIR/LLVMIRFreezeMaskedDivRem.cpp

Co-authored-by: Arun Jose <[email protected]>

alexbaden requested review from ienkovich, whitneywhtsang and etiotto November 20, 2024 18:12

arunjose696 reviewed Nov 21, 2024

third_party/intel/lib/LLVMIR/LLVMIRFreezeMaskedDivRem.cpp (outdated, resolved)

arunjose696 reviewed Nov 21, 2024

third_party/intel/lib/LLVMIR/LLVMIRFreezeMaskedDivRem.cpp (outdated, resolved)

arunjose696 reviewed Nov 21, 2024

third_party/intel/lib/LLVMIR/LLVMIRFreezeMaskedDivRem.cpp (outdated, resolved)

arunjose696 reviewed Nov 21, 2024

third_party/intel/lib/LLVMIR/LLVMIRFreezeMaskedDivRem.cpp (outdated, resolved)

alexbaden force-pushed the alex/2585 branch from 98d2ae9 to ae3ba41 November 21, 2024 14:12

alexbaden requested a review from a team November 22, 2024 12:30

victor-eds reviewed Nov 22, 2024

whitneywhtsang reviewed Nov 22, 2024

alexbaden force-pushed the alex/2585 branch from b64be96 to f026627 November 28, 2024 17:57

victor-eds approved these changes Dec 2, 2024

alexbaden requested a review from whitneywhtsang December 2, 2024 13:29

etiotto reviewed Dec 2, 2024

third_party/intel/lib/LLVMIR/CMakeLists.txt (resolved)

alexbaden force-pushed the alex/2585 branch from f026627 to 4de0c45 December 2, 2024 20:54

alexbaden mentioned this pull request Dec 2, 2024

Unify Intel LLVM optimizations under lib/Target/LLVMIR #2900

Merged

whitneywhtsang reviewed Dec 2, 2024

third_party/intel/lib/LLVMIR/LLVMIRFreezeMaskedDivRem.cpp (outdated, resolved)

third_party/intel/lib/LLVMIR/LLVMIRFreezeMaskedDivRem.cpp (outdated, resolved)

alexbaden added 8 commits December 3, 2024 01:20

Prevent UB in div/rem instructions during optimization

7f50dcf

Add regression test 1/?

1dd55a4

Parametrize test_divide (2/?)

c728635

fixup format in test_divide

e36f345

LLVM freeze instruction between mask and div 1/?

e0310dd

LLVM freeze instruction between mask and div 2/?

2243b25

LLVM freeze instruction between mask and div 3/?

6c6a0f0

LLVM freeze instruction between mask and div 4/?

3ef26d0

alexbaden and others added 10 commits December 3, 2024 01:20

LLVM freeze instruction between mask and div 5/5

a35b0ed

fixup

302fd39

Remove unused variable

9f814ee

Co-authored-by: Arun Jose <[email protected]>

rename processPhiNode -> processBasicBlock

e9723dc

simplify phi node incoming values constant check expression

928afd1

cleanup formatting in division test

3349109

add lit test

f1a6029

support multiple phis and undef

7ff052b

remove unused libs

ed6df23

address review comments

dc9c16e

alexbaden force-pushed the alex/2585 branch from a885909 to dc9c16e December 3, 2024 01:37

alexbaden requested a review from whitneywhtsang December 3, 2024 01:39

whitneywhtsang approved these changes Dec 3, 2024

alexbaden merged commit 78c13a5 into main Dec 3, 2024
5 checks passed

alexbaden deleted the alex/2585 branch December 3, 2024 03:05
