
Merge OpenAI Triton commit 635435f #3042

Merged 9 commits on Dec 19, 2024
Conversation

@whitneywhtsang (Contributor) commented Dec 18, 2024

This PR changes the Triton base from 1f8966b to 635435f (Dec 18).
Pass rate: 99.83% -> 99.77%

Please do not squash and merge this PR.

aeng-openai and others added 6 commits December 17, 2024 10:09
Previously the matmul problem check looked for a for loop
with a single dot in a function. This doesn't work well for nested
loops, used for example in persistent matmul kernels.

The matmul problem check is updated to consider nested for loops
that contain a single tl.dot operation with at least two loads. Then
the `scheduleGlobalLoadLocalStore` transformation is applied to the
whole function if the whole function is just a matmul problem.
Otherwise, it is applied to each leaf for loop with limited scope.

We also now ensure the check captures both the loop body and the global
loads that have been peeled out into a loop prologue by the pipeliner.
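To make the shape this check targets concrete, here is a minimal, hypothetical persistent-matmul kernel in Triton (the name, signature, and lack of masking are illustrative assumptions, not code from this PR): a nested loop whose inner for loop contains a single `tl.dot` fed by two `tl.load`s.

```python
import triton
import triton.language as tl

@triton.jit
def persistent_matmul(a_ptr, b_ptr, c_ptr,
                      M, N, K,
                      stride_am, stride_ak,
                      stride_bk, stride_bn,
                      stride_cm, stride_cn,
                      BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                      BLOCK_K: tl.constexpr, NUM_TILES: tl.constexpr):
    pid = tl.program_id(0)
    num_programs = tl.num_programs(0)
    # Outer "persistent" loop: each program walks over several output tiles.
    for tile in range(pid, NUM_TILES, num_programs):
        pid_m = tile // tl.cdiv(N, BLOCK_N)
        pid_n = tile % tl.cdiv(N, BLOCK_N)
        offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
        offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
        offs_k = tl.arange(0, BLOCK_K)
        acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
        # Inner loop: a single tl.dot fed by two global loads -- the pattern
        # the updated matmul-problem check is meant to recognize.
        for k in range(0, K, BLOCK_K):
            a = tl.load(a_ptr + offs_m[:, None] * stride_am
                              + (k + offs_k)[None, :] * stride_ak)
            b = tl.load(b_ptr + (k + offs_k)[:, None] * stride_bk
                              + offs_n[None, :] * stride_bn)
            acc += tl.dot(a, b)
        tl.store(c_ptr + offs_m[:, None] * stride_cm
                       + offs_n[None, :] * stride_cn, acc)
```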
… (#5407)

This PR:
1. Refactored the construction logic in `LinearLayoutConversions.cpp` for
`stmatrix` selection. Note that the heuristic-based approach will be
replaced with an LL-driven approach once we have `divideRight` and
`divideLeft`.
2. Updated the `SharedLayout` class and added a `has_leading_offset`
attribute.
3. Added comprehensive new test cases for MMA and shared layouts.
Fixes #5439

Currently we end up computing `0 * inf = nan`; the fix is to bitcast to int
first, where `x * 0 == 0` holds (a minimal numeric illustration follows the
list below).
This PR also:
- Enables backward rematerialisation and hoisting for LLs
- Adds a fold reshape(cvt) -> reshape when the layouts are structurally
the same
- Removes an assert that was disallowing the use of LLs across
broadcast. When this happens, the LL will not have the same shape as the
tensor. We do this to match the legacy behaviour and avoid the
proliferation of new layouts
- Removes the layout-specific tests from before and instead we create
functional tests that test the axioms for the reshape function. We see
that all the legacy layouts pass these tests.
- Temporarily tested that the legacy path and the new path agree in CI
in
triton-lang/triton@e93638b
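As a minimal numeric illustration of the `0 * inf = nan` issue and the bitcast workaround mentioned above (plain NumPy here, not the actual Triton codegen):

```python
import numpy as np

x = np.float32(np.inf)
mask = np.float32(0.0)

with np.errstate(invalid="ignore"):
    print(x * mask)              # nan: 0 * inf is NaN under IEEE-754 floats

bits = x.view(np.int32)          # reinterpret the same 32 bits as an integer
zeroed = np.int32(bits * 0)      # x * 0 == 0 always holds for integers
print(zeroed.view(np.float32))   # 0.0 after bitcasting back to float
```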
There's no reason to disable this one.
@whitneywhtsang whitneywhtsang self-assigned this Dec 18, 2024
@whitneywhtsang whitneywhtsang force-pushed the whitneywhtsang/merge branch 3 times, most recently from 9d40749 to 4db484d on December 19, 2024 00:41
… (#5460)

@pawelszczerbuk wrote the code. I just fixed a few things and added a
test :)

This generalizes the loop pipeliner infrastructure a bit to support
loads with different latencies that are pipelined and multibuffered
differently, allowing more fine-grained buffer allocation. The feature
isn't exposed yet, but the PR also adds an attribute to the TMA load op
allowing the user to manually specify the desired latency.

---------

Co-authored-by: Pawel Szczerbuk <[email protected]>
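As a toy illustration of what "loads with different latencies that are pipelined and multibuffered differently" means, here is a plain-Python sketch (not Triton's pipeliner and not the new TMA attribute; names and numbers are made up): each input stream gets its own prefetch distance and therefore its own buffer depth.

```python
from collections import deque

def pipelined_loop(stream_a, stream_b, latency_a=2, latency_b=1):
    """Consume two equal-length streams, prefetching stream_a `latency_a`
    iterations ahead and stream_b `latency_b` iterations ahead."""
    n = len(stream_a)
    buf_a = deque(maxlen=latency_a)   # deeper ring buffer for the slower load
    buf_b = deque(maxlen=latency_b)

    # Prologue: issue the first few "loads" for each stream independently.
    for i in range(min(latency_a, n)):
        buf_a.append(stream_a[i])
    for i in range(min(latency_b, n)):
        buf_b.append(stream_b[i])

    results = []
    for i in range(n):
        a = buf_a.popleft()
        b = buf_b.popleft()
        results.append(a * b)         # stand-in for the loop body (e.g. a dot)
        # Steady state: refill each buffer with its next element, if any.
        if i + latency_a < n:
            buf_a.append(stream_a[i + latency_a])
        if i + latency_b < n:
            buf_b.append(stream_b[i + latency_b])
    return results

assert pipelined_loop([1, 2, 3, 4], [10, 20, 30, 40]) == [10, 40, 90, 160]
```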
@whitneywhtsang whitneywhtsang marked this pull request as ready for review December 19, 2024 02:53
Signed-off-by: Whitney Tsang <[email protected]>
@whitneywhtsang whitneywhtsang merged commit c280ea5 into main Dec 19, 2024
5 checks passed
@whitneywhtsang whitneywhtsang deleted the whitneywhtsang/merge branch December 19, 2024 04:17
@whitneywhtsang whitneywhtsang changed the title from "Merge OpenAI Triton commit 80e2abd" to "Merge OpenAI Triton commit 635435f" on Dec 20, 2024