
Merge OpenAI Triton commit 635435f #3042

Merged 9 commits on Dec 19, 2024
Conversation

@whitneywhtsang (Contributor) commented Dec 18, 2024

This PR changes the Triton base from 1f8966b to 635435f (Dec 18).
Pass rate: 99.83% -> 99.77%

Please do not squash and merge this PR.

aeng-openai and others added 6 commits December 17, 2024 10:09
Previously the matmul problem check looked for a for loop
with a single dot in a function. This doesn't work well for nested
loops, used for example in persistent matmul kernels.

The matmul problem check is updated to consider nested for loops
that contain a single tl.dot operation with at least two loads. Then
the `scheduleGlobalLoadLocalStore` transformation is applied to the
whole function if the whole function is just a matmul problem.
Otherwise, it is applied to each leaf for loop with limited scope.

We also now ensure the check captures both the loop body and the global
loads that have been peeled out into a loop prologue by the pipeliner.
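To make the shape this check targets concrete, here is a minimal, hypothetical persistent-matmul kernel in Triton (the name, signature, and lack of masking are illustrative assumptions, not code from this PR): a nested loop whose inner for loop contains a single `tl.dot` fed by two `tl.load`s.

```python
import triton
import triton.language as tl

@triton.jit
def persistent_matmul(a_ptr, b_ptr, c_ptr,
                      M, N, K,
                      stride_am, stride_ak,
                      stride_bk, stride_bn,
                      stride_cm, stride_cn,
                      BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                      BLOCK_K: tl.constexpr, NUM_TILES: tl.constexpr):
    pid = tl.program_id(0)
    num_programs = tl.num_programs(0)
    # Outer "persistent" loop: each program walks over several output tiles.
    for tile in range(pid, NUM_TILES, num_programs):
        pid_m = tile // tl.cdiv(N, BLOCK_N)
        pid_n = tile % tl.cdiv(N, BLOCK_N)
        offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
        offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
        offs_k = tl.arange(0, BLOCK_K)
        acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
        # Inner loop: a single tl.dot fed by two global loads -- the pattern
        # the updated matmul-problem check is meant to recognize.
        for k in range(0, K, BLOCK_K):
            a = tl.load(a_ptr + offs_m[:, None] * stride_am
                              + (k + offs_k)[None, :] * stride_ak)
            b = tl.load(b_ptr + (k + offs_k)[:, None] * stride_bk
                              + offs_n[None, :] * stride_bn)
            acc += tl.dot(a, b)
        tl.store(c_ptr + offs_m[:, None] * stride_cm
                       + offs_n[None, :] * stride_cn, acc)
```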
… (#5407)

This PR:
1. Refactored the construction logic in `LinearLayoutConversions.cpp` for
`stmatrix` selection. Note that the heuristic-based approach will be
replaced with an LL-driven approach once we have `divideRight` and
`divideLeft`.
2. Updated the `SharedLayout` class and added a `has_leading_offset`
attribute.
3. Added comprehensive new test cases for MMA and shared layouts.
Fixes #5439

Currently we end up computing `0 * inf = nan`; the fix is to bitcast to int
first, where `x * 0 == 0` holds (a minimal numeric illustration follows the
list below).
This PR also:
- Enables backward rematerialisation and hoisting for LLs
- Adds a fold reshape(cvt) -> reshape when the layouts are structurally
the same
- Removes an assert that was disallowing the use of LLs across
broadcast. When this happens, the LL will not have the same shape as the
tensor. We do this to match the legacy behaviour and avoid the
proliferation of new layouts
- Removes the layout-specific tests from before and instead we create
functional tests that test the axioms for the reshape function. We see
that all the legacy layouts pass these tests.
- Temporarily tested that the legacy path and the new path agree in CI
in
triton-lang/triton@e93638b
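As a minimal numeric illustration of the `0 * inf = nan` issue and the bitcast workaround mentioned above (plain NumPy here, not the actual Triton codegen):

```python
import numpy as np

x = np.float32(np.inf)
mask = np.float32(0.0)

with np.errstate(invalid="ignore"):
    print(x * mask)              # nan: 0 * inf is NaN under IEEE-754 floats

bits = x.view(np.int32)          # reinterpret the same 32 bits as an integer
zeroed = np.int32(bits * 0)      # x * 0 == 0 always holds for integers
print(zeroed.view(np.float32))   # 0.0 after bitcasting back to float
```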
There's no reason to disable this one.
@whitneywhtsang whitneywhtsang self-assigned this Dec 18, 2024
@whitneywhtsang whitneywhtsang force-pushed the whitneywhtsang/merge branch 3 times, most recently from 9d40749 to 4db484d on December 19, 2024 00:41
… (#5460)

@pawelszczerbuk wrote the code. I just fixed a few things and added a
test :)

This generalizes the loop pipeliner infrastructure a bit to support
loads with different latencies that are pipelined and multibuffered
differently, allowing more fine-grained buffer allocation. The feature
isn't exposed yet, but the PR also adds an attribute to the TMA load op
allowing the user to manually specify the desired latency.

---------

Co-authored-by: Pawel Szczerbuk <[email protected]>
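As a toy illustration of what "loads with different latencies that are pipelined and multibuffered differently" means, here is a plain-Python sketch (not Triton's pipeliner and not the new TMA attribute; names and numbers are made up): each input stream gets its own prefetch distance and therefore its own buffer depth.

```python
from collections import deque

def pipelined_loop(stream_a, stream_b, latency_a=2, latency_b=1):
    """Consume two equal-length streams, prefetching stream_a `latency_a`
    iterations ahead and stream_b `latency_b` iterations ahead."""
    n = len(stream_a)
    buf_a = deque(maxlen=latency_a)   # deeper ring buffer for the slower load
    buf_b = deque(maxlen=latency_b)

    # Prologue: issue the first few "loads" for each stream independently.
    for i in range(min(latency_a, n)):
        buf_a.append(stream_a[i])
    for i in range(min(latency_b, n)):
        buf_b.append(stream_b[i])

    results = []
    for i in range(n):
        a = buf_a.popleft()
        b = buf_b.popleft()
        results.append(a * b)         # stand-in for the loop body (e.g. a dot)
        # Steady state: refill each buffer with its next element, if any.
        if i + latency_a < n:
            buf_a.append(stream_a[i + latency_a])
        if i + latency_b < n:
            buf_b.append(stream_b[i + latency_b])
    return results

assert pipelined_loop([1, 2, 3, 4], [10, 20, 30, 40]) == [10, 40, 90, 160]
```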
@whitneywhtsang whitneywhtsang marked this pull request as ready for review December 19, 2024 02:53
Signed-off-by: Whitney Tsang <[email protected]>
@whitneywhtsang whitneywhtsang merged commit c280ea5 into main Dec 19, 2024
5 checks passed
@whitneywhtsang whitneywhtsang deleted the whitneywhtsang/merge branch December 19, 2024 04:17
@whitneywhtsang whitneywhtsang changed the title from "Merge OpenAI Triton commit 80e2abd" to "Merge OpenAI Triton commit 635435f" on Dec 20, 2024