Merge OpenAI Triton commit 635435f
#3042
Merged
Conversation
Previously, the matmul-problem check looked for a function containing a for loop with a single dot. This doesn't work well for nested loops, used for example in persistent matmul kernels. The check is updated to consider nested for loops that contain a single `tl.dot` operation with at least two loads. The `scheduleGlobalLoadLocalStore` transformation is then applied to the whole function if the whole function is just a matmul problem; otherwise it is applied to each leaf for loop with limited scope. We also now ensure it captures both the loop body and the global loads that the pipeliner has peeled out into a loop prologue.
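To make the updated check concrete, here is a minimal sketch of the idea (illustrative only; `count_ops`, `is_matmul_problem`, and the toy op representation are hypothetical and not the actual Triton pass API): walk the loop nest, descending into nested loops, and classify the nest as a matmul problem when it contains exactly one dot and at least two loads.

```python
# Illustrative sketch, NOT the real Triton pass: ops are strings, nested
# for-loop bodies are nested lists.

def count_ops(body, dots=0, loads=0):
    """Recursively count 'dot' and 'load' ops, descending into nested loops."""
    for op in body:
        if isinstance(op, list):        # a nested for-loop body
            dots, loads = count_ops(op, dots, loads)
        elif op == "dot":
            dots += 1
        elif op == "load":
            loads += 1
    return dots, loads

def is_matmul_problem(body):
    dots, loads = count_ops(body)
    return dots == 1 and loads >= 2

# A persistent-matmul-style nest: outer tile loop containing an inner K-loop.
persistent = [["load", "load", "dot"], "store"]
assert is_matmul_problem(persistent)
assert not is_matmul_problem([["load", "dot"], ["load", "dot"]])  # two dots
```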
… (#5407) This PR:
1. Refactored the construction logic in `LinearLayoutConversions.cpp` for `stmatrix` selection. Note that the heuristic-based approach will be replaced with an LL-driven approach once we have `divideRight` and `divideLeft`.
2. Updated the `SharedLayout` class and added a `has_leading_offset` attribute.
3. Added comprehensive new test cases for MMA and shared layouts.
Fixes #5439. Currently we end up computing `0 * inf = nan`; the fix is to bitcast to int first, where `x * 0 == 0` always holds.
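The trick can be demonstrated outside Triton (a minimal sketch; `mask_via_bitcast` is a hypothetical helper, not the actual fix): IEEE-754 floats turn `0.0 * inf` into NaN, but integers have no inf/NaN, so reinterpreting the float bits as an integer before applying a 0/1 mask sidesteps the problem.

```python
import math
import struct

# In IEEE-754 arithmetic, a masked-out lane (0.0) times inf produces NaN:
assert math.isnan(0.0 * math.inf)

def mask_via_bitcast(x: float, keep: bool) -> float:
    """Zero out x via an integer multiply, where x * 0 == 0 always holds."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))   # bitcast f32 -> u32
    bits *= 1 if keep else 0                              # integer masking
    (out,) = struct.unpack("<f", struct.pack("<I", bits)) # bitcast back
    return out

assert mask_via_bitcast(math.inf, keep=False) == 0.0   # no NaN produced
assert mask_via_bitcast(2.5, keep=True) == 2.5
```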
This PR also:
- Enables backward rematerialisation and hoisting for LLs
- Adds a fold reshape(cvt) -> reshape when the layouts are structurally the same
- Removes an assert that was disallowing the use of LLs across broadcast. When this happens, the LL will not have the same shape as the tensor. We do this to match the legacy behaviour and avoid the proliferation of new layouts
- Removes the layout-specific tests from before; instead we create functional tests that exercise the axioms of the reshape function. We see that all the legacy layouts pass these tests.
- Temporarily tested that the legacy path and the new path agree in CI in triton-lang/triton@e93638b
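The reshape(cvt) -> reshape fold in the list above can be sketched as a toy rewrite rule (illustrative only; the dict-based op representation and `fold_reshape_of_cvt` are hypothetical, not Triton's rewriter API): when the layout conversion feeding a reshape is structurally a no-op, the reshape can read straight from the conversion's source.

```python
# Toy sketch of the fold: ops are dicts with a 'name' and a 'src' operand.

def fold_reshape_of_cvt(op):
    """Rewrite reshape(convert_layout(x)) -> reshape(x) when the conversion's
    source and destination layouts are structurally the same (a no-op)."""
    if op["name"] == "reshape" and op["src"]["name"] == "convert_layout":
        cvt = op["src"]
        if cvt["src_layout"] == cvt["dst_layout"]:
            return {**op, "src": cvt["src"]}  # bypass the no-op convert
    return op

x = {"name": "arg"}
cvt = {"name": "convert_layout", "src": x,
       "src_layout": "blocked<[4,1]>", "dst_layout": "blocked<[4,1]>"}
reshape = {"name": "reshape", "src": cvt}
assert fold_reshape_of_cvt(reshape)["src"] is x  # convert folded away
```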
There's no reason to disable this one.
pbchekin approved these changes Dec 18, 2024
whitneywhtsang force-pushed the whitneywhtsang/merge branch 3 times, most recently from 9d40749 to 4db484d on December 19, 2024 00:41
… (#5460) @pawelszczerbuk wrote the code. I just fixed a few things and added a test :) This generalizes the loop pipeliner infrastructure a bit to support loads with different latencies that are pipelined and multibuffered differently, allowing more fine-grained buffer allocation. The feature isn't exposed yet, but the PR also adds an attribute to the TMA load op allowing the user to manually specify the desired latency. --------- Co-authored-by: Pawel Szczerbuk <[email protected]>
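The per-load latency idea can be illustrated with a toy buffer-allocation sketch (hypothetical throughout: `num_buffers`, the latency values, and the load names are invented for illustration and are not the pipeliner's actual API): a load with a higher latency is kept in flight across more stages and therefore gets more buffers, while a fast load can make do with plain double buffering.

```python
# Toy sketch: map each load's pipeline latency (in stages) to a buffer count.

def num_buffers(latency_stages: int) -> int:
    """One buffer per in-flight stage, with a floor of 2 (double buffering)."""
    return max(2, latency_stages)

# Hypothetical per-load latencies, e.g. as a user-specified attribute.
latencies = {"a_load": 3, "b_load": 1}
buffers = {name: num_buffers(lat) for name, lat in latencies.items()}
assert buffers == {"a_load": 3, "b_load": 2}  # fine-grained allocation
```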
whitneywhtsang force-pushed the whitneywhtsang/merge branch from 4db484d to 8d5a01b on December 19, 2024 01:23
chengjunlu approved these changes Dec 19, 2024
whitneywhtsang force-pushed the whitneywhtsang/merge branch from 8d5a01b to 3d78176 on December 19, 2024 02:52
Signed-off-by: Whitney Tsang <[email protected]>
whitneywhtsang changed the title from Merge OpenAI Triton commit 80e2abd to Merge OpenAI Triton commit 635435f on Dec 20, 2024
This PR changes the Triton base from 1f8966b to 635435f (Dec 18).
Pass rate: 99.83% -> 99.77%
Please do not squash and merge this PR.