
Merge OpenAI Triton commit 6f5baf6 #2990

Merged · 8 commits into main · Dec 11, 2024
Conversation

whitneywhtsang (Contributor)
@whitneywhtsang whitneywhtsang commented Dec 11, 2024

This PR changes the Triton base from 4d2e9e5 to 6f5baf6 (Dec 10).
Pass rate: 99.82% -> 99.81% (#2991)

Please do not squash and merge this PR.

ravil-mobile and others added 6 commits December 9, 2024 22:37
This PR refactors the instruction scheduling enums; they are now
implemented in MLIR.
This PR implements a specialized codegen for `tt.gather` when it
satisfies the conditions of being "warp local": it is possible to
compute the output tensor without data movement across warps.
`isWarpLocal` is a new function that checks this condition and places
additional restrictions to simplify codegen and to separate concerns
from `ttg.convert_layout`.

This enables `tt.gather` to generate better code when the layout is
suitable. In a subsequent PR, a special pattern will be added to
generate optimized layouts for `tt.gather` when possible/profitable to
enable the lowering.
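The "warp local" condition described above can be illustrated with a minimal Python sketch. This is a hypothetical model, not the actual `isWarpLocal` implementation: it assumes a simple blocked layout in which each warp owns a contiguous run of rows, and checks that every gathered source row already lives in the warp that produces the corresponding output row, so no cross-warp data movement is needed.

```python
# Hypothetical model of the "warp local" gather condition under an
# assumed blocked layout: warp w owns rows [w*rows_per_warp, (w+1)*rows_per_warp).

def owning_warp(row: int, rows_per_warp: int) -> int:
    """Which warp holds a given row under the assumed blocked layout."""
    return row // rows_per_warp


def is_warp_local(num_rows: int, num_warps: int, gather_rows: list[int]) -> bool:
    """True if every output row gathers from a row its own warp already holds."""
    rows_per_warp = num_rows // num_warps
    # Output row `dst` is produced by warp owning_warp(dst); the gather is
    # warp-local iff the source row it reads belongs to that same warp.
    return all(
        owning_warp(src, rows_per_warp) == owning_warp(dst, rows_per_warp)
        for dst, src in enumerate(gather_rows)
    )
```

For example, with 8 rows split across 2 warps, a permutation that stays within each warp's half of the rows is warp-local, while an index that crosses the halves is not.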
### Commits in this PR
1. [Pipeliner] Multi-buffer TMA descriptors
2. Add tests for pipelined descriptor creation
3. Be more conservative about number of TMA buffers to allocate
4. Update golden samples
5. Use correct modulus for tma updates
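The commits above describe multi-buffering TMA descriptors in the software pipeliner, with one fix for the modulus used when cycling through buffers. As a hedged sketch (the buffer count and function name here are assumptions, not the actual pipeliner code), each in-flight iteration writes into slot `iteration % num_buffers`, so the modulus must match the number of buffers actually allocated or slots would be reclaimed while still in use:

```python
# Hypothetical ring-buffer indexing for a multi-buffered pipeline.
# NUM_BUFFERS is an assumed pipeline depth, not a value from this PR.

NUM_BUFFERS = 3


def buffer_slot(iteration: int, num_buffers: int = NUM_BUFFERS) -> int:
    """Slot written by a given pipeline iteration."""
    return iteration % num_buffers


# With 3 buffers, iterations 0..5 cycle through slots 0,1,2,0,1,2.
slots = [buffer_slot(i) for i in range(6)]
```

Using the wrong modulus (say, a stage count that differs from the allocated buffer count) would map a still-in-flight iteration onto a live slot, which is the kind of bug the "correct modulus" commit addresses.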
@lezcano pointed out in another PR that the order is confusing, because
we typically list the lane ID, warp ID, and block ID in that order.
The AMD runner persists changes to the file system between jobs, so the
caches need to be manually cleaned up.

Closes #5384
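A cleanup step like the following could implement the manual cache removal described above. This is a sketch, not the actual CI change: it assumes Triton's default cache location (`~/.triton/cache`, overridable via the `TRITON_CACHE_DIR` environment variable).

```python
# Hypothetical cleanup for a self-hosted runner whose filesystem
# persists between CI jobs. The cache path is an assumption based on
# Triton's default cache directory.
import os
import shutil


def clean_triton_cache() -> None:
    cache_dir = os.environ.get(
        "TRITON_CACHE_DIR",
        os.path.expanduser("~/.triton/cache"),
    )
    # Remove the whole cache tree; ignore_errors covers a missing dir,
    # so the step is safe to run on a fresh runner as well.
    shutil.rmtree(cache_dir, ignore_errors=True)
```

Running this at the start of each job guarantees that stale compilation artifacts from a previous job cannot leak into the current one.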
This relands triton-lang/triton#5392
to enable the new arch target now that backend support has been
added. It does not depend on the reverted LLVM upgrade in
triton-lang/triton#5341; the necessary
enablement is already included in the LLVM version we are currently
using.
Enable the TRITON_KERNEL_OVERRIDE feature to work on AMD assembly and
binary. Currently it only works for the Nvidia backend (`ptx` and
`cubin`).
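The override workflow is typically driven through environment variables. The sketch below shows one plausible sequence; the `TRITON_DUMP_DIR` and `TRITON_OVERRIDE_DIR` variable names are assumptions based on Triton's debugging environment variables, so check the backend documentation before relying on them.

```python
# Hedged sketch of the kernel-override workflow:
# (1) dump compiled stages, (2) hand-edit the dumped ptx/cubin files
# (or, with this change, AMD assembly/binary), (3) re-run with
# overrides enabled so the edited artifacts replace the compiled ones.
import os

# Step 1: dump each compilation stage to a directory.
os.environ["TRITON_KERNEL_DUMP"] = "1"
os.environ["TRITON_DUMP_DIR"] = "/tmp/triton_dump"      # assumed variable name

# Step 3: after editing the dumped files, load them back in place
# of the freshly compiled artifacts on the next run.
os.environ["TRITON_KERNEL_OVERRIDE"] = "1"
os.environ["TRITON_OVERRIDE_DIR"] = "/tmp/triton_dump"  # assumed variable name
```

Extending this feature to AMD means the same edit-and-reload loop applies to the backend's assembly and binary artifacts, not just Nvidia's.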

---------

Co-authored-by: Yuanwei Fang <[email protected]>
@whitneywhtsang whitneywhtsang marked this pull request as ready for review December 11, 2024 03:29
@whitneywhtsang whitneywhtsang changed the title Merge OpenAI Triton commit f257479 Merge OpenAI Triton commit 6f5baf6 Dec 11, 2024
@whitneywhtsang whitneywhtsang merged commit e302ae6 into main Dec 11, 2024
5 checks passed
@whitneywhtsang whitneywhtsang deleted the whitneywhtsang/merge branch December 11, 2024 04:09
7 participants