Inline PTX #496
andrii0lomakin started this conversation in Ideas/Proposals
Replies: 1 comment 1 reply
-
Good day.
Have you considered providing the possibility to write inline PTX code, in the same way as is done in the examples at https://github.com/sschaetz/nvidia-opencl-examples/blob/master/OpenCL/src/oclInlinePTX/inlinePTX.cl and https://docs.nvidia.com/cuda/inline-ptx-assembly/index.html?
I am asking because, for example, support for tensor cores is absent at the moment, and this approach would likely make it possible to squeeze out all available GPU performance while still keeping the kernels reasonably maintainable.
P.S. I have seen the PR about adding precompiled PTX code, but from my point of view, that noticeably decreases the maintainability of the kernel code.
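For readers unfamiliar with the mechanism: in CUDA C++, an asm() statement embeds a PTX instruction directly in the kernel, with constraints binding the PTX operands to C++ variables. A minimal sketch following the add.s32 example from the NVIDIA documentation linked above (the function and kernel names are illustrative):

```cuda
// ret = a + b, expressed as a single PTX instruction. The "=r"/"r"
// constraints bind the PTX operands %0, %1, %2 to 32-bit registers
// holding the C++ variables.
__device__ int add_ptx(int a, int b) {
    int ret;
    asm("add.s32 %0, %1, %2;" : "=r"(ret) : "r"(a), "r"(b));
    return ret;
}

__global__ void vec_add(const int *a, const int *b, int *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = add_ptx(a[i], b[i]);
}
```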
-
Hi @andrii0lomakin, the #487 PR is for internal testing. From my point of view, this API is difficult to use. We use the pre-built API for analysing potential optimisations that we will include in the compiler, and we also use it in another project for internal testing. Regarding inlining PTX, you can still do it with the pre-built API, but a better way is to extend the JIT compiler to insert tensor operations directly into the Graal IR and generate the corresponding code from it (e.g., PTX, SPIR-V, etc.).
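To illustrate the kind of tensor operation being discussed: on recent NVIDIA GPUs, a warp-level matrix-multiply-accumulate is exposed in PTX as the mma.sync instruction, which a tensor-operation node in the Graal IR could emit directly into the generated PTX. Below is a minimal sketch of that instruction issued via inline assembly, assuming sm_80 and f16 operands; loading the per-thread fragments in the layout the PTX ISA mandates is omitted:

```cuda
#include <cstdint>

// Warp-level D = A*B + C on f16 tiles of shape m16n8k16 (one Tensor
// Core MMA). Each of the 32 threads in the warp holds a slice of the
// tiles packed into 32-bit registers (two f16 values per register);
// the per-thread fragment layout is defined by the PTX ISA.
// Requires sm_80 or newer.
__device__ void mma_f16_16x8x16(uint32_t d[2],
                                const uint32_t a[4],
                                const uint32_t b[2],
                                const uint32_t c[2]) {
    asm volatile(
        "mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 "
        "{%0,%1}, {%2,%3,%4,%5}, {%6,%7}, {%8,%9};\n"
        : "=r"(d[0]), "=r"(d[1])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
          "r"(b[0]), "r"(b[1]),
          "r"(c[0]), "r"(c[1]));
}
```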