-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GPU/OpenCL] Initial version of Transpose function with OpenCL ops @open sesame 10/04 17:03 #2690
Conversation
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2690. Please a submit 1commit/1PR (one commit per one PR) policy to get comments quickly from reviewers. Your PR must pass all verificiation processes of cibot before starting a review process from reviewers. If you are new member to join this project, please read manuals in documentation folder and wiki page. In order to monitor a progress status of your PR in more detail, visit http://ci.nnstreamer.ai/. |
cibot: @niket-agarwal, The last line of a text file must have a newline character. Please append a new line at the end of the line in test/unittest/layers/unittest_layers_transpose_cl.cpp. |
cibot: @niket-agarwal, nntrainer/layers/cl_layers/transpose_cl.cpp includes bug(s). Please fix incorrect coding constructs in your commit before entering a review process. |
cibot: @niket-agarwal, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2690-202408011410450.51261901855469-7e976e3a350a0c6f8cf362bfe9720c77a786ed93/. |
7e976e3
to
e3010ee
Compare
cibot: @niket-agarwal, nntrainer/layers/cl_layers/transpose_cl.cpp includes bug(s). Please fix incorrect coding constructs in your commit before entering a review process. |
cibot: @niket-agarwal, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2690-202408011538080.67232894897461-e3010eec8afa1eeb51dd0358735f6a376cc20bf9/. |
e3010ee
to
c3364f8
Compare
cibot: @niket-agarwal, nntrainer/cl_context.cpp includes bug(s). Please fix incorrect coding constructs in your commit before entering a review process. |
4ea3ba8
to
0456110
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
2d5a01a
to
0c7523e
Compare
Nice work! But.. As you already know, this transpose layer is not a general transpose layer but rather one specifically tailored for the limited environment of the llama application. As others have already mentioned in their comments, generalization would require using either the tensor::transpose or permute layers rather than the transpose layer within the llama application which only works under specific settings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
Whatever we have done here is also a generalized implementation only. Using transpose_i_cl which is generalised internally and can handle all the directions and would work under all settings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, thank you for the pull request and the hard work!
I think what @baek2sm meant is the generalization of a layer, not the kernel. I agree that transpose_i_cl()
is generalized internally and can handle transpose two axes and this is a good starting point 👍. However, to explain, TransposeLayer was added only for the LLaMA application and is not used for all purposes. (we use Tensor::transpose() and PermuteLyaer instead). It is currently not used in the LLaMA application as well.
In my opinion, all works under nntrainer/tensor/cl_operations are good to go (nice work!). While TransposeLayer
is not used, I think PermuteLayerCl
would be a better option (maybe this could be the next PR I guess).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM! Except for the clang format :) Please check the format.
(+ Additional comment)
In MHA layer, transpose is required.
I agree with that it would be better to implement the transpose feature as a tensor operation. For this time, however, it is okay to merge this PR to enable GPU kernels to run LLM.
28ee6d0
to
91fed77
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
91fed77
to
dc09f1f
Compare
…es with OpenCL ops Added naive version of OpenCL implementation for Transpose function using blas Incorporated kernels for ops used. Added unit tests for transpose about different axes. Signed-off-by: Niket Agarwal <[email protected]>
dc09f1f
to
8f846e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
…es with OpenCL ops Added naive version of OpenCL implementation for Transpose function using blas Incorporated kernels for ops used. Added unit tests for transpose about different axes. Signed-off-by: Niket Agarwal <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
…xes with OpenCL ops Added naive version of OpenCL implementation for Transpose function using blas Incorporated kernels for ops used. Added unit tests for transpose about different axes. Updating the transpose function with the recent GPU pipeline changes (blas_kernel_strings) Signed-off-by: Niket Agarwal <[email protected]>.
8ea1e02
to
3f2da67
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
Added initial version of Transpose function for GPU. This is a basic implementation using naive kernel.
Changes added with this PR:
Incorporated kernels for GPU to transpose about different axes.
Added
unittest_layers_transpose_cl.cpp
to test Transpose function on GPU.Signed-off-by: Niket Agarwal [email protected]