Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GPU/OpenCL] Initial version of Transpose function with OpenCL ops @open sesame 10/04 17:03 #2690

Closed
wants to merge 3 commits into from

Conversation

niket-agarwal
Copy link
Contributor

@niket-agarwal niket-agarwal commented Aug 1, 2024

Added initial version of Transpose function for GPU. This is a basic implementation using naive kernel.

Changes added with this PR:

Incorporated kernels for GPU to transpose about different axes.
Added unittest_layers_transpose_cl.cpp to test Transpose function on GPU.

Signed-off-by: Niket Agarwal [email protected]

@taos-ci
Copy link

taos-ci commented Aug 1, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2690. Please a submit 1commit/1PR (one commit per one PR) policy to get comments quickly from reviewers. Your PR must pass all verificiation processes of cibot before starting a review process from reviewers. If you are new member to join this project, please read manuals in documentation folder and wiki page. In order to monitor a progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.

@taos-ci
Copy link

taos-ci commented Aug 1, 2024

:octocat: cibot: @niket-agarwal, The last line of a text file must have a newline character. Please append a new line at the end of the line in test/unittest/layers/unittest_layers_transpose_cl.cpp.

@taos-ci
Copy link

taos-ci commented Aug 1, 2024

:octocat: cibot: @niket-agarwal, nntrainer/layers/cl_layers/transpose_cl.cpp includes bug(s). Please fix incorrect coding constructs in your commit before entering a review process.

@taos-ci
Copy link

taos-ci commented Aug 1, 2024

:octocat: cibot: @niket-agarwal, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2690-202408011410450.51261901855469-7e976e3a350a0c6f8cf362bfe9720c77a786ed93/.

@taos-ci
Copy link

taos-ci commented Aug 1, 2024

:octocat: cibot: @niket-agarwal, nntrainer/layers/cl_layers/transpose_cl.cpp includes bug(s). Please fix incorrect coding constructs in your commit before entering a review process.

@taos-ci
Copy link

taos-ci commented Aug 1, 2024

:octocat: cibot: @niket-agarwal, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2690-202408011538080.67232894897461-e3010eec8afa1eeb51dd0358735f6a376cc20bf9/.

@taos-ci
Copy link

taos-ci commented Aug 1, 2024

:octocat: cibot: @niket-agarwal, nntrainer/cl_context.cpp includes bug(s). Please fix incorrect coding constructs in your commit before entering a review process.

Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

@niket-agarwal niket-agarwal force-pushed the Transpose branch 2 times, most recently from 2d5a01a to 0c7523e Compare August 29, 2024 10:22
@baek2sm
Copy link
Contributor

baek2sm commented Aug 29, 2024

Nice work! But.. As you already know, this transpose layer is not a general transpose layer but rather one specifically tailored for the limited environment of the llama application. As others have already mentioned in their comments, generalization would require using either the tensor::transpose or permute layers rather than the transpose layer within the llama application which only works under specific settings.

Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

@niket-agarwal
Copy link
Contributor Author

Nice work! But.. As you already know, this transpose layer is not a general transpose layer but rather one specifically tailored for the limited environment of the llama application. As others have already mentioned in their comments, generalization would require using either the tensor::transpose or permute layers rather than the transpose layer within the llama application which only works under specific settings.

Whatever we have done here is also a generalized implementation only. Using transpose_i_cl which is generalised internally and can handle all the directions and would work under all settings.

Copy link
Contributor

@djeong20 djeong20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, thank you for the pull request and the hard work!
I think what @baek2sm meant is the generalization of a layer, not the kernel. I agree that transpose_i_cl() is generalized internally and can handle transpose two axes and this is a good starting point 👍. However, to explain, TransposeLayer was added only for the LLaMA application and is not used for all purposes. (we use Tensor::transpose() and PermuteLyaer instead). It is currently not used in the LLaMA application as well.
In my opinion, all works under nntrainer/tensor/cl_operations are good to go (nice work!). While TransposeLayer is not used, I think PermuteLayerCl would be a better option (maybe this could be the next PR I guess).

Copy link
Contributor

@EunjuYang EunjuYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM! Except for the clang format :) Please check the format.

(+ Additional comment)
In MHA layer, transpose is required.
I agree with that it would be better to implement the transpose feature as a tensor operation. For this time, however, it is okay to merge this PR to enable GPU kernels to run LLM.

@niket-agarwal niket-agarwal force-pushed the Transpose branch 2 times, most recently from 28ee6d0 to 91fed77 Compare September 12, 2024 09:20
Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

…es with OpenCL ops

Added naive version of OpenCL implementation for Transpose function using blas
Incorporated kernels for ops used.
Added unit tests for transpose about different axes.

Signed-off-by: Niket Agarwal <[email protected]>
Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

@jijoongmoon jijoongmoon changed the title [GPU/OpenCL] Initial version of Transpose function with OpenCL ops [GPU/OpenCL] Initial version of Transpose function with OpenCL ops @Open sesame 09/27 15:18 Sep 27, 2024
@jijoongmoon jijoongmoon changed the title [GPU/OpenCL] Initial version of Transpose function with OpenCL ops @Open sesame 09/27 15:18 [GPU/OpenCL] Initial version of Transpose function with OpenCL ops @open sesame 09/27 15:18 Sep 27, 2024
Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

@jijoongmoon jijoongmoon changed the title [GPU/OpenCL] Initial version of Transpose function with OpenCL ops @open sesame 09/27 15:18 [GPU/OpenCL] Initial version of Transpose function with OpenCL ops @open sesame 10/04 17:03 Oct 4, 2024
…es with OpenCL ops

Added naive version of OpenCL implementation for Transpose function using blas
Incorporated kernels for ops used.
Added unit tests for transpose about different axes.

Signed-off-by: Niket Agarwal <[email protected]>
Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

…xes with OpenCL ops

    Added naive version of OpenCL implementation for Transpose function using blas
    Incorporated kernels for ops used.
    Added unit tests for transpose about different axes.
    Updating the transpose function with the recent GPU pipeline changes (blas_kernel_strings)

    Signed-off-by: Niket Agarwal <[email protected]>.
Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants