Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[gpu/blas] sgemv_noTrans kernel and unit tests @open sesame 10/29 21:54 #2770

Merged
merged 1 commit into from
Nov 14, 2024

Conversation

s-debadri
Copy link
Contributor

Changes in this PR:

  • Added sgemv_noTrans kernel.
  • trans information was added to the required functions.
  • Added all possible unit tests for sgemv and sgemv_noTrans.

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Debadri Samaddar [email protected]

@taos-ci
Copy link

taos-ci commented Oct 24, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2770. Please a submit 1commit/1PR (one commit per one PR) policy to get comments quickly from reviewers. Your PR must pass all verificiation processes of cibot before starting a review process from reviewers. If you are new member to join this project, please read manuals in documentation folder and wiki page. In order to monitor a progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.

Comment on lines 51 to 53
nntrainer::TensorDim::TensorType t_type_nchw_fp16 = {
nntrainer::Tformat::NCHW, nntrainer::Tdatatype::FP16};

Copy link
Member

@skykongkong8 skykongkong8 Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove? (this is fp32 TC so it is not needed)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I wonder why there is no fp16 TCs as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed unused variables. Also we are focussing on accuracy as of now. Will add fp16 tests with later PRs.

Added kernel for sgemv_noTrans.
Modified the conditions of blas_kernel_interface.

Signed-off-by: Debadri Samaddar <[email protected]>
Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@s-debadri, 💯 All CI checkers are successfully verified. Thanks.

Copy link
Contributor

@djeong20 djeong20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM :)
Please check the comments!

Comment on lines +134 to +135
trans ? sgemv_cl(data, mdata, rdata, trans, dim2, dim1, lda)
: sgemv_cl(data, mdata, rdata, trans, dim1, dim2, lda);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion!
Since trans is now added as a parameter, handling dim1 and dim2 in the sgemv_cl function would be possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
trans ? sgemv_cl(data, mdata, rdata, trans, dim2, dim1, lda)
: sgemv_cl(data, mdata, rdata, trans, dim1, dim2, lda);
sgemv_cl(data, mdata, rdata, trans, dim2, dim1, lda)

and

// sgemv_cl
size_t dim1_size = transA ? sizeof(float) * dim2 : sizeof(float) * dim1;
size_t dim2_size = transA ? sizeof(float) * dim1 : sizeof(float) * dim2;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The calls for N == 1 and M == 1 are a bit different. Also the work group size depends on the dim1 and dim2. I think handling it before the function call would be better so that we can have a generalized function.

Copy link
Member

@DonghakPark DonghakPark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it is good.
Thank you for considering test cases as well as FP32 and FP16 cases in your work. 👍

@@ -32,13 +32,58 @@ static void setUpGpuContext() {
ac.initBlasClKernels();
}

TEST(blas_kernels, dotCL_sgemv) {
TEST(blas_kernels, dotCL_sgemv_M_1_1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more Question)
I brought this PR to my local environment for testing.

It seems that even when setting the enable-opencl option to true, cannot reach the corresponding unit test.
Is it correct that the current test file is not included in any of the meson.build files within the unittest folder?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And when add

if get_option('enable-opencl')
  test_target += [['unittest_blas_kernels_cl', []]]
endif

this to unittest/meson.build

i got this build error

/home/donghak/Desktop/workspace/nntrainer/nntrainer/opencl/opencl_program.cpp:73:10: error: variable length array ‘binaries_size’ is used [-Werror=vla]
   73 |   size_t binaries_size[num_devices];
      |          ^~~~~~~~~~~~~
/home/donghak/Desktop/workspace/nntrainer/nntrainer/opencl/opencl_program.cpp:98:8: error: variable length array ‘kernel_names’ is used [-Werror=vla]
   98 |   char kernel_names[kernel_names_size];
      |        ^~~~~~~~~~~~
/home/donghak/Desktop/workspace/nntrainer/nntrainer/opencl/opencl_program.cpp:113:18: error: variable length array ‘binaries_ptr’ is used [-Werror=vla]
  113 |   unsigned char *binaries_ptr[num_devices];

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error should be fixed to enable x86 OpenCL compilation (for tizen build as well). This error may not be detected in Android build. One can face this error, for compilation with PC environment doesn't support dynamic allocation like that. I will make an issue for it :
#2776

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unittest_blas_kernels_cl was written for Android tests. So, it's only present in the Android.mk file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DonghakPark There was an error in my previous comment. The PR #2727 resolves this issue! But it needs to be rebased. I will update the issue as well. Sorry for confusion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. Thank you for your explanations! 👍

@jijoongmoon jijoongmoon changed the title [gpu/blas] sgemv_noTrans kernel and unit tests [gpu/blas] sgemv_noTrans kernel and unit tests @open sesame 10/29 21:54 Oct 29, 2024
Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@s-debadri, 💯 All CI checkers are successfully verified. Thanks.

@jijoongmoon jijoongmoon merged commit 9af6680 into nnstreamer:main Nov 14, 2024
59 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants