Optimize kernel launch using buffer protocol #281

Merged
5 commits merged into clpy from 153-buffer-protocol
Mar 12, 2020

Collaborator

@y1r commented Mar 6, 2020

Following the discussion in #153, I tried optimizing code blocks 4 and 5 of the OpenCL kernel launch.

I applied two optimizations:

  • avoid using NumPy for passing Python scalars
  • use faster APIs to get the raw pointer of NumPy arrays.
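
As a rough illustration of the two ideas above (this is not the PR's actual code), the sketch below uses only the Python standard library. `struct.pack` turns a Python scalar into raw bytes without creating a NumPy scalar, and `ctypes` reads the address of a writable buffer-protocol object without copying it. `pack_scalar` and `raw_pointer` are hypothetical helper names, and `array.array` stands in for a NumPy array, which exports the same buffer protocol.

```python
import array
import ctypes
import struct


def pack_scalar(value, fmt):
    """Pack a Python scalar into raw bytes using a struct format code."""
    return struct.pack(fmt, value)


def raw_pointer(buf):
    """Return (address, nbytes) for a writable, contiguous buffer object.

    The address is only valid while `buf` stays alive and is not resized.
    """
    view = memoryview(buf)
    if not view.contiguous:
        raise ValueError("buffer must be contiguous")
    # from_buffer shares memory with `buf` (no copy); it raises TypeError
    # for read-only buffers such as bytes.
    c_obj = (ctypes.c_char * view.nbytes).from_buffer(buf)
    return ctypes.addressof(c_obj), view.nbytes


if __name__ == "__main__":
    data = array.array("f", [1.0, 2.0, 3.0])  # stand-in for a NumPy array
    address, nbytes = raw_pointer(data)
    print(hex(address), nbytes)
    print(pack_scalar(3, "=i"), pack_scalar(1.5, "=f"))
```

The point of the sketch is that both steps avoid allocating intermediate NumPy objects on every kernel launch; how much that saves in practice depends on the launch path being measured.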

Benchmark on train_mnist.py

Before applying the above optimizations:

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.189677    0.095858              0.942733       0.9701                    5.35684
2           0.0751313   0.0731992             0.977283       0.9773                    8.70019
3           0.0485502   0.0778455             0.984215       0.9758                    11.8439
4           0.0371889   0.074158              0.988282       0.9797                    15.155
5           0.0275217   0.0802973             0.990749       0.9786                    18.5144
6           0.0254644   0.0791466             0.991549       0.9804                    21.9081
7           0.02129     0.0894079             0.993348       0.9792                    25.0723
8           0.0196997   0.0775197             0.993482       0.9826                    28.402
9           0.0179107   0.101829              0.994332       0.9779                    31.7447
10          0.0118162   0.0966264             0.995848       0.979                     35.1355
11          0.0134912   0.102887              0.995516       0.9786                    38.4133
12          0.0146866   0.0936976             0.995449       0.9817                    41.7728
13          0.0117503   0.102542              0.996499       0.9797                    45.1475
14          0.011161    0.0949723             0.996482       0.9819                    48.3411
15          0.0130875   0.0835896             0.996165       0.984                     51.7076
16          0.00625776  0.0792082             0.998299       0.9836                    55.0679
17          0.0110761   0.103523              0.996732       0.9803                    58.4271
18          0.0103205   0.0891463             0.996582       0.9854                    61.6228
19          0.0104299   0.106009              0.997133       0.9826                    64.9892
20          0.00651202  0.112479              0.997999       0.9813                    68.3543

With this PR:

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.189318    0.0884703             0.943917       0.9719                    5.35029
2           0.0756333   0.0964783             0.976766       0.9702                    8.61064
3           0.0487956   0.0941518             0.984249       0.972                     11.7018
4           0.0366654   0.0783642             0.988115       0.9771                    14.979
5           0.0273711   0.087877              0.990798       0.9758                    18.2496
6           0.0246589   0.0750636             0.991782       0.9815                    21.5778
7           0.0200919   0.0778768             0.993265       0.9799                    24.6969
8           0.0188126   0.0756163             0.993982       0.9826                    27.9492
9           0.0154501   0.0962056             0.994649       0.9783                    31.2573
10          0.0145319   0.0860888             0.995249       0.9817                    34.5655
11          0.0102053   0.123673              0.997099       0.9778                    37.8064
12          0.0153585   0.109619              0.995499       0.9782                    40.9742
13          0.0128702   0.0937969             0.995982       0.9797                    44.2379
14          0.00935716  0.0993471             0.996932       0.9824                    47.4989
15          0.0144791   0.114044              0.995432       0.9785                    50.69
16          0.00778664  0.0972427             0.997716       0.9827                    53.8373
17          0.0103079   0.123376              0.996982       0.9788                    57.0987
18          0.0118289   0.11937               0.996699       0.9801                    60.325
19          0.00942417  0.13216               0.997399       0.9767                    63.457
20          0.00802978  0.0992155             0.997816       0.9837                    66.6341

For reference, the CuPy version:

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.193629    0.110045              0.941634       0.9649                    8.89782
2           0.072203    0.0664665             0.977516       0.9794                    11.5458
3           0.0475952   0.0832649             0.984849       0.9745                    14.1912
4           0.0360019   0.097843              0.987932       0.9742                    16.9374
5           0.0292779   0.0717296             0.990715       0.9809                    19.6708
6           0.0237712   0.0709314             0.992448       0.9814                    22.3976
7           0.0202223   0.0762753             0.993532       0.9792                    25.0598
8           0.0168574   0.0715926             0.994198       0.9828                    27.6588
9           0.0160383   0.0822404             0.994615       0.9804                    30.373
10          0.0128409   0.0957292             0.996132       0.9801                    33.0916
11          0.0156948   0.0864232             0.994965       0.9783                    35.8197
12          0.010121    0.0810239             0.996882       0.9802                    38.4717
13          0.0165002   0.0814259             0.994999       0.9827                    41.0567
14          0.0103919   0.0985753             0.996682       0.9822                    43.8012
15          0.0136461   0.0842656             0.996232       0.981                     46.5425
16          0.00955445  0.0863684             0.997132       0.9835                    49.2613
17          0.010661    0.0943064             0.996816       0.9826                    51.8996
18          0.00876496  0.0858093             0.997183       0.9842                    54.5882
19          0.0053344   0.0931705             0.99865        0.983                     57.3164
20          0.00840199  0.117667              0.997532       0.9798                    60.0858
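
To put the three tables side by side, the short script below computes the relative change in total elapsed time and the average time per epoch. The numbers are copied from the epoch 1 and epoch 20 rows above. Excluding the first epoch from the per-epoch average is an assumption made here (it presumably includes one-time setup cost), and `steady_epoch_time` is a hypothetical helper name.

```python
# (elapsed_time after epoch 1, elapsed_time after epoch 20), in the units
# reported by train_mnist.py, copied from the tables above.
runs = {
    "before": (5.35684, 68.3543),
    "pr": (5.35029, 66.6341),
    "cupy": (8.89782, 60.0858),
}


def steady_epoch_time(first, total, epochs=20):
    """Average time per epoch, excluding the first epoch."""
    return (total - first) / (epochs - 1)


# Fraction of total elapsed time saved by the PR relative to "before".
total_saving = 1.0 - runs["pr"][1] / runs["before"][1]
per_epoch = {name: steady_epoch_time(*times) for name, times in runs.items()}

if __name__ == "__main__":
    print("total time saved by the PR: {:.2%}".format(total_saving))
    for name, value in per_epoch.items():
        print("{:>6}: {:.4f} per epoch".format(name, value))
```

On these figures the PR reduces total elapsed time by roughly 2.5%, and the CuPy run remains faster per epoch after the first one. This is a single run per configuration, so small differences should be read with caution.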

    elif isinstance(a, bool):
        a = numpy.bool_(a)

    if core.numpy_scalar_type_set():

Collaborator Author

In addition, I fixed what appears to be a bug here: the condition on L103 is always True.
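
One plausible reading of this remark, assuming `core.numpy_scalar_type_set()` returns a non-empty set of types (the diff context here does not show the rest of the branch): the result of the call is truthy regardless of the argument, so the check never rejects anything. The sketch below reproduces that pattern with a hypothetical stand-in helper; `scalar_type_set`, `check_buggy`, and `check_fixed` are illustrative names, not clpy or CuPy APIs, and the fix shown is only one possible correction.

```python
def scalar_type_set():
    """Hypothetical stand-in for a helper returning the accepted types."""
    return {bool, int, float}


def check_buggy(a):
    # A non-empty set is always truthy, so this branch is taken
    # no matter what `a` is and the TypeError is unreachable.
    if scalar_type_set():
        return "accepted"
    raise TypeError("Unsupported type %s" % type(a))


def check_fixed(a):
    # Test the argument's type for membership in the set instead.
    if type(a) in scalar_type_set():
        return "accepted"
    raise TypeError("Unsupported type %s" % type(a))


if __name__ == "__main__":
    print(check_buggy("not a scalar"))  # silently accepted
    try:
        check_fixed("not a scalar")
    except TypeError as exc:
        print("rejected:", exc)
```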

Collaborator Author

@ybsh Is this okay?

Member

@y1r It looks like a bug, and it has gone unnoticed because no test case passes a value of an illegal type.

@nsakabe-fixstars What do you suggest we do? Add test cases? Upstream CuPy might already include such test cases.

@y1r As @LWisteria says, would you add one or more tests to cover these branches? When you do, please leave a note (such as NOTE(y1r): ) that points to this discussion.
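
A test of the kind requested here might look roughly like the following. It is a sketch only: `to_kernel_arg` is a hypothetical stand-in for the scalar conversion under test, not the real clpy function, and the accepted types are assumed for illustration.

```python
import unittest


def to_kernel_arg(a):
    """Hypothetical stand-in for the scalar conversion under test."""
    if isinstance(a, (bool, int, float)):
        return a
    raise TypeError("Unsupported type %s" % type(a))


class TestKernelArgTypes(unittest.TestCase):
    # NOTE(y1r): covers the illegal-type branch discussed in PR #281.

    def test_legal_scalars(self):
        for value in (True, 3, 1.5):
            self.assertEqual(to_kernel_arg(value), value)

    def test_illegal_type_raises(self):
        for value in ("text", None, [1, 2]):
            with self.assertRaises(TypeError):
                to_kernel_arg(value)


if __name__ == "__main__":
    unittest.main()
```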

Member

@nsakabe-fixstars Could adding the tests (and fixing the bug) be handled in a separate issue/PR? Should they be added by us or in upstream CuPy?

Collaborator Author

I see two options:

  1. Revert the bugfix in this PR, open a new PR with the bugfix and a test, merge that PR into clpy, merge it into #281, and then merge #281 into clpy.
  2. Open a new PR against CuPy that adds a test, get it merged into CuPy, and then cherry-pick the change.

I think option 1 is enough for us.

Collaborator Author

CuPy has no test for this situation, so pulling from upstream cannot resolve this issue.
To summarize, there are three options:

  1. Revert the bugfix in this PR, open a new PR with the bugfix and a test, merge that PR into clpy, merge it into #281, and then merge #281 into clpy.
  2. Add the test in this PR.
  3. Open a new PR against CuPy that adds a test, get it merged into CuPy, and then cherry-pick the change.

Collaborator Author

@nsakabe-fixstars Which option is best for us?

@y1r OK, I see your situation. Please go with option 1.

@jenkins-maekawa left a comment

Test (commit 9c5e338) passed on vega.

@jenkins-maekawa left a comment

Test (commit 9c5e338) passed on titanv.

@jenkins-maekawa left a comment

Test (commit 9c5e338) passed on vega.

@jenkins-maekawa left a comment

Test (commit 9c5e338) passed on titanv.

@y1r assigned ghost Mar 9, 2020
@y1r requested a review from a user March 9, 2020 04:02

Collaborator Author

y1r commented Mar 9, 2020

@nsakabe-fixstars Please review this.

@ghost assigned y1r and unassigned ghost Mar 10, 2020

ghost commented Mar 10, 2020

@y1r I have left a comment on the part where you fixed the bug, so please take a look.

@jenkins-maekawa left a comment

Test (commit 80c149d) passed on vega.

@jenkins-maekawa left a comment

Test (commit 80c149d) passed on titanv.

Collaborator Author

y1r commented Mar 11, 2020

@nsakabe-fixstars Please review this PR after #282 is merged. I've already merged #282 into this branch.

@y1r assigned ghost and unassigned y1r Mar 11, 2020

@jenkins-maekawa left a comment

Test (commit 6cd51a2) passed on titanv.

@jenkins-maekawa left a comment

Test (commit 6cd51a2) passed on vega.

@jenkins-maekawa left a comment

Test (commit 4d61361) passed on titanv.

@jenkins-maekawa left a comment

Test (commit 4d61361) passed on vega.

@ghost left a comment

LGTM

@ghost ghost merged commit 7a24505 into clpy Mar 12, 2020
@ghost ghost deleted the 153-buffer-protocol branch March 12, 2020 02:35
This pull request was closed.
3 participants