Initial porting of Op4dTensorGeneric #3404

Open
wants to merge 3 commits into
base: develop

novakovicdj
Contributor

Size categories by element count: ALL (every size), SMALL (< 8192), MID (>= 8192 and < 1048576), BIG (>= 1048576).

| Case | ALL min | ALL max | ALL avg | SMALL min | SMALL max | SMALL avg | MID min | MID max | MID avg | BIG min | BIG max | BIG avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALL COMBINED | 0.5848 | 1.8965 | 1.0142 | 0.6857 | 1.3725 | 1.006 | 0.7349 | 1.3333 | 1.0158 | 0.5848 | 1.8965 | 1.0145 |
| NxCxHxW-NxCxHxW | 0.8514 | 1.1985 | 1.013 | 0.9444 | 1.0588 | 1.0043 | 0.8514 | 1.1985 | 1.0108 | 0.9794 | 1.0542 | 1.0144 |
| NxCxHxW-NxCxHx1 | 0.7571 | 1.3585 | 1.0189 | 0.9273 | 1.0784 | 1.0074 | 0.8496 | 1.3333 | 1.0198 | 0.7571 | 1.3585 | 1.0194 |
| NxCxHxW-NxCx1xW | 0.5848 | 1.8965 | 1.0219 | 0.9464 | 1.0769 | 1.0107 | 0.786 | 1.3304 | 1.0247 | 0.5848 | 1.8965 | 1.0216 |
| NxCxHxW-NxCx1x1 | 0.8162 | 1.2716 | 1.0242 | 0.9388 | 1.0755 | 1.0053 | 0.8162 | 1.2716 | 1.0235 | 0.8233 | 1.2204 | 1.0266 |
| NxCxHxW-Nx1xHxW | 0.7872 | 1.3429 | 1.0183 | 0.9277 | 1.0667 | 1.0099 | 0.7872 | 1.3304 | 1.0212 | 0.8112 | 1.3429 | 1.0177 |
| NxCxHxW-Nx1xHx1 | 0.7108 | 1.7501 | 1.0188 | 0.9322 | 1.0667 | 1.0042 | 0.768 | 1.2642 | 1.0196 | 0.7108 | 1.7501 | 1.0201 |
| NxCxHxW-Nx1x1xW | 0.6268 | 1.785 | 1.0138 | 0.9355 | 1.234 | 1.0066 | 0.8703 | 1.2718 | 1.0171 | 0.6268 | 1.785 | 1.0131 |
| NxCxHxW-Nx1x1x1 | 0.7349 | 1.2864 | 1.0157 | 0.9231 | 1.0702 | 1.005 | 0.7349 | 1.2752 | 1.0166 | 0.8125 | 1.2864 | 1.0168 |
| NxCxHxW-1xCxHxW | 0.794 | 1.3025 | 1.0146 | 0.902 | 1.0909 | 1.0093 | 0.9028 | 1.2393 | 1.0177 | 0.794 | 1.3025 | 1.0137 |
| NxCxHxW-1xCxHx1 | 0.7619 | 1.3661 | 1.017 | 0.9375 | 1.0889 | 1.0058 | 0.768 | 1.3292 | 1.0196 | 0.7619 | 1.3661 | 1.017 |
| NxCxHxW-1xCx1xW | 0.5888 | 1.7401 | 1.012 | 0.9074 | 1.234 | 1.0069 | 0.8257 | 1.281 | 1.0159 | 0.5888 | 1.7401 | 1.0108 |
| NxCxHxW-1xCx1x1 | 0.7458 | 1.3443 | 1.0147 | 0.9216 | 1.0893 | 1.0047 | 0.8691 | 1.1832 | 1.0158 | 0.7458 | 1.3443 | 1.0155 |
| NxCxHxW-1x1xHxW | 0.7926 | 1.4048 | 1.0093 | 0.9259 | 1.2128 | 1.0055 | 0.8808 | 1.2599 | 1.0127 | 0.7926 | 1.4048 | 1.0081 |
| NxCxHxW-1x1xHx1 | 0.7263 | 1.4353 | 1.0096 | 0.9216 | 1.0851 | 1.0038 | 0.8178 | 1.3091 | 1.0113 | 0.7263 | 1.4353 | 1.0096 |
| NxCxHxW-1x1x1xW | 0.6173 | 1.5395 | 1.0066 | 0.9231 | 1.34 | 1.0077 | 0.8946 | 1.1493 | 1.0099 | 0.6173 | 1.5395 | 1.0048 |
| NxCxHxW-1x1x1x1 | 0.6857 | 1.3725 | 1.0092 | 0.6857 | 1.3725 | 1.0056 | 0.9686 | 1.0428 | 1.0083 | 0.9671 | 1.0477 | 1.0103 |

This table compares the performance of the OpenCL (ocl) and HIP versions of the Op4dTensorGeneric kernel, giving the minimum, maximum, and average speedup for each case. Results are shown for all tensor sizes combined and for tensors split into three size categories:

  • Small: fewer than 8192 elements
  • Mid: at least 8192 and fewer than 1048576 elements
  • Big: 1048576 elements or more

Tested on approximately 760,000 test cases.
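
For readers who want to reproduce this kind of summary, here is a minimal Python sketch of how per-test speedups could be grouped by tensor size and reduced to min, max, and average. It is not the script used for this PR. The names `size_bucket` and `summarize` are hypothetical, the input format (pairs of element count and speedup) is an assumption, and the thresholds are taken from the table header. How the speedup ratio itself is computed is not stated here, so the sketch treats it as a given number.

```python
from collections import defaultdict

# Thresholds taken from the table header (element counts).
SMALL_LIMIT = 8192
BIG_LIMIT = 1048576


def size_bucket(num_elements):
    """Return the size category for a tensor with num_elements elements."""
    if num_elements < SMALL_LIMIT:
        return "SMALL"
    if num_elements < BIG_LIMIT:
        return "MID"
    return "BIG"


def summarize(results):
    """Reduce (num_elements, speedup) pairs to (min, max, avg) per category.

    Every result counts toward "ALL" and toward its own size category.
    Categories with no results are simply absent from the returned dict.
    """
    groups = defaultdict(list)
    for num_elements, speedup in results:
        groups["ALL"].append(speedup)
        groups[size_bucket(num_elements)].append(speedup)
    return {
        name: (min(values), max(values), sum(values) / len(values))
        for name, values in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    sample = [(100, 1.0), (10000, 2.0), (2000000, 0.5)]
    for name, (lo, hi, avg) in summarize(sample).items():
        print(f"{name}: min={lo:.4f} max={hi:.4f} avg={avg:.4f}")
```

Running the same reduction once per broadcast case (one row per case label) would give a table with the same layout as the one above.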
