【DONT MERGE】 test softmax speed #1326

phlrain · 2023-04-03T08:10:33Z

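In this PR, cinn/ir/fuse_block_model_fp16_test.cc is the softmax test case in fp16. The kernel takes 86 µs, close to the phi kernel's 82 µs but slower than torch's 77.47 µs.

The cause is that some for loops have not been merged and need further merging. After merging them by hand, the measured kernel time is 75 µs, which is on par with the torch implementation.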
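To make the loop-merging point concrete, here is a minimal CPU sketch of the idea for one softmax row. It is not the kernel CINN generates and not the code in this PR: the function names `softmax_separate_loops` and `softmax_merged_loops` are hypothetical, it uses `float` rather than fp16, and it ignores thread and block mapping. It only shows how adjacent loops over the same range can be merged so the row is traversed fewer times and the intermediate buffers disappear.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical baseline: every stage of softmax gets its own loop over the row.
std::vector<float> softmax_separate_loops(const std::vector<float>& x) {
  assert(!x.empty());
  const std::size_t n = x.size();
  std::vector<float> shifted(n), e(n), out(n);

  float max_val = x[0];
  for (std::size_t i = 1; i < n; ++i) max_val = std::max(max_val, x[i]);  // loop 1: row max

  for (std::size_t i = 0; i < n; ++i) shifted[i] = x[i] - max_val;  // loop 2: subtract max
  for (std::size_t i = 0; i < n; ++i) e[i] = std::exp(shifted[i]);  // loop 3: exponentiate

  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) sum += e[i];                  // loop 4: row sum

  for (std::size_t i = 0; i < n; ++i) out[i] = e[i] / sum;          // loop 5: normalize
  return out;
}

// Hypothetical merged version: loops 2, 3 and 4 share one loop body, so the
// temporaries `shifted` and `e` are no longer needed.
std::vector<float> softmax_merged_loops(const std::vector<float>& x) {
  assert(!x.empty());
  const std::size_t n = x.size();
  std::vector<float> out(n);

  float max_val = x[0];
  for (std::size_t i = 1; i < n; ++i) max_val = std::max(max_val, x[i]);  // row max

  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {  // merged: subtract, exp, accumulate
    const float e = std::exp(x[i] - max_val);
    out[i] = e;
    sum += e;
  }

  for (std::size_t i = 0; i < n; ++i) out[i] /= sum;  // normalize
  return out;
}
```

Both functions perform the same arithmetic in the same order, so their results should agree up to floating-point rounding. The max reduction and the final division stay in separate loops because each depends on a value that is only known after the preceding loop has finished.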

paddle-bot · 2023-04-03T08:10:38Z

Thanks for your contribution!

phlrain added 12 commits March 9, 2023 08:25

update

e941e91

update

e2d8c9c

Merge branch 'develop' of https://github.com/PaddlePaddle/CINN into t…

70496d6

…ry_to_split_thread

update

5ad2173

update

c9ea207

update

3ed387a

Merge branch 'develop' of https://github.com/PaddlePaddle/CINN into t…

2d0dc2a

…ry_to_split_thread

update

828a062

update

2b11cfc

update

bbc0377

update

e3ab1cd

Merge branch 'develop' of https://github.com/PaddlePaddle/CINN into t…

7e702db

…ry_to_split_thread

phlrain added 12 commits April 6, 2023 06:22

add bn test

c666b4e

Merge branch 'develop' of https://github.com/PaddlePaddle/CINN into t…

88f8fd0

…ry_to_split_thread

update

e3e814b

update

2fc4ec8

Merge branch 'develop' of https://github.com/PaddlePaddle/CINN into t…

277627a

…ry_to_split_thread

update

c643692

Merge branch 'develop' of https://github.com/PaddlePaddle/CINN into t…

535aa86

…ry_to_split_thread

update

e2cf675

update

fd692a9

update

de13dd1

add pre layer norm branch

6154f16

update

80e783a
