[CPU] attn supports f16 #26487
Conversation
Force-pushed from f65a8e1 to cc3d9cf
Force-pushed from 63700c7 to 667901f
@luo-cheng2021 I have updated the code according to the comments. Could you help review it, especially the PagedAttention part? Thanks.
src/plugins/intel_cpu/src/graph.cpp
Outdated
@@ -193,9 +193,12 @@ void Graph::Replicate(const std::shared_ptr<const ov::Model> &model,
auto parentNode = op2node[unusedOutput.get_node_shared_ptr()];
const auto port = unusedOutput.get_index();
const auto nodeName = std::string("stub_") + std::to_string(unusedOutput.get_index()) + "_" + parentNode->getName();
// WA: avoid PagedAttention's second output reorder.
We'd better not hardcode this here; we should find the place where the output precision is changed to f16.
Any updates here? It has to be resolved before the merge.
> Any updates here? It has to be resolved before the merge.

Still addressing it. The root cause is the ConvertPrecision transformation. Need to deal with it carefully and avoid affecting the GPU.
@dmitry-gorokhov @luo-cheng2021 Removed such WA. The root cause is that the PagedAttentionExtension op is a bit hardcoded: it doesn't provide a mechanism to change the 2nd output dtype. I made some changes (see the sketch after this list):
- Add a `set_out_type` member function to the PagedAttentionExtension op.
- When `validate_and_infer_types` executes in PagedAttentionExtension, it determines the output types. It won't break the GPU path.
- Add a `fuse_type_to_pa` in the CPU plugin, which is an extension to `ConvertPrecision`. It is used to specify the correct type for `PagedAttentionExtension`'s 2nd output. The scope is the CPU plugin only and won't break the common pass.
@xczhai Just for my better understanding: could you please describe the pattern where the Reorder is inserted? Like, does the next op after PA expect fp16 on its input?
> @xczhai Just for my better understanding: could you please describe the pattern where the Reorder is inserted? Like, does the next op after PA expect fp16 on its input?
Okay.
- At the very beginning, the PA op spec describes both output types as aligned with the input0 type. As a result, PA's output types are `f32` when entering the CPU plugin.
- During CPU plugin transformation, `ConvertPrecision` converts or fuses the op's types. As a result, PA's two output types become `f16`. But remember PA's 2nd output is dangling, without any child or `Result` node.
- When constructing the graph, every dangling output is wrapped by a `Result` node whose type is aligned with the output type. In this case, the specific pattern is `PA's 2nd output` --> `Result(f16)`.
- But in the CPU node design, PA's 2nd output is always `f32`. So the pattern is `PA's 2nd output(f32)` --> `Result(f16)`.
- The following ResolveConflict logic scans this pattern and then inserts a `Reorder`. So the pattern becomes `PA's 2nd output(f32)` --> `Reorder` --> `Result(f16)` (illustrated by the toy sketch below).
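To make the mismatch concrete, here is a small self-contained toy (not the plugin's actual code; `Port` and `resolve_edge` are made up for illustration) showing how a producer port fixed to f32 feeding a consumer expecting f16 ends up with a Reorder on the edge:

```cpp
#include <iostream>
#include <string>

// Toy model of a graph edge: a producer output port and a consumer input port.
struct Port {
    std::string owner;
    std::string precision;
};

// Mimics the ResolveConflict idea: when the two ends of an edge disagree on
// precision, a Reorder is inserted between them.
void resolve_edge(const Port& producer, const Port& consumer) {
    if (producer.precision != consumer.precision) {
        std::cout << producer.owner << "(" << producer.precision << ") --> Reorder --> "
                  << consumer.owner << "(" << consumer.precision << ")\n";
    } else {
        std::cout << producer.owner << "(" << producer.precision << ") --> "
                  << consumer.owner << "(" << consumer.precision << ")\n";
    }
}

int main() {
    // PA's 2nd output is always f32 in the CPU node, but ConvertPrecision
    // switched the wrapping Result to f16, so a Reorder shows up.
    resolve_edge({"PA's 2nd output", "f32"}, {"Result", "f16"});
    return 0;
}
```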
src/plugins/intel_cpu/src/nodes/kernels/scaled_attn/executor_pa.cpp
Outdated
src/plugins/intel_cpu/src/nodes/kernels/scaled_attn/executor_pa.cpp
Outdated
Force-pushed from ed88364 to 37479b2
Hi @dmitry-gorokhov, could you please review? Thanks!
...ns/intel_cpu/tests/functional/custom/subgraph_tests/src/common/concat_multiple_query_sdp.cpp
Outdated
...ns/intel_cpu/tests/functional/custom/subgraph_tests/src/common/concat_multiple_query_sdp.cpp
Outdated
- rebase f16 impl from arm
- refactor the testcase for x64
if ((inType == ElementType::bf16 && !ov::with_cpu_x86_bfloat16()) ||
    (inType == ElementType::f16 && !ov::with_cpu_x86_avx512_core_fp16())) {
    GTEST_SKIP();
Test skip is still there
> Test skip is still there

Removed.
Force-pushed from 93e1a24 to b9c9f4c
Details:
Tickets: