[CPU] add arithmetic_mode impl for bf16_emitters #27737

liubo-intel · 2024-11-26T02:52:15Z

Details:

Add an arithmetic_mode implementation to the bf16 emitters to fix the 'inf' output issue that occurs when the input data falls outside the range [bf16_min, bf16_max] during f32->bf16 conversion, which may lead to 'UNK' outputs in some LLM cases.
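For illustration only, the overflow can be reproduced with a scalar model of round-to-nearest-even f32->bf16 conversion. This is a sketch, not the plugin's JIT code: the helper names `f32_to_bf16_rne` and `bf16_to_f32` are hypothetical, and denormal handling of the hardware instruction is not modelled.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical scalar model of f32 -> bf16 with round-to-nearest-even.
// Denormal handling of the real instruction is intentionally not modelled.
inline uint16_t f32_to_bf16_rne(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    if (std::isnan(f)) {
        // Keep NaN a NaN by forcing a mantissa bit that survives the shift.
        return static_cast<uint16_t>((bits >> 16) | 0x0040u);
    }
    // Add 0x7FFF plus the lowest kept bit so that ties round to even.
    const uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<uint16_t>((bits + rounding_bias) >> 16);
}

// Widen a bf16 bit pattern back to f32 by appending 16 zero bits.
inline float bf16_to_f32(uint16_t h) {
    const uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

With this model, a finite f32 input above the largest finite bf16 value (for example `std::numeric_limits<float>::max()`) rounds up into the infinity encoding, which is the kind of 'inf' output described above.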

Tickets:

CVS-157254

liubo-intel · 2024-11-26T02:56:00Z

Hi @chenhu-wang and @dmitry-gorokhov, could you please help review this PR? Thanks.

chenhu-wang · 2024-11-26T03:47:36Z

src/plugins/intel_cpu/src/emitters/plugin/x64/jit_bf16_emitters.hpp

+            Zmm bf16_max = Zmm(aux_vec_idxs[0]);
+            Zmm bf16_min = Zmm(aux_vec_idxs[1]);
+            h->uni_vmovups(bf16_max, table_val("bf16_max"));
+            h->uni_vmovups(bf16_min, table_val("bf16_min"));
+
+            Zmm truncated = Zmm(aux_vec_idxs[2]);
+            h->uni_vmaxps(truncated, in, bf16_min);
+            h->uni_vminps(truncated, truncated, bf16_max);


This is not truncation behavior. It is a saturation implementation that clamps to [min, max], but the variable is named truncated.

Hi @chenhu-wang, thanks for your comments. Are you suggesting that we use a more efficient instruction to solve this kind of problem?

I mean that "truncation" mode cuts off the lower bits directly, while "saturation" mode clamps to [min, max]. They produce different results. The PR description, the variable name, and this implementation are not aligned. First of all, we need to understand which mode is needed here and align it on all platforms. For truncation, you can refer to h->uni_vandps(aux, aux, table_val("mask_truncation_word")); in the lines below.
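A scalar sketch of the two modes may make the difference concrete. It is only an illustration under assumptions: the function names are hypothetical, the clamp bounds are assumed to be the largest and lowest finite bf16 values (bit patterns 0x7F7F0000 and 0xFF7F0000), saturation is assumed to be followed by round-to-nearest-even, and NaN is mapped to a fixed quiet NaN, which the real emitter may handle differently.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

inline uint32_t f32_bits(float f) {
    uint32_t b;
    std::memcpy(&b, &f, sizeof(b));
    return b;
}

inline float bits_f32(uint32_t b) {
    float f;
    std::memcpy(&f, &b, sizeof(f));
    return f;
}

// Truncation: mask off the low 16 bits and keep the high half unchanged.
inline uint16_t f32_to_bf16_truncate(float f) {
    return static_cast<uint16_t>((f32_bits(f) & 0xFFFF0000u) >> 16);
}

// Saturation: clamp to the finite bf16 range, then round to nearest even.
inline uint16_t f32_to_bf16_saturate(float f) {
    if (f != f) {
        return 0x7FC0;  // assumption of this sketch: NaN becomes a quiet NaN
    }
    const float bf16_max = bits_f32(0x7F7F0000u);     // assumed upper bound
    const float bf16_lowest = bits_f32(0xFF7F0000u);  // assumed lower bound
    const float clamped = std::min(std::max(f, bf16_lowest), bf16_max);
    const uint32_t bits = f32_bits(clamped);
    const uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<uint16_t>((bits + rounding_bias) >> 16);
}
```

In this model the two modes agree on the largest finite f32 input (both give 0x7F7F), but they differ for in-range values whose discarded bits exceed half of the last kept bit, and for infinities, which truncation passes through while saturation clamps.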

I got your point. Based on the customer model outputs, it seems we need this kind of "saturation" mode here.
The newly added functional tests also show that the "saturation" mode implementation aligns with the OV reference implementation. Maybe I need to modify the PR description based on your suggestion.

chenhu-wang · 2024-11-28T02:36:43Z

@liubo-intel, you use a fixed "saturation" mode on the avx512_core_bf16 ISA, but on some ISAs such as avx2 a fixed "truncation" mode is used. This is not aligned.
We have arithmetic_mode mode_ in jit_store_emitter. jit_uni_vcvtneps2bf16 is used in the f32-to-bf16 store case, where the mode info is not respected. I think we should pass mode_ (truncation or saturation) into jit_uni_vcvtneps2bf16 as a field and convert according to the mode. We can set "saturation" as the default mode.
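As a rough scalar analogy of this proposal (not the actual emitter code), the mode could be stored as a field at construction time and consulted on every conversion, with saturation as the default. The class name `scalar_bf16_converter` and its interface are hypothetical, the enum here is a local stand-in, and the clamp bounds and NaN handling are assumptions of the sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Local stand-in for the mode enum mentioned in the comment above.
enum class arithmetic_mode { saturation, truncation };

// Hypothetical scalar analogue of an emitter that carries its mode as a field.
class scalar_bf16_converter {
public:
    explicit scalar_bf16_converter(arithmetic_mode mode = arithmetic_mode::saturation)
        : mode_(mode) {}

    uint16_t convert(float f) const {
        if (mode_ == arithmetic_mode::truncation) {
            // Truncation: keep the high 16 bits and drop the rest.
            return static_cast<uint16_t>(to_bits(f) >> 16);
        }
        if (f != f) {
            return 0x7FC0;  // assumption of this sketch: NaN becomes a quiet NaN
        }
        // Saturation: clamp to the assumed finite bf16 range, then round to nearest even.
        const float hi = from_bits(0x7F7F0000u);
        const float lo = from_bits(0xFF7F0000u);
        const uint32_t bits = to_bits(std::min(std::max(f, lo), hi));
        const uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
        return static_cast<uint16_t>((bits + rounding_bias) >> 16);
    }

private:
    static uint32_t to_bits(float f) {
        uint32_t b;
        std::memcpy(&b, &f, sizeof(b));
        return b;
    }
    static float from_bits(uint32_t b) {
        float f;
        std::memcpy(&f, &b, sizeof(f));
        return f;
    }

    arithmetic_mode mode_;
};
```

A caller that already knows its mode, such as a store path, would construct the converter with that mode, while callers that pass nothing get saturation.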


liubo-intel requested review from a team as code owners November 26, 2024 02:52

github-actions bot added the category: CPU OpenVINO CPU plugin label Nov 26, 2024

chenhu-wang reviewed Nov 26, 2024


add truncation impl for vcvtneps2bf16 ISA

2761002

liubo-intel force-pushed the liubo/bf16_emitter_trunc_impl branch from 8f98525 to 99e994a Compare November 28, 2024 11:11

liubo-intel changed the title ~~[CPU] add truncation impl for vcvtneps2bf16 ISA~~ [CPU] add arithmetic_mode impl for bf16_emitters Nov 28, 2024

arithmetic_mode for bf16_emitters, and saturation as default

f0b3f10

liubo-intel force-pushed the liubo/bf16_emitter_trunc_impl branch from 99e994a to f0b3f10 Compare November 29, 2024 06:04

update jit_uni_vcvtneps2bf16 arithmetic_mode for convert_truncation_e…

d038a9a

…mitter and store_emitter
