[CPU] add arithmetic_mode impl for bf16_emitters #27737

Open
wants to merge 3 commits into
base: master

liubo-intel
Contributor

@liubo-intel liubo-intel commented Nov 26, 2024

Details:

  • add arithmetic_mode impl for bf16_emitters to fix the 'inf' output issue when input data is outside the range [bf16_min, bf16_max] during f32->bf16 conversion, which may lead to 'UNK' outputs in some LLM cases

Tickets:

@liubo-intel liubo-intel requested review from a team as code owners November 26, 2024 02:52
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Nov 26, 2024
@liubo-intel
Contributor Author

Hi @chenhu-wang and @dmitry-gorokhov, could you please help review this PR? Thanks.

Comment on lines 43 to 50
Zmm bf16_max = Zmm(aux_vec_idxs[0]);
Zmm bf16_min = Zmm(aux_vec_idxs[1]);
h->uni_vmovups(bf16_max, table_val("bf16_max"));
h->uni_vmovups(bf16_min, table_val("bf16_min"));

Zmm truncated = Zmm(aux_vec_idxs[2]);
h->uni_vmaxps(truncated, in, bf16_min);
h->uni_vminps(truncated, truncated, bf16_max);
Contributor

This is not truncation behavior. It is a saturation implementation that clamps to [min, max], but the variable is named truncated.

Contributor Author

Hi @chenhu-wang, thanks for your comments. Are you suggesting that we use a more efficient instruction to solve this kind of problem?

Contributor

I mean that "truncation" mode cuts off the lower bits directly, while "saturation" mode clamps to [min, max]. They give different results. The PR description, the variable name, and this implementation are not aligned. First of all, we need to understand which mode is needed here and align it across all platforms. For truncation, you can refer to h->uni_vandps(aux, aux, table_val("mask_truncation_word")); in the lines below.
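
To make the difference concrete, here is a minimal scalar sketch (not the JIT emitter code). The helper names are hypothetical, bf16_max is assumed to be the bit pattern 0x7F7F0000, and bf16_min is assumed to mean the most negative finite bf16 value (-bf16_max). NaN handling is intentionally left out.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <limits>

// Bit-level reinterpretation helpers (hypothetical names).
inline uint32_t f32_bits(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
inline float bits_f32(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

// Truncation: keep the upper 16 bits, drop the lower 16 mantissa bits.
inline uint16_t bf16_truncate(float f) {
    return static_cast<uint16_t>(f32_bits(f) >> 16);
}

// Round to nearest even, the rounding that vcvtneps2bf16 applies.
// NaN inputs are not handled in this sketch.
inline uint16_t bf16_round_nearest_even(float f) {
    uint32_t u = f32_bits(f);
    u += 0x7FFFu + ((u >> 16) & 1u);  // rounding bias; ties go to the even result
    return static_cast<uint16_t>(u >> 16);
}

// Saturation: clamp to [bf16_min, bf16_max] first, then round.
inline uint16_t bf16_saturate(float f) {
    const float bf16_max = bits_f32(0x7F7F0000u);  // assumed largest finite bf16
    const float bf16_min = -bf16_max;              // assumed most negative finite bf16
    return bf16_round_nearest_even(std::min(std::max(f, bf16_min), bf16_max));
}
```

With std::numeric_limits<float>::max() as input, plain rounding yields 0x7F80 (+inf), while both truncation and saturation yield 0x7F7F. For in-range values the two modes can still differ: 1.005859375f truncates to 0x3F80 but rounds to 0x3F81.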

Contributor Author

I got your point. Based on the customer model outputs, it seems we need this kind of "saturation" mode here,
and the newly added functional tests also show that the "saturation" mode impl aligns with the ov reference impl. Maybe I need to modify the PR description based on your suggestion.

@chenhu-wang
Contributor

@liubo-intel, you use a fixed "saturation" mode on the avx512_core_bf16 ISA, but on some ISAs such as avx2 a fixed "truncation" mode is used. This is not aligned.
We have arithmetic_mode mode_ in jit_store_emitter. jit_uni_vcvtneps2bf16 is used in the f32-to-bf16 store case, where the mode information is not respected. I think we should pass mode_ (truncation or saturation) into jit_uni_vcvtneps2bf16 as a field and convert according to the mode. We can set "saturation" as the default mode.
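
As a rough illustration of this idea, the following scalar sketch stores the mode as a field and defaults it to saturation. ArithmeticMode and Bf16Converter are hypothetical names for illustration only, not the actual emitter classes. The sketch assumes that saturation means clamping to [-bf16_max, bf16_max] followed by round-to-nearest-even, and it does not handle NaN.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical stand-in for the emitter's arithmetic mode.
enum class ArithmeticMode { saturation, truncation };

// Hypothetical scalar converter that carries the mode as a field.
class Bf16Converter {
public:
    // Saturation is the default, as proposed in the comment above.
    explicit Bf16Converter(ArithmeticMode mode = ArithmeticMode::saturation) : mode_(mode) {}

    uint16_t convert(float f) const {
        if (mode_ == ArithmeticMode::truncation) {
            // Truncation: drop the lower 16 bits.
            return static_cast<uint16_t>(to_bits(f) >> 16);
        }
        // Saturation (assumed semantics): clamp, then round to nearest even.
        const float bf16_max = from_bits(0x7F7F0000u);
        uint32_t u = to_bits(std::min(std::max(f, -bf16_max), bf16_max));
        u += 0x7FFFu + ((u >> 16) & 1u);
        return static_cast<uint16_t>(u >> 16);
    }

private:
    static uint32_t to_bits(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
    static float from_bits(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

    ArithmeticMode mode_;
};
```

Keeping the mode in the converter means each ISA-specific path can branch on the same field, so avx2 and avx512_core_bf16 would follow one rule instead of each hard-coding its own.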

@liubo-intel liubo-intel changed the title [CPU] add truncation impl for vcvtneps2bf16 ISA [CPU] add arithmetic_mode impl for bf16_emitters Nov 28, 2024