[CPU] add arithmetic_mode impl for bf16_emitters #27737
Hi @chenhu-wang and @dmitry-gorokhov, could you please help review this PR? Thanks.
Zmm bf16_max = Zmm(aux_vec_idxs[0]);
Zmm bf16_min = Zmm(aux_vec_idxs[1]);
h->uni_vmovups(bf16_max, table_val("bf16_max"));
h->uni_vmovups(bf16_min, table_val("bf16_min"));

Zmm truncated = Zmm(aux_vec_idxs[2]);
h->uni_vmaxps(truncated, in, bf16_min);
h->uni_vminps(truncated, truncated, bf16_max);
This is not truncation behavior; it is a saturation implementation that clamps to [min, max], but the variable is named truncated.
Hi @chenhu-wang, thanks for your comments. Are you suggesting that I use a more efficient instruction to solve this kind of problem?
I mean that "truncation" mode cuts off the lower bits directly, while "saturation" mode clamps to [min, max]. They give different results. The PR description, the variable name, and this implementation are not aligned. First of all, we need to understand which mode is needed here and align it across all platforms. For truncation, you can refer to h->uni_vandps(aux, aux, table_val("mask_truncation_word")); in the lines below.
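To make the difference concrete, here is a minimal scalar sketch of the two modes. It is not the emitter code from this PR. The helper names are hypothetical. The clamp bounds are assumed to be the largest finite bf16 values (bit patterns 0x7F7F and 0xFF7F), since the actual contents of table_val("bf16_max") and table_val("bf16_min") are not shown here. The narrowing step after the clamp is assumed to be round-to-nearest-even. NaN inputs are deliberately not handled.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; not part of OpenVINO.
static uint32_t f32_to_bits(float x) {
    uint32_t u;
    std::memcpy(&u, &x, sizeof(u));
    return u;
}

static float bits_to_f32(uint32_t u) {
    float x;
    std::memcpy(&x, &u, sizeof(x));
    return x;
}

// "Truncation": keep the upper 16 bits of the fp32 pattern, drop the rest.
static uint16_t bf16_truncate(float x) {
    return static_cast<uint16_t>(f32_to_bits(x) >> 16);
}

// Round-to-nearest-even narrowing (assumed conversion mode; NaN not handled).
static uint16_t bf16_round_nearest_even(float x) {
    uint32_t u = f32_to_bits(x);
    uint32_t lsb = (u >> 16) & 1u;  // tie-break towards even
    return static_cast<uint16_t>((u + 0x7FFFu + lsb) >> 16);
}

// "Saturation": clamp to the finite bf16 range first, then narrow.
// The bounds are an assumption: largest finite bf16 widened to fp32.
static uint16_t bf16_saturate(float x) {
    const float bf16_max = bits_to_f32(0x7F7F0000u);
    const float bf16_lowest = bits_to_f32(0xFF7F0000u);
    float clamped = std::min(std::max(x, bf16_lowest), bf16_max);
    return bf16_round_nearest_even(clamped);
}
```

Under these assumptions, the largest finite fp32 value (bits 0x7F7FFFFF) becomes 0x7F7F with truncation, 0x7F80 (infinity) with plain round-to-nearest-even, and 0x7F7F with saturation. An infinite input stays infinite under truncation but becomes 0x7F7F under saturation, which is why the two modes cannot be treated as interchangeable.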
I get your point. Based on the customer model outputs, it seems we need "saturation" mode here, and the newly added functional tests also show that the "saturation" mode implementation aligns with the OV reference implementation. Maybe I need to update the PR description based on your suggestion.
@liubo-intel, you use a fixed "saturation" mode on the avx512_core_bf16 ISA, but on some ISAs such as avx2 a fixed "truncation" mode is used. This is not aligned.
8f98525 to 99e994a
99e994a to f0b3f10
…mitter and store_emitter
Details:
Tickets: