Saturating truncation produces extra instructions #68466
The following two functions:

```llvm
declare <4 x i32> @llvm.smax.v4i32(<4 x i32>, <4 x i32>)
declare <4 x i32> @llvm.smin.v4i32(<4 x i32>, <4 x i32>)
declare <8 x i32> @llvm.smax.v8i32(<8 x i32>, <8 x i32>)
declare <8 x i32> @llvm.smin.v8i32(<8 x i32>, <8 x i32>)

define <4 x i8> @saturate4(<4 x i32> %x) {
  %1 = tail call <4 x i32> @llvm.smax.v4i32(<4 x i32> %x, <4 x i32> zeroinitializer)
  %2 = tail call <4 x i32> @llvm.smin.v4i32(<4 x i32> %1, <4 x i32> <i32 255, i32 255, i32 255, i32 255>)
  %3 = trunc <4 x i32> %2 to <4 x i8>
  ret <4 x i8> %3
}

define <8 x i8> @saturate8(<8 x i32> %x) {
  %1 = tail call <8 x i32> @llvm.smax.v8i32(<8 x i32> %x, <8 x i32> zeroinitializer)
  %2 = tail call <8 x i32> @llvm.smin.v8i32(<8 x i32> %1, <8 x i32> <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>)
  %3 = trunc <8 x i32> %2 to <8 x i8>
  ret <8 x i8> %3
}
```

produce the following:

```asm
.LCPI0_0:
        .long 255 # 0xff
        .long 255 # 0xff
        .long 255 # 0xff
        .long 255 # 0xff
saturate4: # @saturate4
        pxor xmm1, xmm1
        movdqa xmm2, xmm0
        pcmpgtd xmm2, xmm1
        pand xmm0, xmm2
        movdqa xmm1, xmmword ptr [rip + .LCPI0_0] # xmm1 = [255,255,255,255]
        movdqa xmm2, xmm1
        pcmpgtd xmm2, xmm0
        pand xmm0, xmm2
        pandn xmm2, xmm1
        por xmm0, xmm2
        packuswb xmm0, xmm0
        packuswb xmm0, xmm0
        ret
saturate8: # @saturate8
        packssdw xmm0, xmm1
        packuswb xmm0, xmm0
        ret
```

Another example of extra instructions generated: https://llvm.godbolt.org/z/bEW6j4asW
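The two-instruction `saturate8` lowering relies on the per-lane semantics of the x86 packs: `packssdw` narrows i32 to i16 with signed saturation, and `packuswb` narrows a signed i16 to u8 with unsigned saturation, so their composition should equal `smax(x, 0)`, `smin(.., 255)`, `trunc`. The Rust sketch below checks that identity one lane at a time. All function names are hypothetical, and it models single lanes only, not LLVM's lowering or the 128-bit register layout.

```rust
/// Model of one PACKSSDW lane: i32 -> i16 with signed saturation.
fn packssdw_lane(x: i32) -> i16 {
    x.clamp(i16::MIN as i32, i16::MAX as i32) as i16
}

/// Model of one PACKUSWB lane: signed i16 -> u8 with unsigned saturation.
fn packuswb_lane(x: i16) -> u8 {
    x.clamp(0, u8::MAX as i16) as u8
}

/// The IR pattern: smax(x, 0), then smin(.., 255), then trunc to 8 bits.
fn smax_smin_trunc(x: i32) -> u8 {
    x.max(0).min(255) as u8
}

/// The two-instruction sequence emitted for @saturate8, one lane at a time.
fn two_packs(x: i32) -> u8 {
    packuswb_lane(packssdw_lane(x))
}

/// Hypothetical whole-vector model of @saturate4 using only the two packs.
fn saturate4_model(x: [i32; 4]) -> [u8; 4] {
    x.map(two_packs)
}
```

Because the identity holds for every `i32` input, the same `packssdw` + `packuswb` pair would appear to suffice for `saturate4` as well, which is why its compare/select sequence looks redundant.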
Adds additional test coverage for Issue #68466
This patch closes #73424, a missed-optimization case on RISC-V similar to #68466 on X86.

## Source Code

```
define void @trunc_sat_i8i16(ptr %x, ptr %y) {
  %1 = load <8 x i16>, ptr %x, align 16
  %2 = tail call <8 x i16> @llvm.smax.v8i16(<8 x i16> %1, <8 x i16> <i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128>)
  %3 = tail call <8 x i16> @llvm.smin.v8i16(<8 x i16> %2, <8 x i16> <i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127>)
  %4 = trunc <8 x i16> %3 to <8 x i8>
  store <8 x i8> %4, ptr %y, align 8
  ret void
}
```

## Before this patch:

```
trunc_sat_i8i16: # @trunc_maxmin_id_i8i16
        vsetivli zero, 8, e16, m1, ta, ma
        vle16.v v8, (a0)
        li a0, -128
        vmax.vx v8, v8, a0
        li a0, 127
        vmin.vx v8, v8, a0
        vsetvli zero, zero, e8, mf2, ta, ma
        vnsrl.wi v8, v8, 0
        vse8.v v8, (a1)
        ret
```

## After this patch:

```
trunc_sat_i8i16: # @trunc_maxmin_id_i8i16
        vsetivli zero, 8, e8, mf2, ta, ma
        vle16.v v8, (a0)
        csrwi vxrm, 0
        vnclip.wi v8, v8, 0
        vse8.v v8, (a1)
        ret
```
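After this patch, the RISC-V `smax`/`smin`/`trunc` sequence becomes a single `vnclip.wi` with a shift amount of 0. A signed narrowing clip shifts each double-width element right, rounds according to `vxrm`, and saturates to the signed narrow range; with a shift of 0 no bits are shifted out, so rounding cannot change the value and the operation reduces to a signed saturating narrow. The per-lane Rust model below is a sketch under that reading of the instruction; the function names are hypothetical.

```rust
/// Model of one `vnclip.wi vd, vs2, 0` lane with a 16-bit source and an
/// 8-bit destination. With a shift of 0 no bits are discarded, so rounding
/// (controlled by vxrm) cannot change the value; only signed saturation remains.
fn vnclip_wi0_lane(x: i16) -> i8 {
    x.clamp(i8::MIN as i16, i8::MAX as i16) as i8
}

/// The IR pattern from @trunc_sat_i8i16: smax(x, -128), smin(.., 127), trunc.
fn smax_smin_trunc_i16(x: i16) -> i8 {
    x.max(-128).min(127) as i8
}

/// Hypothetical whole-vector model of the patched lowering (8 lanes, matching `vsetivli zero, 8`).
fn trunc_sat_i8i16_model(x: [i16; 8]) -> [i8; 8] {
    x.map(vnclip_wi0_lane)
}
```

The two lane functions agree on every `i16`, which is what allows the three-instruction `vmax`/`vmin`/`vnsrl` sequence to collapse into one clip.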
See: https://llvm.godbolt.org/z/4KdejfEsG

The `saturate4` function produces extra min/max operations. I believe the `trunc` followed by `shufflevector` is being optimized before the saturating truncation could be detected.

Discovered in rust-lang/portable-simd#369 (comment)
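The hypothesis above is about combine ordering: a matcher that looks for `trunc(smin(smax(x, 0), 255))` cannot fire once the truncation has already been rewritten into some other node. The toy Rust model below illustrates only that idea; the `Expr` type and both functions are hypothetical and do not mirror LLVM's SelectionDAG or the actual X86 combines.

```rust
/// Toy expression tree (hypothetical; not LLVM's SelectionDAG).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// The vector input `%x`.
    Input,
    /// Signed max against a splat constant.
    SMax(Box<Expr>, i64),
    /// Signed min against a splat constant.
    SMin(Box<Expr>, i64),
    /// Truncation to the given element width in bits.
    Trunc(Box<Expr>, u32),
    /// Stand-in for a shuffle that keeps the low `bits` of each lane.
    ShuffleLowBits(Box<Expr>, u32),
}

/// Matches trunc(smin(smax(x, 0), 2^bits - 1)): an unsigned saturating truncation.
fn is_unsigned_sat_trunc(e: &Expr) -> bool {
    match e {
        Expr::Trunc(inner, bits) => match &**inner {
            Expr::SMin(m, hi) => {
                matches!(&**m, Expr::SMax(_, 0)) && *hi == (1i64 << *bits) - 1
            }
            _ => false,
        },
        _ => false,
    }
}

/// Hypothetical earlier rewrite that turns a truncation into a shuffle,
/// removing the `Trunc` node the matcher above keys on.
fn rewrite_trunc_as_shuffle(e: Expr) -> Expr {
    match e {
        Expr::Trunc(inner, bits) => Expr::ShuffleLowBits(inner, bits),
        other => other,
    }
}
```

The point is only the ordering: the same clamped value is still computed after the rewrite, but a matcher that keys on `Trunc` no longer recognizes it as a saturating truncation.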