
Saturating truncation produces extra instructions #68466

Closed
calebzulawski opened this issue Oct 7, 2023 · 2 comments
Comments

@calebzulawski
Contributor

See: https://llvm.godbolt.org/z/4KdejfEsG

The following two functions:

declare <4 x i16> @llvm.smax.v4i16(<4 x i16>, <4 x i16>)
declare <4 x i16> @llvm.smin.v4i16(<4 x i16>, <4 x i16>)
declare <8 x i16> @llvm.smax.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.smin.v8i16(<8 x i16>, <8 x i16>)

define <4 x i8> @saturate4(<4 x i16> %x) {
  %1 = tail call <4 x i16> @llvm.smax.v4i16(<4 x i16> %x, <4 x i16> zeroinitializer)
  %2 = tail call <4 x i16> @llvm.smin.v4i16(<4 x i16> %1, <4 x i16> <i16 255, i16 255, i16 255, i16 255>)
  %3 = trunc <4 x i16> %2 to <4 x i8>
  ret <4 x i8> %3
}

define <8 x i8> @saturate8(<8 x i16> %x) {
  %1 = tail call <8 x i16> @llvm.smax.v8i16(<8 x i16> %x, <8 x i16> zeroinitializer)
  %2 = tail call <8 x i16> @llvm.smin.v8i16(<8 x i16> %1, <8 x i16> <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>)
  %3 = trunc <8 x i16> %2 to <8 x i8>
  ret <8 x i8> %3
}

produce the following:

.LCPI0_0:
        .short  255                             # 0xff
        .short  255                             # 0xff
        .short  255                             # 0xff
        .short  255                             # 0xff
        .zero   2
        .zero   2
        .zero   2
        .zero   2
saturate4:                              # @saturate4
        pxor    xmm1, xmm1
        pmaxsw  xmm0, xmm1
        pminsw  xmm0, xmmword ptr [rip + .LCPI0_0]
        packuswb        xmm0, xmm0
        ret
saturate8:                              # @saturate8
        packuswb        xmm0, xmm0
        ret

The saturate4 function emits extra min/max instructions before the pack, while saturate8 lowers to a single packuswb. I believe the trunc followed by shufflevector is being optimized before the saturating truncation can be detected.

Discovered in rust-lang/portable-simd#369 (comment)

@llvmbot
Member

llvmbot commented Oct 7, 2023

@llvm/issue-subscribers-backend-x86


@okaneco

okaneco commented Oct 8, 2023

Another example of extra instructions generated: https://llvm.godbolt.org/z/bEW6j4asW
Original Rust code: https://rust.godbolt.org/z/hjM3G4qxY

declare <4 x i32> @llvm.smax.v4i32(<4 x i32>, <4 x i32>)
declare <4 x i32> @llvm.smin.v4i32(<4 x i32>, <4 x i32>)
declare <8 x i32> @llvm.smax.v8i32(<8 x i32>, <8 x i32>)
declare <8 x i32> @llvm.smin.v8i32(<8 x i32>, <8 x i32>)

define <4 x i8> @saturate4(<4 x i32> %x) {
  %1 = tail call <4 x i32> @llvm.smax.v4i32(<4 x i32> %x, <4 x i32> zeroinitializer)
  %2 = tail call <4 x i32> @llvm.smin.v4i32(<4 x i32> %1, <4 x i32> <i32 255, i32 255, i32 255, i32 255>)
  %3 = trunc <4 x i32> %2 to <4 x i8>
  ret <4 x i8> %3
}

define <8 x i8> @saturate8(<8 x i32> %x) {
  %1 = tail call <8 x i32> @llvm.smax.v8i32(<8 x i32> %x, <8 x i32> zeroinitializer)
  %2 = tail call <8 x i32> @llvm.smin.v8i32(<8 x i32> %1, <8 x i32> <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>)
  %3 = trunc <8 x i32> %2 to <8 x i8>
  ret <8 x i8> %3
}
.LCPI0_0:
        .long   255                             # 0xff
        .long   255                             # 0xff
        .long   255                             # 0xff
        .long   255                             # 0xff
saturate4:                              # @saturate4
        pxor    xmm1, xmm1
        movdqa  xmm2, xmm0
        pcmpgtd xmm2, xmm1
        pand    xmm0, xmm2
        movdqa  xmm1, xmmword ptr [rip + .LCPI0_0] # xmm1 = [255,255,255,255]
        movdqa  xmm2, xmm1
        pcmpgtd xmm2, xmm0
        pand    xmm0, xmm2
        pandn   xmm2, xmm1
        por     xmm0, xmm2
        packuswb        xmm0, xmm0
        packuswb        xmm0, xmm0
        ret
saturate8:                              # @saturate8
        packssdw        xmm0, xmm1
        packuswb        xmm0, xmm0
        ret

@RKSimon RKSimon self-assigned this Oct 8, 2023
RKSimon added a commit that referenced this issue Nov 1, 2023
Adds additional test coverage for Issue #68466
@RKSimon RKSimon closed this as completed in f471f6f Nov 1, 2023
ChunyuLiao pushed a commit that referenced this issue Dec 29, 2023
This patch closes #73424, a missed-optimization case on RISC-V similar to the X86 case in #68466.

## Source Code
```
define void @trunc_sat_i8i16(ptr %x, ptr %y) {
  %1 = load <8 x i16>, ptr %x, align 16
  %2 = tail call <8 x i16> @llvm.smax.v8i16(<8 x i16> %1, <8 x i16> <i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128>)
  %3 = tail call <8 x i16> @llvm.smin.v8i16(<8 x i16> %2, <8 x i16> <i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127>)
  %4 = trunc <8 x i16> %3 to <8 x i8>
  store <8 x i8> %4, ptr %y, align 8
  ret void
}
```
## Before this patch: 
```
trunc_sat_i8i16:                  # @trunc_maxmin_id_i8i16
        vsetivli        zero, 8, e16, m1, ta, ma
        vle16.v v8, (a0)
        li      a0, -128
        vmax.vx v8, v8, a0
        li      a0, 127
        vmin.vx v8, v8, a0
        vsetvli zero, zero, e8, mf2, ta, ma
        vnsrl.wi        v8, v8, 0
        vse8.v  v8, (a1)
        ret
```

## After this patch: 
```
trunc_sat_i8i16:                  # @trunc_maxmin_id_i8i16
	vsetivli	zero, 8, e8, mf2, ta, ma
	vle16.v	v8, (a0)
	csrwi	vxrm, 0
	vnclip.wi	v8, v8, 0
	vse8.v	v8, (a1)
	ret
```