WebAssembly: Suboptimal codegen of combined SIMD operations #57182

alexcrichton · 2022-08-16T20:14:23Z

Originally reported at rust-lang/stdarch#1322. It appears that combining these three wasm SIMD instructions results in a scalarized lowering rather than a lowering that uses the instructions themselves:

i16x8.extend_low_i8x16_u
i32x4.extend_low_i16x8_u
f32x4.convert_i32x4_u

This godbolt example has the Rust source code, the generated WebAssembly today, and the corresponding optimized LLVM IR that is being lowered.
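The godbolt source is not reproduced in this thread, so the following is only a sketch of what such a chain might look like in Rust, not the original reproducer. `convert_model` and `convert_simd` are hypothetical names. The scalar model runs on any target and spells out the lane semantics of the three instructions; the wasm32-only function uses the `core::arch::wasm32` intrinsics that correspond to them.

```rust
/// Portable scalar model of the three-instruction chain (runs on any target).
/// `convert_model` is a hypothetical helper, not code from the original report.
pub fn convert_model(bytes: [u8; 16]) -> [f32; 4] {
    // i16x8.extend_low_i8x16_u: zero-extend the low 8 bytes to 16-bit lanes.
    let wide16: [u16; 8] = core::array::from_fn(|i| u16::from(bytes[i]));
    // i32x4.extend_low_i16x8_u: zero-extend the low 4 lanes to 32-bit lanes.
    let wide32: [u32; 4] = core::array::from_fn(|i| u32::from(wide16[i]));
    // f32x4.convert_i32x4_u: unsigned integer to float, lane by lane.
    wide32.map(|v| v as f32)
}

/// The same chain written with the wasm32 SIMD intrinsics.
/// Only compiled when targeting wasm32 (assumed name, illustrative only).
#[cfg(target_arch = "wasm32")]
#[target_feature(enable = "simd128")]
pub fn convert_simd(x: core::arch::wasm32::v128) -> core::arch::wasm32::v128 {
    use core::arch::wasm32::*;
    f32x4_convert_u32x4(u32x4_extend_low_u16x8(u16x8_extend_low_u8x16(x)))
}
```

Only the low four bytes of the input influence the result, which is why a lowering made of the three SIMD instructions alone would be sufficient.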

cc @tlively


alexcrichton · 2022-08-16T20:15:43Z

Er, sorry, I should include the actual sources here. The input LLVM IR is:

target datalayout = "e-m:e-p:32:32-p10:8:8-p20:8:8-i64:64-n32:64-S128-ni:1:10:20"
target triple = "wasm32-unknown-unknown"

define dso_local <4 x i32> @convert(<4 x i32> %x) unnamed_addr #0 {
  %0 = bitcast <4 x i32> %x to <16 x i8>
  %1 = shufflevector <16 x i8> %0, <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %2 = zext <8 x i8> %1 to <8 x i16>
  %3 = shufflevector <8 x i16> %2, <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %4 = uitofp <4 x i16> %3 to <4 x float>
  %5 = bitcast <4 x float> %4 to <4 x i32>
  ret <4 x i32> %5
}

attributes #0 = { mustprogress nofree nosync nounwind willreturn "target-cpu"="generic" "target-features"="+simd128" }

and the output wasm is:

convert:
        local.get       0
        i16x8.extend_low_i8x16_u
        local.tee       0
        i16x8.extract_lane_u    0
        f32.convert_i32_u
        f32x4.splat
        local.get       0
        i16x8.extract_lane_u    1
        f32.convert_i32_u
        f32x4.replace_lane      1
        local.get       0
        i16x8.extract_lane_u    2
        f32.convert_i32_u
        f32x4.replace_lane      2
        local.get       0
        i16x8.extract_lane_u    3
        f32.convert_i32_u
        f32x4.replace_lane      3
        end_function

llvmbot · 2022-08-16T21:09:10Z

@llvm/issue-subscribers-backend-webassembly

tlively · 2022-08-16T22:34:33Z

Also cc @dtig, since I know she was tracking SIMD optimization issues.

lukel97 · 2023-01-03T16:18:42Z

I think the issue here is that it's not selecting TxN.extend_low_SxM_u instructions whenever there's no explicit zero_extend SDNode:

define <4 x i32> @bad(<4 x i32> %x) {
  %1 = bitcast <4 x i32> %x to <8 x i16>
  %2 = shufflevector <8 x i16> %1, <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %3 = uitofp <4 x i16> %2 to <4 x float>
  %4 = bitcast <4 x float> %3 to <4 x i32>
  ret <4 x i32> %4
}

define <4 x i32> @good(<4 x i32> %x) {
  %1 = bitcast <4 x i32> %x to <8 x i16>
  %2 = shufflevector <8 x i16> %1, <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %3 = zext <4 x i16> %2 to <4 x i32>
  %4 = uitofp <4 x i32> %3 to <4 x float>
  %5 = bitcast <4 x float> %4 to <4 x i32>
  ret <4 x i32> %5
}
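The two functions differ only in whether the widening to 32 bits is written out before the conversion. As a sanity check that the explicit zext does not change the numeric result, here is a small Rust sketch that models a single lane of each function. The function names are hypothetical, and this models the arithmetic only, not what LLVM's instruction selection does.

```rust
/// Models one lane of `@bad`: `uitofp` straight from a 16-bit lane.
pub fn lane_direct(x: u16) -> f32 {
    x as f32
}

/// Models one lane of `@good`: `zext` to 32 bits, then `uitofp`.
pub fn lane_widened(x: u16) -> f32 {
    u32::from(x) as f32
}

/// Returns true if both paths agree bit-for-bit on every 16-bit input.
pub fn paths_agree() -> bool {
    (0..=u16::MAX).all(|x| lane_direct(x).to_bits() == lane_widened(x).to_bits())
}
```

Since every 16-bit value is exactly representable as an f32, both paths produce identical results, so inserting the extension is a safe way to reach a form the backend already handles well.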

During DAG legalization, {u,s}itofp instructions on v2i8, v2i16, v4i8 and v4i16 types ended up being legalized into scalar instructions, when they could just be extended to v2i32/v4i32 instead.

Fixes llvm/llvm-project#57182

Differential Revision: https://reviews.llvm.org/D140916
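To illustrate the idea of extending narrow lanes instead of scalarizing, here is a Rust sketch of the v4i8 case for both signedness variants. It is a model of the intended semantics under the assumption that the extension kind matches the conversion (zero-extend for uitofp, sign-extend for sitofp); the function names are hypothetical and this is not the patch's implementation.

```rust
/// Unsigned case (`uitofp` on v4i8): zero-extend each lane to 32 bits, then convert.
pub fn uitofp_v4i8(lanes: [u8; 4]) -> [f32; 4] {
    lanes.map(|b| u32::from(b) as f32)
}

/// Signed case (`sitofp` on v4i8): sign-extend each lane to 32 bits, then convert.
pub fn sitofp_v4i8(lanes: [i8; 4]) -> [f32; 4] {
    lanes.map(|b| i32::from(b) as f32)
}
```

The same bit pattern 0xFF becomes 255.0 in the unsigned case and -1.0 in the signed case, which is why the two conversions need different extensions before the 32-bit convert.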

github-actions bot added the new issue label Aug 16, 2022

alexcrichton mentioned this issue Aug 16, 2022

Problem with intrinsic f32x4_convert_u32x4 in wasm rust-lang/stdarch#1322

Open

EugeneZelenko added backend:WebAssembly and removed new issue labels Aug 16, 2022

lukel97 added a commit that referenced this issue Jan 3, 2023

[WebAssembly][NFC] Add test case for {u,s}itofp on SIMD types

2671aa7

These test cases should be updated in a following patch once fixed Part of #57182

CarlosAlbertoEnciso pushed a commit to SNSystems/llvm-debuginfo-analyzer that referenced this issue Jan 4, 2023

[WebAssembly][NFC] Add test case for {u,s}itofp on SIMD types

d36c232

These test cases should be updated in a following patch once fixed Part of llvm/llvm-project#57182

lukel97 closed this as completed in fb66026 Jan 6, 2023
