Annotating a function with `#[inline]` leads to 10x slowdown in criterion #649

CeleritasCelery · 2023-02-01T18:30:24Z

I am encountering an odd issue where adding the `#[inline]` annotation to a public function makes it perform 10x worse in the criterion benchmark. This function is a SIMD implementation of character counting. Interestingly, I only see the major slowdown when benchmarking the SIMD version. If I switch it out for the scalar version, the inline annotation has no effect. I am also not able to reproduce this issue on x86_64, only on Aarch64 (Apple Silicon).
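For readers unfamiliar with the workload, here is a minimal scalar sketch of character counting over UTF-8. It is an illustration only, not the code from `str_indices`; the function name `count_chars_scalar` is hypothetical. It relies on the fact that every UTF-8 continuation byte has the bit pattern `10xxxxxx`, so counting the bytes that are not continuation bytes gives the number of characters.

```rust
/// Counts the characters in `text` by counting every byte that is NOT a
/// UTF-8 continuation byte (continuation bytes match 0b10xx_xxxx).
/// Hypothetical illustration, not the str_indices implementation.
pub fn count_chars_scalar(text: &str) -> usize {
    text.as_bytes()
        .iter()
        .filter(|&&byte| (byte & 0xC0) != 0x80)
        .count()
}
```

A SIMD version would apply the same byte test to many bytes per instruction, which is why the hot loop ends up being only a handful of instructions long.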

Here is the repo/branch used to reproduce the issue.

https://github.com/CeleritasCelery/str_indices/tree/benchmark_issue

Clone that branch and run `cargo criterion`.

My first question is: how can I generate the assembly for the benchmarks? I want to look at the two versions to see what is actually going on. My best guess is that this is a codegen issue.

Do you have any ideas as to why I might be seeing this? I couldn't find any other instances of this in the issue tracker.

I am using Rust 1.67 and criterion 0.4.0.

cessen/str_indices#10


workingjubilee · 2023-02-03T03:24:30Z

Normally rustc supports `--emit=asm` for this case, but that is hard to support for cargo commands, so most simply do not. So your choices are finding a way to avoid criterion yet generate the same result, or finding the object file that criterion creates and using `objdump` on it.
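If you go the object-file route, one thing that may help is giving the code under test a symbol that is easy to search for. The sketch below is an assumption on my part, not something criterion provides: `count_chars_stand_in` stands in for the real function, and `count_chars_probe` is a hypothetical wrapper. Marking the wrapper `#[inline(never)]` keeps it as a separate function in the binary, and `std::hint::black_box` discourages the optimizer from folding the call away.

```rust
use std::hint::black_box;

/// Stand-in for the function under test (hypothetical; replace it with the
/// real call, e.g. the SIMD character count).
fn count_chars_stand_in(text: &str) -> usize {
    text.chars().count()
}

/// Wrapper that stays out of line, so its (mangled) symbol name still
/// contains `count_chars_probe` and can be searched for in a disassembly.
#[inline(never)]
pub fn count_chars_probe(text: &str) -> usize {
    // black_box hides the input and the result from the optimizer.
    black_box(count_chars_stand_in(black_box(text)))
}
```

Calling `count_chars_probe` from the benchmark and then searching the output of `objdump -d` for that name should lead to the call site and the loop it contains. Note that the wrapper itself changes what gets inlined where, so the code you find may not match the original benchmark exactly.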

saethlin · 2023-02-03T04:37:33Z

I'm on x86_64. I cannot reproduce a 10x difference, but I can reproduce a smaller difference. I'll post the output of criterion below, with the unimportant lines deleted. The inconsistent indentation is from criterion.

By default I see this:

chars::count/en_10000   time:   [201.76 ns 202.10 ns 202.48 ns]
chars::count_inline/en_10000
                        time:   [311.65 ns 311.99 ns 312.36 ns]

Adding `codegen-units = 1` to `[profile.release]`, I get this:

chars::count/en_10000   time:   [199.83 ns 199.99 ns 200.16 ns]
chars::count_inline/en_10000
                        time:   [192.16 ns 192.28 ns 192.41 ns]

Adding `codegen-units = 1` and `lto = "fat"`, I get this:

chars::count/en_10000   time:   [309.54 ns 309.66 ns 309.82 ns]
chars::count_inline/en_10000
                        time:   [198.52 ns 198.68 ns 198.83 ns]

The sequence of instructions in the tiny loop that's the target of the benchmark is exactly the same in all cases. I think this benchmark is highly sensitive to some aspect that LLVM doesn't or can't control. The alignment of the loop seems likely.
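As a rough way to poke at the alignment theory, the sketch below reports where a function's entry point falls within a 64-byte block. Everything here is an assumption for illustration: the names `sample_fn` and `entry_offset_in_block` are hypothetical, and 64 bytes is just a common cache-line size that may not match the target. It also only looks at the function entry, not at the inner loop, so treat it as a hint that code placement moved between builds rather than as proof.

```rust
/// Stand-in for the function under test (hypothetical).
fn sample_fn(text: &str) -> usize {
    text.chars().count()
}

/// Returns the offset of a function's entry point within a 64-byte block.
/// The block size of 64 is an assumption; adjust it for the target CPU.
pub fn entry_offset_in_block(f: fn(&str) -> usize) -> usize {
    // Cast the function pointer to a raw pointer, then to an integer address.
    (f as *const () as usize) % 64
}

/// Convenience helper: prints the offset for the stand-in function.
pub fn report_sample_offset() {
    println!("entry offset: {}", entry_offset_in_block(sample_fn));
}
```

If this offset differs between the fast and slow builds while the loop instructions are identical, that is at least consistent with a code-placement effect.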

This is a common problem in microbenchmarking, and I am not aware of any good solutions to it. Emery Berger did a project called Stabilizer, which doesn't work anymore (it's tightly coupled to LLVM internals, and an academic doesn't have the time to keep up with LLVM versions), but his explanation of the problem is pretty good: https://youtu.be/r-TLSBdHe1A

There is also a GitHub repo and a paper; they're pretty easy to find if you want to learn more.

CeleritasCelery · 2023-02-03T05:48:41Z

closed in favor of rust-lang/rust#107617

This was referenced Feb 2, 2023

Add Aarch64 SIMD support cessen/str_indices#12

Merged

Adding #[inline] to function results in 10x slowdown when used in another crate rust-lang/rust#107617

Closed

CeleritasCelery closed this as completed Feb 3, 2023
