Annotating a function with #[inline] leads to 10x slowdown in criterion
#649
Comments
Normally rustc supports |
I'm on x86_64. I cannot reproduce a 10x difference, but I can reproduce a smaller difference. I'll post the output of criterion below, with the unimportant lines deleted. The inconsistent indentation is from criterion. By default I see this:
Adding
Adding
The sequence of instructions in the tiny loop that's the target of the benchmark is exactly the same in all cases. I think this benchmark is highly sensitive to some aspect that LLVM doesn't or can't control. The alignment of the loop seems the likely culprit. This is a common problem in microbenchmarking, and I am not aware of any good solutions to it. Emery Berger did a project called Stabilizer, which doesn't work anymore (it's tightly coupled to LLVM internals, and an academic doesn't have the time to keep up with LLVM versions), but its explanation of the problem is pretty good: https://youtu.be/r-TLSBdHe1A There is also a GitHub repo and a paper; both are pretty easy to find if you want to learn more.
Closed in favor of rust-lang/rust#107617.
I am encountering an odd issue where adding the #[inline] annotation to a public function makes it perform 10x worse in the criterion benchmark. This function is a SIMD implementation of character counting. Interestingly, I only see the major slowdown when benchmarking the SIMD version. If I switch it out for the scalar version, the inline annotation has no effect. I am also not able to reproduce this issue on x86_64, only on AArch64 (Apple Silicon).
Here is the repo/branch used to reproduce the issue:
https://github.com/CeleritasCelery/str_indices/tree/benchmark_issue
Clone that branch and run
cargo criterion
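For readers without the repo at hand, here is a minimal sketch of what a scalar character count with the annotation in question might look like. It is not the code from str_indices; the function name `count_chars_scalar` is hypothetical, and the approach (counting every byte that is not a UTF-8 continuation byte) is just one common way to do this.

```rust
/// Hypothetical scalar character count (an illustration, not the str_indices code).
///
/// Every UTF-8 encoded character has exactly one leading byte, and all
/// continuation bytes have the bit pattern 0b10xx_xxxx, so counting the
/// bytes that are NOT continuation bytes gives the number of characters.
#[inline]
pub fn count_chars_scalar(text: &str) -> usize {
    text.as_bytes()
        .iter()
        .filter(|&&byte| (byte & 0xC0) != 0x80)
        .count()
}
```

For any valid `&str`, the result should agree with `text.chars().count()`. Removing or adding the `#[inline]` line on a function like this is the only change being compared in the benchmark.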
My first question is: how can I generate the assembly for the benchmarks? I want to look at the two versions to see what is actually going on. My best guess is that this is a codegen issue.
Do you have any ideas as to why I might be seeing this? I couldn't find any other instances of this in the issue tracker.
I am using Rust 1.67 and criterion 0.4.0.
cessen/str_indices#10