Simplify ASCII check #115

Merged
merged 1 commit into from
Sep 20, 2023

rhpvorderman
Collaborator

  • Unaligned loads perform well on x86_64
  • No need to keep different functions and files as the SSE2 specific code can be surrounded by compile guards

Recently I have been writing quite a bit of vectorized code, so I decided to revisit my very first attempt at it. This version is certainly much simpler. I did a quick check, and the integer type for pointers is signed by default (at least on my platform, intptr_t is a long, not an unsigned one), so subtracting from end_ptr as in this code will simply work.

Daniel Lemire ran a test and found no difference between unaligned and aligned loads: https://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/. That was quite some time ago. I also did some reading lately and found it confirmed that AMD and Intel specifically altered their architectures to make unaligned loads just as fast. Data alignment is simply no longer an issue for speed; the difference is not measurable. In practice, unaligned loads therefore come out faster, because you can start using vector instructions right away instead of paying for an alignment loop first.

I did some quick testing and found no speed difference between this code and the old code. This change saves quite a few lines.
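For readers who have not opened the diff, the shape described above can be sketched roughly as follows. This is not the code from the PR: the function name `string_is_ascii` is hypothetical, and the loop condition compares the number of remaining bytes rather than subtracting from `end_ptr`, which is a substitution made here so the sketch never forms a pointer before the start of a short buffer.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical helper: returns 1 if every byte is below 0x80, else 0. */
static int
string_is_ascii(const char *string, size_t length)
{
    const char *cursor = string;
    const char *end_ptr = string + length;
#ifdef __SSE2__
    /* Vector loop: runs while at least 16 bytes remain. */
    while ((size_t)(end_ptr - cursor) >= sizeof(__m128i)) {
        /* Unaligned load, so no alignment loop is needed beforehand. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)cursor);
        /* movemask gathers the high bit of each of the 16 bytes. */
        if (_mm_movemask_epi8(chunk) != 0) {
            return 0;
        }
        cursor += sizeof(__m128i);
    }
#endif
    /* Scalar loop: handles the tail, or everything when SSE2 is absent. */
    while (cursor < end_ptr) {
        if ((unsigned char)*cursor & 0x80) {
            return 0;
        }
        cursor++;
    }
    return 1;
}
```

Because the scalar loop picks up wherever the guarded loop stopped, the same function compiles and gives the same result on targets without SSE2.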

@marcelm
Owner

marcelm commented Sep 20, 2023

Nice! I noticed this "#ifdef __SSE2__ ... #endif while ..." pattern in the other PR and found it quite nice.

@marcelm marcelm merged commit 5c8b2e1 into marcelm:main Sep 20, 2023
14 checks passed
@rhpvorderman rhpvorderman deleted the simpleasciicheck branch September 20, 2023 12:38
@rhpvorderman
Collaborator Author

Yes. Sometimes, even without vectors, you want one unrolled loop that does multiple operations per iteration and a second loop that does only one. This pattern helps a lot with that. Getting rid of a loop control variable also sometimes means faster execution. So it is a big win all around, even without vectorization.
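A minimal sketch of the same two-loop shape without vector instructions, assuming an ASCII check as the workload: the first loop handles eight bytes per iteration through a 64-bit word, the second handles one byte at a time, and both advance a pointer instead of an index variable. The function name `string_is_ascii_scalar` is hypothetical and is not taken from the PR.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every byte is below 0x80, else 0. */
static int
string_is_ascii_scalar(const char *string, size_t length)
{
    const char *cursor = string;
    const char *end_ptr = string + length;
    /* Wide loop: tests eight bytes per iteration. */
    while ((size_t)(end_ptr - cursor) >= sizeof(uint64_t)) {
        uint64_t chunk;
        /* memcpy is the portable way to do an unaligned 8-byte load. */
        memcpy(&chunk, cursor, sizeof(chunk));
        /* Any set high bit means a non-ASCII byte is present. */
        if (chunk & UINT64_C(0x8080808080808080)) {
            return 0;
        }
        cursor += sizeof(chunk);
    }
    /* Narrow loop: finishes the remaining zero to seven bytes. */
    while (cursor < end_ptr) {
        if ((unsigned char)*cursor & 0x80) {
            return 0;
        }
        cursor++;
    }
    return 1;
}
```

The only loop state is `cursor` itself, so there is no separate counter to keep in sync with the pointer.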
