Primitives specialization for Hasher #40

Closed
5 tasks done
ogxd opened this issue Dec 6, 2023 · 4 comments · Fixed by #42

@ogxd
Owner

ogxd commented Dec 6, 2023

Context

The Hasher trait exposes methods to hash primitives. Currently, we hash primitives by treating them all as slices of bytes. Hashing can be much more performant if the type is known in advance (e.g. by loading the primitive directly into a SIMD vector).
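To illustrate the two paths, here is a minimal sketch using only `std::hash::Hasher`. `PathHasher` and `count_paths` are hypothetical names and the mixing step is a toy, not the gxhash algorithm. It relies on the fact that the standard library's default `write_u64` forwards to `write` with the value's bytes, whereas an overridden `write_u32` receives the integer directly.

```rust
use std::hash::Hasher;

/// Toy hasher (NOT gxhash): it only records which code path was taken.
#[derive(Default)]
struct PathHasher {
    state: u64,
    byte_slice_calls: u32,
    specialized_calls: u32,
}

impl Hasher for PathHasher {
    /// Generic path: every input arrives as a slice of bytes.
    fn write(&mut self, bytes: &[u8]) {
        self.byte_slice_calls += 1;
        for &b in bytes {
            self.state = self.state.rotate_left(5) ^ u64::from(b);
        }
    }

    /// Specialized path: the width is known at compile time, so the value
    /// can be consumed in one step (a real implementation might load it
    /// straight into a SIMD register here).
    fn write_u32(&mut self, i: u32) {
        self.specialized_calls += 1;
        self.state = self.state.rotate_left(5) ^ u64::from(i);
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Hashes one u32 and one u64, then returns
/// (generic byte-slice calls, specialized calls).
fn count_paths() -> (u32, u32) {
    let mut h = PathHasher::default();
    h.write_u32(42); // overridden above: specialized path
    h.write_u64(42); // not overridden: std's default forwards to `write`
    (h.byte_slice_calls, h.specialized_calls)
}
```

Calling `count_paths()` shows one call on each path: only the primitive widths that are explicitly overridden skip the byte-slice route.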

Triggered by the following discussion: rust-lang/hashbrown#487

Goals

  • Hashing a primitive type is faster.
  • Hashing a primitive still follows the algorithm's principles (and thus remains stable and passes SMHasher).

Todo

  • Add a benchmark that hashes a primitive, such as u32
  • Implement all methods (write_u32, write_u64, ...)
    • Make it work for ARM
    • Make it work for x86
  • Publish benchmark results before/after on ARM/x86
@ogxd ogxd self-assigned this Dec 6, 2023
@ogxd ogxd mentioned this issue Dec 6, 2023
3 tasks
@ogxd ogxd linked a pull request Dec 10, 2023 that will close this issue
@ogxd
Owner Author

ogxd commented Dec 10, 2023

It seems the default write (non-specialized) wasn't that bad, even on the smallest primitive types. On a MacBook M1 Pro, using write_u32 yields a 13% performance improvement (which is still substantial).

Another interesting thing is that the hashset benchmark was biased in some cases. Wrapping the keys in black_box prevents the compiler optimizations that biased this benchmark.
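The following is a rough sketch of the black_box idea, not the project's actual benchmark code. It uses only the standard library (`std::hint::black_box`, `HashSet`, `Instant`), and `bench_contains` is a hypothetical helper name. Passing the key through `black_box` discourages the optimizer from treating it as a known constant and hoisting or folding the hash computation out of the loop.

```rust
use std::collections::HashSet;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Times `iterations` lookups of `key` in `set` and returns the elapsed
/// time together with the number of successful lookups.
fn bench_contains(set: &HashSet<u32>, key: u32, iterations: u32) -> (Duration, u32) {
    let start = Instant::now();
    let mut hits = 0;
    for _ in 0..iterations {
        // black_box makes the key opaque to the optimizer, so the hash
        // has to be recomputed on every iteration.
        if set.contains(black_box(&key)) {
            hits += 1;
        }
    }
    // Also pass the result through black_box so the loop is not discarded.
    (start.elapsed(), black_box(hits))
}
```

A dedicated harness such as Criterion would normally handle warm-up and statistics; the sketch only shows where the `black_box` call sits relative to the hashed key.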

@ogxd
Owner Author

ogxd commented Dec 10, 2023

With the current progress, hashes are stable within the context of the Hasher. However, the hash of a u32 produced via Hasher::write_u32 does not match the hash of the same value produced by the gxhash(&[u8], ...) method. I think this is acceptable because those are two very different contexts. SMHasher should still pass for both contexts.
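The distinction can be sketched with a toy hasher. `ToyHasher`, `hash_via_write_u32` and `hash_via_bytes` are hypothetical names, and the multiplicative mixing is an arbitrary stand-in, not the gxhash algorithm. Each path is deterministic on its own, but the two paths are not required to agree with each other.

```rust
use std::hash::Hasher;

/// Toy hasher (NOT gxhash) whose u32 path mixes differently from its
/// byte-slice path.
#[derive(Default)]
struct ToyHasher(u64);

impl Hasher for ToyHasher {
    /// Byte-slice path: FNV-1a style mixing, one byte at a time.
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = (self.0 ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    /// Specialized path: a single multiply on the whole integer.
    fn write_u32(&mut self, i: u32) {
        self.0 = (self.0 ^ u64::from(i)).wrapping_mul(0x9E37_79B9_7F4A_7C15);
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hashes `v` through the specialized `write_u32` method.
fn hash_via_write_u32(v: u32) -> u64 {
    let mut h = ToyHasher::default();
    h.write_u32(v);
    h.finish()
}

/// Hashes the little-endian bytes of `v` through the generic `write` method.
fn hash_via_bytes(v: u32) -> u64 {
    let mut h = ToyHasher::default();
    h.write(&v.to_le_bytes());
    h.finish()
}
```

For the same input, `hash_via_write_u32` always returns the same value and so does `hash_via_bytes`, yet the two values differ. That is the kind of per-context stability described above.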

@ogxd
Owner Author

ogxd commented Dec 10, 2023

Fixed a SIGSEGV that occurred when the [u8] passed in is a null slice (not just an empty slice).
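The thread does not say how the crash was fixed, so the following is only a sketch of one common guard, under the assumption that the null slice arrives as a raw pointer and length (for example from a C caller). `bytes_from_raw` is a hypothetical helper, not part of the gxhash API. Building a slice from a null pointer with `std::slice::from_raw_parts` is undefined behavior even when the length is zero, so the pointer is checked first.

```rust
use std::slice;

/// Hypothetical helper: turns a raw (pointer, length) pair into a byte slice,
/// mapping a null pointer or a zero length to a valid empty slice.
///
/// # Safety
/// If `ptr` is non-null and `len > 0`, `ptr` must point to `len` readable
/// bytes that stay valid for the lifetime `'a`.
unsafe fn bytes_from_raw<'a>(ptr: *const u8, len: usize) -> &'a [u8] {
    if ptr.is_null() || len == 0 {
        // An empty slice literal has a dangling but non-null, aligned pointer.
        &[]
    } else {
        // The caller guarantees the pointer/length pair is valid.
        unsafe { slice::from_raw_parts(ptr, len) }
    }
}
```

With this guard, the hashing code downstream only ever sees a well-formed `&[u8]`, whether the caller passed a null pointer or an empty buffer.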

@ogxd ogxd closed this as completed in #42 Dec 10, 2023
@ogxd
Owner Author

ogxd commented Dec 10, 2023

Merging and releasing 2.3.0

On both my ARM and x86 platforms, hashing time drops by about 13% for small inputs (u8, u16, u32, u64, u128 and their signed counterparts). On my ARM PC, the gxhash Hasher is now faster than ahash for such inputs. On my x86 PC, gxhash remains a bit slower for these inputs (about 10% slower). I have doubts about whether the ahash Hasher passes the SMHasher quality tests for such inputs.
