Avoid stack allocation in xxhash64 #547

Closed
andygrove opened this issue Jun 10, 2024 · 7 comments · Fixed by #575
Labels: enhancement, performance

Comments

@andygrove
Member

What is the problem the feature request solves?

Here is the current xxhash64 implementation:

use std::hash::Hasher;
use twox_hash::XxHash64;

pub(crate) fn spark_compatible_xxhash64<T: AsRef<[u8]>>(data: T, seed: u64) -> u64 {
    // TODO: Rewrite with a stateless hasher to reduce stack allocation?
    let mut hasher = XxHash64::with_seed(seed);
    hasher.write(data.as_ref());
    hasher.finish()
}

As the TODO comment states, it may be better to implement this with a stateless hasher and avoid the stack allocation.

Here is Spark's version for reference:

  public static long hashInt(int input, long seed) {
    long hash = seed + PRIME64_5 + 4L;
    hash ^= (input & 0xFFFFFFFFL) * PRIME64_1;
    hash = Long.rotateLeft(hash, 23) * PRIME64_2 + PRIME64_3;
    return fmix(hash);
  }

Describe the potential solution

No response

Additional context

No response

@parthchandra
Contributor

Are we expecting a performance improvement as a result? (I doubt stack allocations are a performance bottleneck)

@advancedxy
Contributor

Linking this to #517 too.

Are we expecting a performance improvement as a result? (I doubt stack allocations are a performance bottleneck)

+1. I think we may need to figure out the root cause first. It might be that the implementation of XxHash64 introduces a performance bottleneck, rather than the stack allocation. I can try to reproduce and check this issue this weekend if I have some spare time.

@andygrove
Member Author

Thanks @advancedxy. Yes, it definitely needs more investigation.

@andygrove
Member Author

Here are benchmark results comparing murmur3 to xxhash64. xxhash64 is 3x slower (not sure if that is expected), but what is more interesting is that there is a warning during warm-up (Unable to complete 100 samples in 5.0s).

     Running benches/hash.rs (target/release/deps/hash-3beecac83c235dce)
Gnuplot not found, using plotters backend
hash/murmur3/8192       time:   [342.78 µs 345.32 µs 349.49 µs]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
Benchmarking hash/xxhash64/8192: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.0s, enable flat sampling, or reduce sample count to 70.
hash/xxhash64/8192      time:   [992.21 µs 994.27 µs 996.88 µs]
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe
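
For context, a Criterion harness along these lines can produce this kind of comparison; spark_compatible_xxhash64 is taken from the snippet above, while the murmur3 helper name (spark_compatible_murmur3_hash), its seed value, and the import path are assumptions for illustration.

use criterion::{black_box, criterion_group, criterion_main, Criterion};
// Hypothetical import path; adjust to wherever the hash kernels actually live.
use comet::execution::kernels::hash::{spark_compatible_murmur3_hash, spark_compatible_xxhash64};

fn bench_hash(c: &mut Criterion) {
    // 8192 fixed-size keys, matching the hash/<algo>/8192 benchmark IDs above.
    let inputs: Vec<[u8; 8]> = (0..8192u64).map(|i| i.to_le_bytes()).collect();

    let mut group = c.benchmark_group("hash");
    group.bench_function("murmur3/8192", |b| {
        b.iter(|| {
            for key in &inputs {
                black_box(spark_compatible_murmur3_hash(black_box(key), 42));
            }
        })
    });
    group.bench_function("xxhash64/8192", |b| {
        b.iter(|| {
            for key in &inputs {
                black_box(spark_compatible_xxhash64(black_box(key), 42));
            }
        })
    });
    group.finish();
}

criterion_group!(benches, bench_hash);
criterion_main!(benches);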

@andygrove
Member Author

@advancedxy I am working on re-implementing this now in a simpler way as an experiment ... I will aim to have a PR up by end of day

@parthchandra
Contributor

xxhash64 is 3x slower (not sure if that is the expectation)

xxhash64 is by no means slower than murmur3. Here's a sample benchmark: https://lz4.github.io/lz4-java/1.3.0/xxhash-benchmark/

@advancedxy
Contributor

I am working on re-implementing this now in a simpler way as an experiement ... I will aim to have a PR up by end of day

Thanks for your effort.

Here are benchmark results comparing murmur3 to xxhash64. xxhash64 is 3x slower (not sure if that is the expectation)

I don't think this is expected; as @parthchandra already pointed out, xxhash64 should be much faster than murmur3. Even if TPC-H q8 is not related to XxHash64, we should still investigate further and improve its performance. I can help with that, though in that case it's not urgent.
