fixes for issue #15 -- ARM assembly alignment bug
This change reintroduces ARM assembly for the streaming hash with a fix for unaligned accesses.

The HashXXX() functions for ARM are implemented with calls to the underlying blocks() assembler routine, and so are not performant for tiny input sizes (crossover is at around 16 bytes).

Performance benchmarking from a Marvell Armada 3720 here:

Before:

```
goos: linux
goarch: arm
pkg: github.com/dchest/siphash
BenchmarkHash8            2000000     607 ns/op   13.17 MB/s
BenchmarkHash16           2000000     768 ns/op   20.82 MB/s
BenchmarkHash40           1000000    1208 ns/op   33.09 MB/s
BenchmarkHash64           1000000    1658 ns/op   38.60 MB/s
BenchmarkHash128           500000    2844 ns/op   45.00 MB/s
BenchmarkHash1K            100000   19368 ns/op   52.87 MB/s
BenchmarkHash1Kunaligned   100000   19348 ns/op   52.93 MB/s
BenchmarkHash8K             10000  151758 ns/op   53.98 MB/s
BenchmarkHash128_8        2000000     863 ns/op    9.27 MB/s
BenchmarkHash128_16       1000000    1029 ns/op   15.55 MB/s
BenchmarkHash128_40       1000000    1469 ns/op   27.22 MB/s
BenchmarkHash128_64       1000000    1909 ns/op   33.52 MB/s
BenchmarkHash128_128       500000    3084 ns/op   41.50 MB/s
BenchmarkHash128_1K        100000   19458 ns/op   52.62 MB/s
BenchmarkHash128_8K         10000  150805 ns/op   54.32 MB/s
BenchmarkFull8            1000000    1128 ns/op    7.09 MB/s
BenchmarkFull16           1000000    1282 ns/op   12.48 MB/s
BenchmarkFull40           1000000    1281 ns/op   18.73 MB/s
BenchmarkFull64           1000000    2236 ns/op   28.62 MB/s
BenchmarkFull128          1000000    2237 ns/op   57.20 MB/s
BenchmarkFull1K            100000   21799 ns/op   46.97 MB/s
BenchmarkFull1Kunaligned   100000   21479 ns/op   47.67 MB/s
BenchmarkFull8K             10000  164920 ns/op   49.67 MB/s
BenchmarkFull128_8        1000000    2120 ns/op    3.77 MB/s
BenchmarkFull128_16       1000000    2271 ns/op    7.04 MB/s
BenchmarkFull128_40       1000000    2266 ns/op   10.59 MB/s
BenchmarkFull128_64        500000    3251 ns/op   19.68 MB/s
BenchmarkFull128_128       500000    3238 ns/op   39.53 MB/s
BenchmarkFull128_1K        100000   22546 ns/op   45.42 MB/s
BenchmarkFull128_8K         10000  166300 ns/op   49.26 MB/s
PASS
```

After:

```
goos: linux
goarch: arm
pkg: github.com/dchest/siphash
BenchmarkHash8            2000000     677 ns/op   11.81 MB/s
BenchmarkHash16           2000000     737 ns/op   21.71 MB/s
BenchmarkHash40           2000000     963 ns/op   41.54 MB/s
BenchmarkHash64           1000000    1167 ns/op   54.82 MB/s
BenchmarkHash128          1000000    1719 ns/op   74.45 MB/s
BenchmarkHash1K            200000    9349 ns/op  109.52 MB/s
BenchmarkHash1Kunaligned   200000   11115 ns/op   92.12 MB/s
BenchmarkHash8K             20000   70468 ns/op  116.25 MB/s
BenchmarkHash128_8        1000000    1133 ns/op    7.06 MB/s
BenchmarkHash128_16       1000000    1202 ns/op   13.31 MB/s
BenchmarkHash128_40       1000000    1437 ns/op   27.83 MB/s
BenchmarkHash128_64       1000000    1631 ns/op   39.23 MB/s
BenchmarkHash128_128      1000000    2183 ns/op   58.62 MB/s
BenchmarkHash128_1K        200000    9795 ns/op  104.54 MB/s
BenchmarkHash128_8K         20000   70894 ns/op  115.55 MB/s
BenchmarkFull8            2000000     694 ns/op   11.52 MB/s
BenchmarkFull16           2000000     760 ns/op   21.03 MB/s
BenchmarkFull40           2000000     764 ns/op   31.40 MB/s
BenchmarkFull64           1000000    1186 ns/op   53.96 MB/s
BenchmarkFull128          1000000    1181 ns/op  108.35 MB/s
BenchmarkFull1K            200000    9399 ns/op  108.94 MB/s
BenchmarkFull1Kunaligned   200000   11186 ns/op   91.54 MB/s
BenchmarkFull8K             20000   70458 ns/op  116.27 MB/s
BenchmarkFull128_8        1000000    2005 ns/op    3.99 MB/s
BenchmarkFull128_16       1000000    2066 ns/op    7.74 MB/s
BenchmarkFull128_40       1000000    2076 ns/op   11.56 MB/s
BenchmarkFull128_64        500000    2495 ns/op   25.64 MB/s
BenchmarkFull128_128       500000    2492 ns/op   51.36 MB/s
BenchmarkFull128_1K        200000   10705 ns/op   95.65 MB/s
BenchmarkFull128_8K         20000   71909 ns/op  113.92 MB/s
PASS
```
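
For readers unfamiliar with the alignment issue: the actual fix lives in the ARM assembly, which is not shown in this message. The Go sketch below only illustrates the general concern, under the assumption that the hash consumes its input as little-endian 64-bit words. The function names `loadLE64` and `loadLE64Std` are hypothetical and do not come from this repository. Assembling the word byte by byte gives the same result whether or not the slice starts on a word boundary.

```go
package main

import "encoding/binary"

// loadLE64 reads a little-endian 64-bit word from the first 8 bytes of b.
// The value is assembled one byte at a time, so the result does not depend
// on b starting at a 4- or 8-byte aligned address.
// (Hypothetical helper for illustration; not part of the siphash package.)
func loadLE64(b []byte) uint64 {
	_ = b[7] // panics early if len(b) < 8
	return uint64(b[0]) |
		uint64(b[1])<<8 |
		uint64(b[2])<<16 |
		uint64(b[3])<<24 |
		uint64(b[4])<<32 |
		uint64(b[5])<<40 |
		uint64(b[6])<<48 |
		uint64(b[7])<<56
}

// loadLE64Std performs the same read with the standard library.
func loadLE64Std(b []byte) uint64 {
	return binary.LittleEndian.Uint64(b)
}
```

Slicing a buffer at an odd offset, for example `loadLE64(buf[1:])`, is the situation the `Hash1Kunaligned` and `Full1Kunaligned` benchmarks above exercise.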