Update README with new benchmark numbers
as-com committed Jan 2, 2021
1 parent 0f1026f commit 5125a63
Showing 3 changed files with 65 additions and 36 deletions.
88 changes: 58 additions & 30 deletions README.md
@@ -5,16 +5,21 @@ varint-simd
[![Continuous integration](https://github.com/as-com/varint-simd/workflows/Continuous%20integration/badge.svg)](https://github.com/as-com/varint-simd/actions?query=workflow%3A%22Continuous+integration%22)

varint-simd is a fast SIMD-accelerated [variable-length integer](https://developers.google.com/protocol-buffers/docs/encoding)
encoder and decoder written in Rust. It is intended for use in implementations of Protocol Buffers (protobuf), Apache
Avro, and similar serialization formats.
and [LEB128](https://en.wikipedia.org/wiki/LEB128) encoder and decoder written in Rust. It combines a largely branchless
design with compile-time specialization to achieve gigabytes per second of throughput encoding and decoding individual
integers on commodity hardware. An interface to decode multiple adjacent variable-length integers is also provided
to achieve even higher throughput, reaching [over a billion decoded 8-bit integers per second](#benchmarks) on a single
thread.

This library currently targets a minimum of x86_64 processors with support for SSSE3 (Intel Core/AMD Bulldozer or
newer), with optional optimizations for processors supporting POPCNT, LZCNT, BMI2, and/or AVX2.
newer), with optional optimizations for processors supporting POPCNT, LZCNT, BMI2, and/or AVX2. It is intended for use
in implementations of Protocol Buffers (protobuf), Apache Avro, and similar serialization formats, but likely has many
other applications.
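
For readers unfamiliar with the format, the following is a minimal scalar sketch of unsigned LEB128 encoding and decoding. It is not how varint-simd works internally (the crate uses SIMD and a largely branchless design); the function names `encode_varint_scalar` and `decode_varint_scalar` are hypothetical and exist only for this illustration.

```rust
/// Encodes `value` as an unsigned LEB128 varint, appending the bytes to `out`.
/// Returns the number of bytes written. (Hypothetical helper, not part of varint-simd.)
fn encode_varint_scalar(mut value: u64, out: &mut Vec<u8>) -> usize {
    let start = out.len();
    loop {
        // The low 7 bits become the payload of this byte.
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last byte: continuation bit left clear.
            out.push(byte);
            break;
        }
        // More bytes follow: set the continuation bit.
        out.push(byte | 0x80);
    }
    out.len() - start
}

/// Decodes one unsigned LEB128 varint from the start of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the input
/// is truncated or does not fit in a `u64`. (Hypothetical helper.)
fn decode_varint_scalar(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    // A u64 needs at most 10 bytes of 7-bit payload.
    for (i, &byte) in bytes.iter().enumerate().take(10) {
        let payload = (byte & 0x7f) as u64;
        // The tenth byte may only contribute the topmost bit of a u64.
        if i == 9 && payload > 1 {
            return None;
        }
        value |= payload << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

With these helpers, the value 300 encodes to the two bytes `0xAC 0x02`, which matches the example in the Protocol Buffers encoding documentation linked above.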

## Usage
**Important:** For optimal performance, ensure the Rust compiler has an appropriate `target-cpu` setting. An example is
provided in [`.cargo/config`](.cargo/config), but you may need to edit the file to specify the oldest CPUs your compiled
binaries will support.
**Important:** Ensure the Rust compiler has an appropriate `target-cpu` setting. An example is provided in
[`.cargo/config`](.cargo/config), but you may need to edit the file to specify the oldest CPUs your compiled
binaries will support. Your project will not compile unless this is set correctly.

The `native-optimizations` feature should be enabled if and only if `target-cpu` is set to `native`, such as in the
example. This enables some extra optimizations if suitable for your specific CPU.
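
One way to check what a given `target-cpu` setting actually enables is to inspect the compile-time target features. The sketch below is an illustration only and is not part of varint-simd; `enabled_target_features` is a hypothetical helper. It relies on the standard `cfg!(target_feature = "...")` macro, and assumes the flag was passed in the usual way, for example `RUSTFLAGS="-C target-cpu=native"` or the equivalent `rustflags` entry in a Cargo configuration file.

```rust
/// Returns the subset of the x86 features mentioned in this README that were
/// enabled when this code was compiled. (Hypothetical helper for illustration.)
fn enabled_target_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    // Each `cfg!` is evaluated at compile time from the `target-cpu` and
    // `target-feature` settings; no runtime CPU detection happens here.
    if cfg!(target_feature = "ssse3") {
        features.push("ssse3");
    }
    if cfg!(target_feature = "popcnt") {
        features.push("popcnt");
    }
    if cfg!(target_feature = "lzcnt") {
        features.push("lzcnt");
    }
    if cfg!(target_feature = "bmi2") {
        features.push("bmi2");
    }
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    features
}
```

If `"ssse3"` is missing from the returned list, the build was most likely made without a suitable `target-cpu`, which is the situation the note above warns about.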
@@ -69,41 +74,60 @@ For more details, please see [the source code for these benchmarks](benches/vari
![benchmark graph](images/benchmark.png)

#### Decode
| | varint-simd unsafe | varint-simd safe | [rustc](https://github.com/nnethercote/rust/blob/0f6f2d681b39c5f95459cd09cb936b6ceb27cd82/compiler/rustc_serialize/src/leb128.rs) | [integer-encoding-rs](https://github.com/dermesser/integer-encoding-rs) | [prost](https://github.com/danburkert/prost) |
| -- | -- | -- | -- | -- | -- |
| `u8` | **1.85 ns** | **2.80 ns** | 7.23 ns | 7.18 ns | 70.6 ns |
| `u16` | **1.95 ns** | **2.78 ns** | 5.54 ns | 7.17 ns | 71.5 ns |
| `u32` | **2.41 ns** | **3.27 ns** | 7.35 ns | 7.41 ns | 73.6 ns |
| `u64` | **3.65 ns** | **4.15 ns** | 11.0 ns | 15.2 ns | 71.9 ns |
**All numbers are in millions of integers per second.**
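
Earlier revisions of these tables reported nanoseconds per operation. As a rough aid for comparing the two units, the conversion is simple arithmetic: one operation per nanosecond is 1000 million operations per second. The helper name `millions_per_second` below is hypothetical, and results from different benchmark runs will not convert exactly.

```rust
/// Converts a latency in nanoseconds per operation into a throughput in
/// millions of operations per second. (Hypothetical helper for illustration.)
fn millions_per_second(ns_per_op: f64) -> f64 {
    // 1 op/ns = 1e9 ops/s = 1000 million ops/s.
    1_000.0 / ns_per_op
}
```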

| | varint-simd unsafe | varint-simd safe | [rustc](https://github.com/nnethercote/rust/blob/0f6f2d681b39c5f95459cd09cb936b6ceb27cd82/compiler/rustc_serialize/src/leb128.rs) | [integer-encoding-rs](https://github.com/dermesser/integer-encoding-rs) | [prost](https://github.com/danburkert/prost) |
|-------|------------------------|----------------------|--------|---------------------|--------|
| `u8` | **554.81** | **283.26** | 131.71 | 116.59 | 131.42 |
| `u16` | **493.96** | **349.74** | 168.09 | 121.35 | 157.68 |
| `u32` | **482.95** | **332.11** | 191.37 | 120.16 | 196.05 |
| `u64` | **330.86** | **277.65** | 82.315 | 80.328 | 97.585 |

| | varint-simd 2x | varint-simd 4x | varint-simd 8x |
|-------|----------------|----------------|----------------|
| `u8` | 658.52 | 644.36 | 896.32 |
| `u16` | 547.39 | 540.93 | |
| `u32` | 688.11 | | |

#### Encode
| | varint-simd | rustc | integer-encoding-rs | prost |
| -- | -- | -- | -- | -- |
| `u8` | **2.50 ns** | 5.20 ns | 6.24 ns | 10.5 ns |
| `u16` | **2.65 ns** | 5.47 ns | 6.63 ns | 11.5 ns |
| `u32` | **2.96 ns** | 6.43 ns | 7.74 ns | 13.7 ns |
| `u64` | **3.85 ns** | 14.1 ns | 13.0 ns | 21.8 ns |

| | varint-simd | rustc | integer-encoding-rs | prost |
|-------|-----------------|--------|---------------------|--------|
| `u8` | **383.01** | 214.05 | 126.66 | 93.617 |
| `u16` | **341.25** | 181.18 | 126.79 | 85.014 |
| `u32` | **360.87** | 157.95 | 125.00 | 77.402 |
| `u64` | **303.72** | 72.660 | 78.153 | 46.456 |

### AMD Ryzen 5 2600X @ 4.125 GHz "Zen+"
#### Decode
| | varint-simd unsafe | varint-simd safe | rustc | integer-encoding-rs | prost |
| -- | -- | -- | -- | -- | -- |
| `u8` | **2.62 ns** | **3.66 ns** | 7.57 ns | 8.27 ns | 37.6 ns |
| `u16` | **3.14 ns** | **3.98 ns** | 6.57 ns | 7.56 ns | 36.7 ns |
| `u32` | **4.36 ns** | **4.83 ns** | 6.57 ns | 7.98 ns | 36.2 ns |
| `u64` | **6.97 ns** | **7.12 ns** | 12.5 ns | 13.2 ns | 40.3 ns |

| | varint-simd unsafe | varint-simd safe | rustc | integer-encoding-rs | prost |
|-------|------------------------|----------------------|--------|---------------------|--------|
| `u8` | **537.51** | **304.85** | 152.35 | 138.54 | 124.44 |
| `u16` | **403.39** | **300.68** | 170.31 | 156.06 | 147.83 |
| `u32` | **293.88** | **235.92** | 160.48 | 159.13 | 150.05 |
| `u64` | **229.28** | **193.28** | 75.822 | 85.010 | 83.407 |


| | varint-simd 2x | varint-simd 4x | varint-simd 8x |
|-------|----------------|----------------|----------------|
| `u8` | 943.75 | 808.45 | 1,106.50 |
| `u16` | 721.01 | 632.03 | |
| `u32` | 459.77 | | |

#### Encode
| | varint-simd | rustc | integer-encoding-rs | prost |
| -- | -- | -- | -- | -- |
| `u8` | **3.94 ns** | 4.64 ns | 7.65 ns | 10.4 ns |
| `u16` | **4.23 ns** | 6.03 ns | 7.51 ns | 10.6 ns |
| `u32` | **4.62 ns** | 9.33 ns | 8.94 ns | 12.9 ns |
| `u64` | **5.78 ns** | 19.3 ns | 14.1 ns | 21.5 ns |

| | varint-simd | rustc | integer-encoding-rs | prost |
|-------|-----------------|--------|---------------------|--------|
| `u8` | **362.97** | 211.07 | 142.16 | 98.237 |
| `u16` | **334.10** | 172.09 | 140.78 | 96.480 |
| `u32` | **288.19** | 101.56 | 126.27 | 82.210 |
| `u64` | **207.89** | 52.515 | 79.375 | 48.088 |

## TODO
* Encoding multiple values at once
* Faster decode for two `u64` values with AVX2 (currently fairly slow)
* Improve performance of "safe" interface
* Parallel ZigZag decode/encode
* Support for ARM NEON
* Fallback scalar implementation
@@ -125,6 +149,10 @@ specify this feature manually.
Library crates **should not** enable this feature by default. A separate feature flag should be provided to enable this
feature in this crate.

## Previous Work

* Daniel Lemire, et al. - Stream VByte: Faster Byte-Oriented Integer Compression: https://arxiv.org/abs/1709.08990

## License

Licensed under either of
Binary file modified images/benchmark.png
13 changes: 7 additions & 6 deletions src/decode/mod.rs
@@ -120,8 +120,8 @@ pub unsafe fn decode_unsafe<T: VarIntTarget>(bytes: *const u8) -> (T, u8) {
(num, len as u8)
}

/// Decodes two varints simultaneously. Target types must fit within 16 bytes when varint encoded.
/// Requires SSSE3 support.
/// Decodes two adjacent varints simultaneously. Target types must fit within 16 bytes when varint
/// encoded. Requires SSSE3 support.
///
/// For example, it is permissible to decode `u32` and `u32`, and `u64` and `u32`, but it is not
/// possible to decode two `u64` values with this function simultaneously.
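
The 16-byte limit in the doc comment above follows from the worst-case encoded length of each type: every varint byte carries 7 payload bits, so an unsigned integer of `n` bits needs at most `ceil(n / 7)` bytes. The sketch below works through that arithmetic; `max_varint_len` and `fits_in_one_vector` are hypothetical names used for illustration and are not part of the crate's API.

```rust
/// Maximum number of bytes needed to varint-encode an unsigned integer that is
/// `bits` bits wide: ceil(bits / 7). (Hypothetical helper for illustration.)
const fn max_varint_len(bits: u32) -> u32 {
    (bits + 6) / 7
}

/// Returns true if the worst-case encodings of all the given integer widths
/// fit together in a single 16-byte (128-bit) vector register.
fn fits_in_one_vector(bit_widths: &[u32]) -> bool {
    let total: u32 = bit_widths.iter().map(|&bits| max_varint_len(bits)).sum();
    total <= 16
}
```

Under this calculation, `u32` and `u32` need at most 5 + 5 = 10 bytes and `u64` and `u32` need at most 10 + 5 = 15 bytes, so both fit, while two `u64` values could need 20 bytes and do not.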
@@ -450,8 +450,8 @@ pub unsafe fn decode_two_wide_unsafe<T: VarIntTarget, U: VarIntTarget>(
(first_num, second_num, first_len as u8, second_len as u8)
}

/// Decodes four varints simultaneously. Target types must fit within 16 bytes when varint encoded.
/// Requires SSSE3 support.
/// Decodes four adjacent varints simultaneously. Target types must fit within 16 bytes when varint
/// encoded. Requires SSSE3 support.
///
/// Returns a tuple containing the four encoded values, followed by the number of bytes read for
/// each encoded value, followed by a boolean indicator for whether the length values may be
@@ -687,8 +687,8 @@ unsafe fn decode_four_u16_unsafe<
)
}

/// Decodes four varints into u8's simultaneously. Requires SSSE3 support. **Does not perform
/// overflow checking and may produce incorrect output.**
/// Decodes four adjacent varints into u8's simultaneously. Requires SSSE3 support. **Does not
/// perform overflow checking and may produce incorrect output.**
///
/// Returns a tuple containing an array of decoded values, and the total number of bytes read.
///
@@ -702,6 +702,7 @@ unsafe fn decode_four_u16_unsafe<
/// be shorter than expected. Caution is encouraged when using this function.
#[inline]
#[cfg(any(target_feature = "ssse3", doc))]
#[cfg_attr(rustc_nightly, doc(cfg(target_feature = "ssse3")))]
pub unsafe fn decode_eight_u8_unsafe(bytes: *const u8) -> ([u8; 8], u8) {
let b = _mm_loadu_si128(bytes as *const __m128i);
