Comparison benchmarks and comparison in Readme #113

Open
wants to merge 13 commits into
base: main
1 change: 1 addition & 0 deletions Cargo.toml
@@ -16,6 +16,7 @@ license = "MIT"
[dev-dependencies]
criterion = "0.4.0"
compiletest_rs = "0.10.0"
heapless = "0.7.16"

[features]
default = ["alloc"]
44 changes: 41 additions & 3 deletions README.md
@@ -1,4 +1,5 @@
# Ringbuffer

![Github Workflows](https://img.shields.io/github/actions/workflow/status/NULLx76/ringbuffer/rust.yml?style=for-the-badge)
[![Docs.rs](https://img.shields.io/badge/docs.rs-ringbuffer-66c2a5?style=for-the-badge&labelColor=555555&logoColor=white&logo=)](https://docs.rs/ringbuffer)
[![Crates.io](https://img.shields.io/crates/v/ringbuffer?logo=rust&style=for-the-badge)](https://crates.io/crates/ringbuffer)
@@ -48,11 +49,48 @@ fn main() {

```

# Comparison of ringbuffer types

| type | heap allocated | growable | size must be power of 2 | requires alloc[^2] |
|-----------------------------------------------|----------------|----------|-------------------------|--------------------|
| `AllocRingBuffer<T, PowerOfTwo>` | yes | no | yes | yes |
| `AllocRingBuffer<T, NonPowerOfTwo>` | yes | no | no | yes |
| `GrowableAllocRingBuffer<T>` | yes | yes | no | yes |
| `ConstGenericRingBuffer<T, const CAP: usize>` | no | no | no[^1] | no |

[^1]: Using a size that is not a power of 2 will be ~3x slower.
[^2]: All ringbuffers are `no_std`, but some require an allocator to be available.
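
A plausible reason for the difference noted in footnote 1 is how an index is wrapped around the end of the buffer: a power-of-two capacity allows a bitwise mask, while any other capacity needs a remainder operation. The sketch below only illustrates that general technique. `wrap_mask` and `wrap_mod` are hypothetical helpers, and it is an assumption that `ringbuffer` wraps its indices this way.

```rust
/// Wraps `index` into `0..cap` with a bitwise AND.
/// Only correct when `cap` is a power of two.
fn wrap_mask(index: usize, cap: usize) -> usize {
    debug_assert!(cap.is_power_of_two());
    index & (cap - 1)
}

/// Wraps `index` into `0..cap` with a remainder. This works for any
/// non-zero `cap`, but it needs an integer division.
fn wrap_mod(index: usize, cap: usize) -> usize {
    index % cap
}
```

For a power-of-two capacity both helpers return the same slot, so the mask can replace the remainder without changing behavior.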

## Comparison with other ringbuffers and ringbuffer-like data structures

We ran a benchmark that pushes `16 384` elements to a ringbuffer with a capacity of `1024` (where a capacity
could be configured) and then removes them again, to compare
`ringbuffer` to a few common alternatives.
The results show that [`ConstGenericRingBuffer`] is about 13 times faster than an `std::sync::mpsc::channel` (
although `ringbuffer` doesn't provide the thread safety a channel does).
A more reasonable comparison is to `std::collections::VecDeque` and `heapless::Deque`:
`ConstGenericRingBuffer` is about 2.5 times faster than the former and about 20% faster than the latter
(over 100 measurements each).

| implementation | time (95% confidence interval: \[lower bound, estimate, upper bound\]) |
|----------------------------------------------------|-----------------------------------|
| `std::vec::Vec` | `[13.190 ms 13.206 ms 13.223 ms]` |
| `std::collections::LinkedList` | `[225.64 µs 228.09 µs 231.06 µs]` |
| `std::sync::mpsc::channel` | `[174.86 µs 175.41 µs 176.30 µs]` |
| `std::collections::VecDeque` (growable ringbuffer) | `[33.894 µs 33.934 µs 33.974 µs]` |
| `AllocRingBuffer` | `[30.382 µs 30.451 µs 30.551 µs]` |
| `heapless::Deque` | `[16.260 µs 16.464 µs 16.748 µs]` |
| `ConstGenericRingBuffer` | `[13.685 µs 13.712 µs 13.743 µs]` |
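
To put the table in relative terms, the point estimates (the middle value of each interval) can be divided by the `ConstGenericRingBuffer` estimate. The sketch below does only that arithmetic. `slowdown_vs_baseline` is a hypothetical helper, the labels are shortened for illustration, and the numbers are copied from the table.

```rust
/// Point estimates from the table above, converted to microseconds.
const ESTIMATES_US: [(&str, f64); 7] = [
    ("std Vec", 13_206.0),
    ("std LinkedList", 228.09),
    ("std channel", 175.41),
    ("std VecDeque", 33.934),
    ("AllocRingBuffer", 30.451),
    ("heapless::Deque", 16.464),
    ("ConstGenericRingBuffer", 13.712),
];

/// How many times slower `name` is than the baseline (the last entry).
/// Returns `None` if `name` is not in the table.
fn slowdown_vs_baseline(name: &str) -> Option<f64> {
    let baseline = ESTIMATES_US[ESTIMATES_US.len() - 1].1;
    ESTIMATES_US
        .iter()
        .find(|(n, _)| *n == name)
        .map(|(_, t)| t / baseline)
}
```

With these numbers the channel comes out roughly 12.8 times slower than `ConstGenericRingBuffer`, `VecDeque` roughly 2.5 times, and `heapless::Deque` roughly 1.2 times.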

Note that none of the alternatives behaves exactly like `ringbuffer`. All `std` data structures
compared here can grow without bound (though in the benchmarks they never held more than `16 384` elements).

`heapless::Deque` doesn't drop old items when it is full, as `ringbuffer` does. Instead, `push_back` rejects the new item.
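
The two full-buffer policies can be sketched with a plain `VecDeque` standing in for both containers. `push_overwriting`, `push_rejecting` and `compare_policies` are hypothetical helpers written for this illustration (assuming `cap > 0`); they are not part of `ringbuffer` or `heapless`.

```rust
use std::collections::VecDeque;

/// Ringbuffer-style push: when full, the oldest element is removed to
/// make room. Returns the evicted element, if any.
fn push_overwriting<T>(buf: &mut VecDeque<T>, cap: usize, item: T) -> Option<T> {
    let evicted = if buf.len() >= cap { buf.pop_front() } else { None };
    buf.push_back(item);
    evicted
}

/// Bounded-deque-style push: when full, the new element is rejected
/// and handed back to the caller.
fn push_rejecting<T>(buf: &mut VecDeque<T>, cap: usize, item: T) -> Result<(), T> {
    if buf.len() >= cap {
        Err(item)
    } else {
        buf.push_back(item);
        Ok(())
    }
}

/// Pushes `0..n` with each policy and returns what is left in each buffer,
/// oldest element first.
fn compare_policies(cap: usize, n: usize) -> (Vec<usize>, Vec<usize>) {
    let mut overwriting = VecDeque::with_capacity(cap);
    let mut rejecting = VecDeque::with_capacity(cap);
    for i in 0..n {
        let _evicted = push_overwriting(&mut overwriting, cap, i);
        let _rejected = push_rejecting(&mut rejecting, cap, i);
    }
    (
        overwriting.into_iter().collect(),
        rejecting.into_iter().collect(),
    )
}
```

With `cap = 3` and `n = 5`, the overwriting buffer ends up holding `[2, 3, 4]`, while the rejecting one still holds `[0, 1, 2]`.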

# Features

-| name | default | description |
-|-------|---------|--------------------------------------------------------------------------------------------------------------|
-| alloc | ✓ | Disable this feature to remove the dependency on alloc. Disabling this feature makes `ringbuffer` `no_std`. |
+| name | default | description |
+|-------|---------|------------------------------------------------------------------------------------------------------------|
+| alloc | ✓ | Disable this feature to remove the dependency on alloc. |

# License

13 changes: 8 additions & 5 deletions benches/bench.rs
@@ -1,6 +1,9 @@
use crate::comparison::comparison_benches;
use criterion::{black_box, criterion_group, criterion_main, Bencher, Criterion};
use ringbuffer::{AllocRingBuffer, ConstGenericRingBuffer, RingBuffer};

mod comparison;

fn benchmark_push<T: RingBuffer<i32>, F: Fn() -> T>(b: &mut Bencher, new: F) {
    b.iter(|| {
        let mut rb = new();
@@ -89,7 +92,7 @@ macro_rules! generate_benches {

fn benchmark_non_power_of_two<const L: usize>(b: &mut Bencher) {
    b.iter(|| {
-        let mut rb = AllocRingBuffer::with_capacity_non_power_of_two(L);
+        let mut rb = AllocRingBuffer::with_capacity_power_of_2(L);

        for i in 0..1_000_000 {
            rb.push(i);
@@ -111,7 +114,7 @@ fn criterion_benchmark(c: &mut Criterion) {
        c,
        AllocRingBuffer,
        i32,
-        with_capacity,
+        new,
        benchmark_push,
        16,
        1024,
@@ -135,7 +138,7 @@ fn criterion_benchmark(c: &mut Criterion) {
        c,
        AllocRingBuffer,
        i32,
-        with_capacity,
+        new,
        benchmark_various,
        16,
        1024,
@@ -159,7 +162,7 @@ fn criterion_benchmark(c: &mut Criterion) {
        c,
        AllocRingBuffer,
        i32,
-        with_capacity,
+        new,
        benchmark_push_dequeue,
        16,
        1024,
@@ -195,4 +198,4 @@ fn criterion_benchmark(c: &mut Criterion) {
}

criterion_group!(benches, criterion_benchmark);
-criterion_main!(benches);
+criterion_main!(benches, comparison_benches);
129 changes: 129 additions & 0 deletions benches/comparison.rs
@@ -0,0 +1,129 @@
#![cfg(not(tarpaulin))]

use criterion::{black_box, criterion_group, Bencher, Criterion};
use ringbuffer::{AllocRingBuffer, ConstGenericRingBuffer, RingBuffer};
use std::collections::{LinkedList, VecDeque};
use std::sync::mpsc::channel;

const ITER: usize = 1024 * 16;
const CAP: usize = 1024;

fn std_chan(b: &mut Bencher) {
    let (tx, rx) = channel();

    b.iter(|| {
        for i in 0..ITER {
            let _ = tx.send(i);
            black_box(());
        }

        for i in 0..ITER {
            let res = rx.recv();
            let _ = black_box(res);
        }
    });
}

fn vec(b: &mut Bencher) {
    let mut vd = Vec::with_capacity(CAP);

    b.iter(|| {
        for i in 0..ITER {
Owner

Shouldn't we push until CAP and then start doing remove & push, so that the number of elements stays at CAP?

Owner

Specifically, remove from the front and push on the back.

Collaborator Author

Hmm, I guess that makes sure that the first iteration is not an outlier.

            let _ = vd.push(i);
            black_box(());
        }

        for i in 0..ITER {
            let res = vd.remove(0);
            let _ = black_box(res);
        }
    });
}

fn vecdeque(b: &mut Bencher) {
    let mut vd = VecDeque::with_capacity(CAP);

    b.iter(|| {
        for i in 0..ITER {
            let _ = vd.push_back(i);
            black_box(());
        }
        for i in 0..ITER {
            let res = vd.pop_front();
            let _ = black_box(res);
        }
    });
}

fn linked_list(b: &mut Bencher) {
    let mut ll = LinkedList::new();

    b.iter(|| {
        for i in 0..ITER {
            let _ = ll.push_back(i);
            black_box(());
        }

        for i in 0..ITER {
            let res = ll.pop_front();
            let _ = black_box(res);
        }
    });
}

fn cg_rb(b: &mut Bencher) {
    let mut rb = ConstGenericRingBuffer::<_, CAP>::new();

    b.iter(|| {
        for i in 0..ITER {
            let _ = rb.push(i);
            black_box(());
        }
        for i in 0..ITER {
Owner (NULLx76, Jun 9, 2023)

Doing this more than CAP times is just a no-op, right?

Collaborator Author

Sort of. The buffer has to reject the items all the remaining times, which might take time just as our dropping of old items does.

Collaborator Author

Sorry, that's for the heapless version. Here it will drop old items ITER - CAP times.

Owner

I'm just kind of confused about why we are testing different things for different structures. Shouldn't we test all of them under this use case:

A buffer which can contain at most CAP items, removing old entries as new ones get pushed, and clearing out the buffer at the end.

Collaborator Author

No other datastructure has that behavior. They're different datastructures, so we cannot test them under the same circumstances.

Owner (NULLx76, Jun 9, 2023)

Then the comparison is meaningless imo. It should be about using similar datastructures for the same purpose. I think we can emulate ringbuffer behaviour on most if not all of these datastructures.

Collaborator Author

I disagree. No benchmark can compare these datastructures completely fairly, but it does give a good overview of the order-of-magnitude differences. Of course a channel is not a ringbuffer, but pushing 2^14 items to a channel takes about 14 times longer than pushing them to a ringbuffer, which also has to deal with removing old items. Simply pushing 2^14 items to a vecdeque that has to deal with growing is about 3x slower than pushing to a ringbuffer. Pushing to a heapless deque, which drops items on insert, is a few percent slower than pushing to a ringbuffer, which deletes old items.

None of them are the same datastructure, but it does show the order of magnitude of the differences you can expect when using these datastructures. Each datastructure is good at different things, but this is their normal performance when you use them as a queue.

In fact, I don't really want to modify these datastructures to function identically to ringbuffer. Then we wouldn't measure how these datastructures normally perform as a queue, for the purpose they were designed for. Instead, we would measure how well we can turn each datastructure into a ringbuffer.

Owner

As long as we add a disclaimer/description of the benchmarks to the readme, this is fine, I think.

            let res = rb.dequeue();
            let _ = black_box(res);
        }
    });
}

fn heapless_deque(b: &mut Bencher) {
    let mut rb = heapless::Deque::<_, CAP>::new();

    b.iter(|| {
        for i in 0..ITER {
            let _ = rb.push_back(i);
            black_box(());
        }
        for i in 0..ITER {
            let res = rb.pop_front();
            let _ = black_box(res);
        }
    });
}

fn al_rb(b: &mut Bencher) {
    let mut rb = AllocRingBuffer::with_capacity_non_power_of_two(CAP);

    b.iter(|| {
        for i in 0..ITER {
            let _ = rb.push(i);
            black_box(());
        }
        for i in 0..ITER {
            let res = rb.dequeue();
            let _ = black_box(res);
        }
    });
}

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("comparison std channel", std_chan);
    c.bench_function("comparison std vec", vec);
    c.bench_function("comparison std linked list", linked_list);
    c.bench_function("comparison std vecdeque (growable ringbuffer)", vecdeque);
    c.bench_function("comparison const generic ringbuffer", cg_rb);
    c.bench_function("comparison alloc ringbuffer", al_rb);
    c.bench_function("comparison heapless deque", heapless_deque);
}

criterion_group!(comparison_benches, criterion_benchmark);
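
The use case proposed in the review thread (hold at most `CAP` items, remove from the front before pushing on the back, clear the buffer at the end) could be emulated on the `std` containers roughly as follows. This is only a sketch of the workload, not code from this PR: `bounded_vec` and `bounded_vecdeque` are hypothetical names, `cap > 0` is assumed, and the Criterion harness and `black_box` calls are left out.

```rust
use std::collections::VecDeque;

/// Pushes `0..iter` while never holding more than `cap` items, then
/// drains the buffer. Returns the drained items, oldest first.
fn bounded_vecdeque(cap: usize, iter: usize) -> Vec<usize> {
    let mut vd = VecDeque::with_capacity(cap);
    for i in 0..iter {
        if vd.len() == cap {
            // Remove from the front before pushing on the back.
            vd.pop_front();
        }
        vd.push_back(i);
    }
    // Clear out the buffer at the end.
    vd.drain(..).collect()
}

/// The same workload on a `Vec`.
fn bounded_vec(cap: usize, iter: usize) -> Vec<usize> {
    let mut v = Vec::with_capacity(cap);
    for i in 0..iter {
        if v.len() == cap {
            // `remove(0)` shifts every remaining element one slot left.
            v.remove(0);
        }
        v.push(i);
    }
    v.drain(..).collect()
}
```

Both functions end up with the last `cap` items pushed. `Vec::remove(0)` takes time linear in the length while `VecDeque::pop_front` is constant time, which is presumably a large part of why the `Vec` benchmark is so much slower than the others.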
1 change: 1 addition & 0 deletions src/with_alloc/alloc_ringbuffer.rs
@@ -324,6 +324,7 @@ unsafe impl<T, SIZE: RingbufferSize> RingBuffer<T> for AllocRingBuffer<T, SIZE>

    #[inline]
    fn fill_with<F: FnMut() -> T>(&mut self, mut f: F) {
        // This clear is necessary so that the drop methods are called.
        self.clear();

        self.readptr = 0;
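
Why the `clear` matters can be shown with a small stand-in type. The sketch below assumes storage made of `MaybeUninit` slots, where overwriting a slot does not run the destructor of the old value. `Slots`, `Tracked` and `drops_after_two_fills` are hypothetical names used for illustration; this is not the actual `AllocRingBuffer` implementation.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::rc::Rc;

/// Element type that counts how many times it has been dropped.
struct Tracked(Rc<Cell<usize>>);

impl Drop for Tracked {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Fixed-size buffer over possibly uninitialized storage.
struct Slots<T, const N: usize> {
    buf: [MaybeUninit<T>; N],
    len: usize, // slots 0..len are initialized
}

impl<T, const N: usize> Slots<T, N> {
    fn new() -> Self {
        Self {
            buf: std::array::from_fn(|_| MaybeUninit::uninit()),
            len: 0,
        }
    }

    /// Drops every initialized element in place.
    fn clear(&mut self) {
        for slot in &mut self.buf[..self.len] {
            // SAFETY: slots 0..len were initialized by `fill_with`.
            unsafe { slot.assume_init_drop() };
        }
        self.len = 0;
    }

    fn fill_with<F: FnMut() -> T>(&mut self, mut f: F, clear_first: bool) {
        if clear_first {
            self.clear();
        }
        for slot in &mut self.buf {
            // `MaybeUninit::write` overwrites without dropping the old value.
            slot.write(f());
        }
        self.len = N;
    }
}

impl<T, const N: usize> Drop for Slots<T, N> {
    fn drop(&mut self) {
        self.clear();
    }
}

/// Fills a 4-slot buffer twice and reports how many elements were dropped.
fn drops_after_two_fills(clear_first: bool) -> usize {
    let drops = Rc::new(Cell::new(0));
    let mut slots = Slots::<Tracked, 4>::new();
    slots.fill_with(|| Tracked(Rc::clone(&drops)), false);
    slots.fill_with(|| Tracked(Rc::clone(&drops)), clear_first);
    drop(slots);
    drops.get()
}
```

With `clear_first` set, all eight elements are dropped. Without it, the four elements from the first fill are overwritten and leaked, so only four drops are counted.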