
Add benchmarks from dashmap and hashbrown #43

Merged (9 commits into jonhoo:master on Jan 30, 2020)

Conversation

domenicquirl (Collaborator) commented Jan 29, 2020

This addresses #8. The added files contain all the original benchmarks from dashmap and hashbrown to compare against (including several other hashmap implementations), as well as implementations of the dashmap- and hashbrown-specific tests for flurry. I also tried to add an implementation of the dashmap tests for flurry that doesn't pin() the epoch for every operation (the original implementation uses rayon's par_iter().for_each(), which requires the executed closure to be Send + Sync). However, this requires setting up the threads differently, and I'm not sure how much overhead that introduces, so keep this in mind when using the comparison. A sketch of the two pinning strategies follows.
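As a rough sketch of the difference (assuming flurry's pin() and guard() methods; illustrative only, not the benchmark code from this PR):

```rust
use flurry::HashMap;

fn main() {
    let map: HashMap<u32, u32> = HashMap::new();

    // Variant 1: pin the epoch anew for every single operation.
    for i in 0..100 {
        map.pin().insert(i, i * 2);
    }

    // Variant 2: pin once and reuse the guard across many operations,
    // amortizing the cost of entering and leaving the epoch.
    let guard = map.guard();
    for i in 0..100 {
        map.insert(i, i * 2, &guard);
    }
}
```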

The main reason this is a draft is that I wanted to check on the licenses. As far as I understand, the MIT license requires license and copyright notices to be present in derived works. Am I correct that this is required? If so, where exactly do we add them?

jonhoo (Owner) commented Jan 29, 2020

For licenses, I think the right thing to do is to include the license file from each repository you copied from in benches/, with names like dashmap.LICENSE.

It would also be good to include a short comment at the top of each copied file saying where it was copied from.

Separately, I don't know that we need to pull the benchmarks for other maps in here (benches/chashmap, for example). I think the trick is going to be to just have benches/dashmap_flurry.rs, which is the dashmap benchmark implemented for flurry. When we eventually implement our own benchmark suites (like #30), we will also implement them for the other implementations, but I don't think there's an upside to copy-pasting existing benchmarks into here. Maybe just point users to them in the README and include instructions there on how to run them?

I would also like us to have separate benchmark files for the flurry version of different benchmarks. I'm imagining benches/hashbrown_like.rs and benches/dashmap_like.rs.
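As a rough illustration of what such a file could look like (a sketch only; the bench name, key range, and setup here are assumptions, relying on criterion's iter_with_setup and flurry's guard()):

```rust
// benches/dashmap_like.rs (hypothetical sketch)
use criterion::{criterion_group, criterion_main, Criterion};
use flurry::HashMap;

fn insert(c: &mut Criterion) {
    c.bench_function("flurry_insert", |b| {
        // Build a fresh map for each iteration so timings only cover inserts.
        b.iter_with_setup(HashMap::new, |map| {
            {
                // One guard for the whole run, rather than pin() per insert.
                let guard = map.guard();
                for i in 0..1_000u32 {
                    map.insert(i, i, &guard);
                }
            }
            // Return the map so it is dropped outside the timed section.
            map
        });
    });
}

criterion_group!(benches, insert);
criterion_main!(benches);
```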

domenicquirl (Collaborator, Author) commented Jan 29, 2020

Wouldn't we want to compare our results against the other hashmaps? Of course one can run their benchmarks separately; I just thought it would be nice to have them available without needing separate projects. For comparing performance across releases, one could still execute only the flurry benchmarks. Or do you want this comparison only for tests like #30? Since the execution is exactly the same between the original (dashmap/hashbrown) and the flurry benchmark code, it felt natural to at least include these two as a reference.

Regarding the license, I will update the branch accordingly. I can also separate flurry.rs into two files 👍

jonhoo (Owner) commented Jan 29, 2020

We do want to compare against other hashmaps, and while it is convenient to copy their benchmarks into here, I don't think it's worth it. Better to avoid keeping multiple copies and instead just refer people to where the master copy of the benchmarks for the other backends lives.

Review threads on benches/flurry_hashbrown.rs, benches/flurry_dashmap.rs, and benches/README.md (outdated, resolved).
domenicquirl marked this pull request as ready for review on January 30, 2020 at 16:56.
jonhoo (Owner) commented Jan 30, 2020

Have you tried actually running the benchmarks? I'm curious what kind of numbers you get and how they compare.

domenicquirl (Collaborator, Author) commented Jan 30, 2020

They seem... slow? Inserting in particular is way slower than in the other maps. Basically:

  • Against hashbrown, insert and insert_erase run on the order of one ms, compared to 10-20 us in hashbrown. get holds up reasonably well though, at ~20 us against hashbrown's ~5-6 us; given the added concurrency, this looks much better than insert. Both of these use one guard across all operations of a benchmark run and are relatively consistent across input distributions.
  • Against dashmap: dashmap inserts in somewhere between 1 ms on 1 thread and 0.8 ms on multiple threads. What's probably more important, however, is throughput, which for some reason is not in the report produced by criterion, but was on the order of 100 MElements/s. We compare at about 25-12 ms depending on thread count, which is so slow that any benchmark takes several minutes to run, and which puts throughput at something like 5, maybe 10 MElements/s. This is using one guard per map operation, which admittedly is not what you would do in a scenario like this benchmark.
  • Against dashmap, but with only one guard per thread (the different setup; see the sketch below), we insert in 14-11 ms. That is better, but still considerably slower than dashmap. Note that this does not compare directly due to the different thread setup.
  • get against dashmap seems really good! dashmap gets 700-500 us, while we get 1.5 ms on one thread, 800 us on 2 threads, and go below 300 us on 8 threads, even with pin() for every call to get. With only one guard per thread, this improves to a range from 800 us down to a bit more than 200 us, depending on thread count.

So yeah, overall get is promising, but insert is much slower than both references :/
Other maps were similar and generally beat us in throughput, with get being close and insert... not being close.
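For reference, the one-guard-per-thread setup mentioned above could look roughly like this (a sketch with assumed thread counts and key ranges, not the actual benchmark code): threads are spawned manually instead of through rayon's par_iter(), so each thread can create a single guard up front:

```rust
use std::sync::Arc;
use std::thread;

use flurry::HashMap;

fn main() {
    let map = Arc::new(HashMap::new());
    let threads = 4;
    let per_thread = 1_000;

    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let map = Arc::clone(&map);
            thread::spawn(move || {
                // One guard for the whole thread, rather than pin() per insert.
                let guard = map.guard();
                for i in 0..per_thread {
                    map.insert(t * per_thread + i, 0, &guard);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}
```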

domenicquirl (Collaborator, Author) commented Jan 30, 2020

It's worth noting that I ran this on my personal machine while doing other tasks, so it is not the cleanest setup. I don't think that would account for differences of this magnitude, though.

Is there a Rust profiling tool that could tell us which part of insert takes so much time? That would be an interesting question.

jonhoo (Owner) commented Jan 30, 2020

Oh, that is very interesting indeed! Let's land this and then do some profiling.

For profiling, my recommendation is to add:

```toml
[profile.release]
debug = true

[profile.bench]
debug = true
```

to Cargo.toml, start the benchmark, and then use `perf record --call-graph dwarf -p $(pgrep -f target/release)` to record profiling information. Then feed the results into a flamegraph and see what sticks out at you!
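One way to turn the recorded data into a flamegraph (assuming the inferno tools are installed, e.g. via `cargo install inferno`) is `perf script | inferno-collapse-perf | inferno-flamegraph > flame.svg`; the cargo-flamegraph wrapper automates a similar pipeline.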

jonhoo merged commit 208cfcf into jonhoo:master on Jan 30, 2020.
domenicquirl (Collaborator, Author) commented Jan 30, 2020

I might have to defer this to someone else. I'm on a Windows machine at the moment and will be a little busy for the next few weeks anyway. Perhaps it's better to open an issue for this so someone can really take a good look at it?

jonhoo (Owner) commented Jan 30, 2020

Yes, that makes sense! Want to open an issue and include your preliminary results?
