forked from scandum/crumsort
Commit
* Add simdsort and vqsort to bench.cpp
* Remove parts of bench.cpp that handle differences between C/C++ (since we now always compile as C++)
* Add python script to plot the benchmarks
* Add the plotted results (as PNG images)
* Add writeup of the results
* Check off the "update benchmarks" task in the README.

I'm also removing "re-enable optimizations for primitive types" from the todo list because based on the benchmarks I'm not sure it's worth the effort.
Showing 23 changed files with 271 additions and 31 deletions.
@@ -0,0 +1,99 @@
Benchmarks
==========

The results shown here were obtained from a release build compiled with ClangCL with optimization flags `/O3` and `/DNDEBUG`.

## Results ##

#### Random ####

*The input array is composed of unsorted random integers.*

<img alt="graph of benchmark results on unsorted random integers" src="https://github.com/psadda/crumsort-cpp/blob/main/bench/results/random%2010000.png" width="480">

#### Random High Bits ####

*The input array is composed of random unsorted integers, but with the randomness in the high (most significant) bits of the integer rather than the low bits.*

<img alt="graph of benchmark results on random integers with the randomness in the high bits" src="https://github.com/psadda/crumsort-cpp/blob/main/bench/results/random%20high%20bits%2010000.png" width="480">
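
The generator used by the benchmark for the "random high bits" case is not shown here. As a rough sketch, assuming 32-bit unsigned keys with 16 random bits (both widths are assumptions, as is the helper name `random_high_bits`), an input of this kind could be produced by shifting random values into the upper half of the key:

```python
import random

def random_high_bits(n, key_bits=32, random_bits=16, seed=0):
    """Return n integers whose randomness sits in the top `random_bits` bits.

    Hypothetical helper: the widths are illustrative defaults, not values
    taken from the benchmark.
    """
    rng = random.Random(seed)        # seeded so repeated runs give the same data
    shift = key_bits - random_bits   # number of low bits left as zero
    return [rng.getrandbits(random_bits) << shift for _ in range(n)]

values = random_high_bits(8)
```

Every value produced this way is a multiple of 2^16, so any step that looks only at the low 16 bits sees identical keys.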
#### Random Half ####

*Half of the input array is already in sorted order; the other half is unsorted.*

<img alt="graph of benchmark results on half unsorted, half sorted integers" src="https://github.com/psadda/crumsort-cpp/blob/main/bench/results/random%20half%2010000.png" width="480">

#### Ascending ####

*The input array is already in sorted order.*

<img alt="graph of benchmark results on sorted, ascending integers" src="https://github.com/psadda/crumsort-cpp/blob/main/bench/results/ascending%2010000.png" width="480">

#### Descending ####

*The input array is already sorted in reverse order.*

<img alt="graph of benchmark results on sorted, descending integers" src="https://github.com/psadda/crumsort-cpp/blob/main/bench/results/descending%2010000.png" width="480">

#### Ascending Saw ####

*The input array is composed of alternating runs of sorted and unsorted integers.*

<img alt="graph of benchmark results on alternating runs of sorted and unsorted integers" src="https://github.com/psadda/crumsort-cpp/blob/main/bench/results/ascending%20saw%2010000.png" width="480">

#### Ascending Tiles ####

*The input array is divided into two chunks, each of which is internally sorted.*

<img alt="graph of benchmark results on ascending tiles" src="https://github.com/psadda/crumsort-cpp/blob/main/bench/results/ascending%20tiles%2010000.png" width="480">

#### Pipe Organ ####

*The first half of the input array is sorted in ascending order and the second half is sorted in descending order.*

<img alt="graph of benchmark results on a pipe organ array" src="https://github.com/psadda/crumsort-cpp/blob/main/bench/results/pipe%20organ%2010000.png" width="480">
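
A pipe organ input is straightforward to construct. The following is a minimal sketch; the helper name `pipe_organ` is hypothetical, and the benchmark's own generator may differ in detail (for example in how it treats odd lengths):

```python
def pipe_organ(n):
    """Ascending run over the first half, descending run over the rest."""
    half = n // 2
    ascending = list(range(half))
    descending = list(range(n - half - 1, -1, -1))
    return ascending + descending

print(pipe_organ(6))  # [0, 1, 2, 2, 1, 0]
```

Both halves are already sorted, so an algorithm that detects runs only has to reverse one half and merge.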
#### Bit Reversal ####

*The input array is composed of sequential integers whose bits have been reversed.*

<img alt="graph of benchmark results on a bit-reversed array" src="https://github.com/psadda/crumsort-cpp/blob/main/bench/results/bit%20reversal%2010000.png" width="480">
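
Assuming "bit reversal" means that each index is replaced by the integer obtained by reversing the order of its bits (an interpretation, since the benchmark's generator is not shown here), such an input could be sketched as follows. The 32-bit default width and both function names are assumptions made for illustration.

```python
def reverse_bits(value, width=32):
    """Reverse the lowest `width` bits of a non-negative integer."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)  # move the lowest bit of value into result
        value >>= 1
    return result

def bit_reversal_input(n, width=32):
    """Apply reverse_bits to the sequence 0, 1, ..., n - 1."""
    return [reverse_bits(i, width) for i in range(n)]

print(bit_reversal_input(4, width=2))  # [0, 2, 1, 3]
```

With a small width the pattern is easy to see: consecutive indices end up far apart, so the result has no long sorted runs even though it is fully deterministic.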
## A Deeper Look at the Candidates ##

### General Purpose Sorts: `crumsort`, `quadsort`, `pdqsort`, `timsort` ###

These functions accept any comparison predicate (as long as that predicate defines a strict weak ordering) and can sort any type of movable data. `pdqsort` and the C++ version of `crumsort` can be used as drop-in replacements for `std::sort`. `timsort` and the C++ version of `quadsort` can be used as drop-in replacements for `std::stable_sort`.

The C versions of `crumsort` and `quadsort` can be compiled with a `cmp` macro that replaces the general purpose predicate with a hardcoded greater-than or less-than predicate for numeric types, slightly improving performance. They were _**not**_ compiled with `cmp` in this benchmark.

### Numeric-Only Sorts: `ska_sort`, `rhsort`, `x86-simd-sort`, `vqsort` ###

These sorts incorporate a radix sort or counting sort. That makes them very fast for sorting numeric data, but it also comes with some limitations:

* They are _**not**_ general purpose sorts: the key has to be numeric, and the sorting predicate has to be a simple numeric less-than or greater-than.

  `ska_sort` is slightly more flexible than the others: it also accepts arrays of numbers as the key. (This includes strings, but because of the limitation on the predicate you wouldn't be able to do, say, a case-insensitive string sort with `ska_sort` without first transforming the input array.)

* Their performance is sensitive to the bit-length of the key. They may be limited to keys of a particular size.
* `x86-simd-sort` and `vqsort` are SIMD accelerated and will only work on CPUs that implement one of their supported instruction sets.
* Taken together, these limitations mean that these functions are _**not**_ drop-in replacements for `std::sort`, although they could be used to optimize specializations of `std::sort`.
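
To make the key-length sensitivity concrete, here is a generic least-significant-digit radix sort for non-negative integers. It is a textbook sketch, not the implementation used by any of the libraries above, and the 8-bit digit size is an arbitrary choice. The number of passes over the data grows with the width of the key:

```python
def lsd_radix_sort(values, key_bits=32, radix_bits=8):
    """Sort non-negative integers below 2**key_bits; return (sorted_list, passes)."""
    mask = (1 << radix_bits) - 1
    passes = 0
    for shift in range(0, key_bits, radix_bits):
        # Distribute by the current digit; appending in order keeps each pass stable.
        buckets = [[] for _ in range(1 << radix_bits)]
        for v in values:
            buckets[(v >> shift) & mask].append(v)
        values = [v for bucket in buckets for v in bucket]
        passes += 1
    return values, passes

data = [170, 45, 75, 90, 802, 24, 2, 66]
sorted_32, passes_32 = lsd_radix_sort(data, key_bits=32)
sorted_64, passes_64 = lsd_radix_sort(data, key_bits=64)
print(passes_32, passes_64)  # 4 8
```

Doubling the key width doubles the number of passes in this sketch. There is also no comparison predicate anywhere in the loop, which is why a custom ordering cannot simply be plugged in.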

| Algorithm       | Time (Avg) | Space (Avg) | Stable | Key Type                    | Key Size     | Portability        |
| --------------- | ---------- | ----------- | ------ | --------------------------- | ------------ | ------------------ |
| `ska_sort`      | O(n)       | O(n)        | no     | numeric or array of numeric | any          | any                |
| `rhsort`        | O(n)       | O(n)        | yes    | numeric only                | 32 bit only  | any                |
| `x86-simd-sort` | O(n)       | O(1)/O(n)†  | no     | numeric only                | 32 or 64 bit | AVX2, AVX512       |
| `vqsort`        | O(n)       | O(n)        | no     | numeric only                | 32 or 64 bit | AVX2, AVX512, NEON |

† `x86-simd-sort` uses O(1) space when sorting numeric data. It uses O(n) space when sorting arbitrary data with a numeric key.

This benchmark uses the AVX2 flavors of `x86-simd-sort` and `vqsort`.

## Concluding Remarks ##

* The C++ versions of `crumsort` and `quadsort` are competitive across the entire suite of tests: almost always best or second best. Performance is generally on par with that of the C versions.
* `std::sort` and `std::stable_sort` (at least the Microsoft implementations tested in this benchmark) have generally okay performance. If sorting isn't a bottleneck, it's very reasonable to stick with the standard library sorts to avoid introducing a new dependency.
* `qsort`, on the other hand, is absolutely terrible. But this may be Microsoft specific, as C and the C standard library are very much second-class citizens in MSVC space.
* `pdqsort` is pretty good across the board. It's a step ahead of `timsort`, which struggles with some data patterns. It's also a much simpler algorithm than `crumsort`, so it's a good option for those who are looking to balance runtime performance with binary size and code complexity.
* If you only need to sort numeric data, `rhsort` is very fast. `x86-simd-sort` and `vqsort` are sometimes faster if you can rely on compatible vector extensions being present. However, the more general purpose `crumsort` and `quadsort` still perform better in several benchmarks.
@@ -0,0 +1,116 @@
import matplotlib.pyplot as plt
import numpy as np
from tqdm import tqdm
import fileinput

# Maps test names as they appear in the benchmark output to the names used in the plots
TESTS = {
    'random int'       : 'random',
    'random order'     : 'random high bits',
    'random half'      : 'random half',
    'ascending order'  : 'ascending',
    'descending order' : 'descending',
    'ascending saw'    : 'ascending saw',
    'ascending tiles'  : 'ascending tiles',
    'pipe organ'       : 'pipe organ',
    'bit reversal'     : 'bit reversal'
}

# Maps algorithm names as they appear in the benchmark output to display labels
ALGORITHMS = {
    'crumsort'   : 'crumsort (C)',
    'quadsort'   : 'quadsort (C)',
    'cxcrumsort' : 'crumsort (C++)',
    'cxquadsort' : 'quadsort (C++)',
    'qsort'      : 'qsort',
    'sort'       : 'std::sort',
    'stablesort' : 'std::stable_sort',
    'pdqsort'    : 'pdqsort',
    'timsort'    : 'timsort',
    'skasort'    : 'ska_sort',
    'rhsort'     : 'rhsort',
    'simdsort'   : 'x86-simd-sort (AVX2)',
    'vqsort'     : 'vqsort'
}

EXPECTED_X_VALUES = [ 10, 100, 1000, 10000, 100000 ]

benchmark_results = {}

for line in fileinput.input():

    line = line.strip()

    if len(line) > 0 and line[0] == '|' and line[-1] == '|': # This line actually has a table row

        # Parse the table row
        cells = line.split('|')[1:-1] # Throw out empty first and last elements
        cells = [c.strip() for c in cells]
        if not cells or cells[0] == 'Name' or len(cells[0]) == 0 or cells[0][0] == '-': # Throw out header rows
            continue

        # Extract the benchmark results
        algorithm = cells[0]
        array_len = int(cells[1])
        time = float(cells[4])
        test_name = cells[-1]

        if algorithm not in ALGORITHMS or test_name not in TESTS:
            continue
        algorithm = ALGORITHMS[algorithm]
        test_name = TESTS[test_name]

        # Store results in the global result table
        benchmark_results.setdefault(test_name, {})
        benchmark_results[test_name].setdefault(algorithm, {})
        benchmark_results[test_name][algorithm][array_len] = time

# Bar graphs
for test in tqdm(TESTS.values()):

    results = benchmark_results.get(test, {}) # Empty if the input had no rows for this test

    fig, ax = plt.subplots()

    labels = []
    values = []

    for algorithm in reversed(ALGORITHMS.values()):

        # Skip algorithms with no measurement at 10,000 elements
        if algorithm not in results or 10000 not in results[algorithm]:
            continue

        labels.append(algorithm)
        values.append(results[algorithm][10000])

    y_pos = np.arange(len(labels))
    plt.barh(y_pos, values, align='center')
    plt.yticks(y_pos, labels)
    plt.xlabel('run time (ns/value)')
    plt.title(test + ' (10,000 elements)')
    plt.tight_layout()

    fig.savefig(test + " 10000.png")
    plt.close()

# Line/scaling graphs
for test in tqdm(TESTS.values()):

    results = benchmark_results.get(test, {}) # Empty if the input had no rows for this test

    fig, ax = plt.subplots()
    ax.set_xscale('log')
    ax.set_yscale('log')

    ax.set(xlabel='array length', ylabel='run time (ns/value)', title=test)

    for algorithm in reversed(ALGORITHMS.values()):

        if algorithm not in results:
            continue

        times = results[algorithm]
        array_lens = sorted(times)
        datapoints = np.array([times[array_len] for array_len in array_lens])

        ax.plot(array_lens, datapoints, label=algorithm)

    ax.legend()
    fig.savefig(test + ".png")
    plt.close()
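
The row-parsing step of the script can be exercised on its own. The sketch below restates that logic as a standalone function; the name `parse_row` is hypothetical, and the sample row (its column layout and its numbers) is made up for illustration rather than taken from real benchmark output. The only things it relies on from the script are the cell positions: algorithm name first, array length second, time in the fifth cell, and test name last.

```python
def parse_row(line):
    """Return (algorithm, array_len, time, test_name), or None for non-data lines."""
    line = line.strip()
    if not (line.startswith('|') and line.endswith('|')):
        return None  # not a table row
    cells = [c.strip() for c in line.split('|')[1:-1]]
    if not cells or cells[0] == 'Name' or not cells[0] or cells[0].startswith('-'):
        return None  # header row, separator row, or empty row
    return cells[0], int(cells[1]), float(cells[4]), cells[-1]

# Hypothetical row: the values below are invented for this example.
sample = '| crumsort | 10000 | 32 | 0.000100 | 0.000120 | 0 | 100 | random order |'
parsed = parse_row(sample)
print(parsed)
```

Keeping the parser separate from the plotting loops makes it possible to check the input format without generating any images.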