Billion row challenge speedup #1584

Merged
merged 3 commits into master from brc-speedup on May 31, 2024
Conversation

@alexowens90 (Collaborator) commented May 28, 2024

After establishing that deallocating segments was a bottleneck when scaling the billion row challenge out to many cores, we've decided to move to using mimalloc everywhere (#1577).
Using `LD_PRELOAD` with mimalloc, these optimisations further speed up running the billion row challenge (run on a 64 core machine with hyperthreading):

```
Cores master brc-speedup
1     76.47  61.39
2     40.10  33.79
4     18.70  16.58
8     10.11   8.68
16     6.83   6.44
32     4.78   5.17
64     5.41   5.15
```

This shows that scaling is good out to 8 cores, and drops off after that. Logging timings shows an obvious bottleneck in `gather_entities` within `AggregationClause::process`, which will be addressed in a future ticket (#1586) to avoid conflicts with #1495.
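For reference, the `LD_PRELOAD` setup mentioned above can be sketched as follows. The library path and benchmark command are illustrative assumptions, not taken from this PR:

```shell
# Illustrative only: adjust the path to wherever libmimalloc is installed.
# LD_PRELOAD makes the dynamic loader resolve malloc/free from mimalloc
# before the default system allocator, so the whole process uses it.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libmimalloc.so.2
python run_brc_benchmark.py   # hypothetical benchmark driver
```

Preloading avoids rebuilding anything: the allocator swap happens at process start-up, which is why it is a convenient way to test allocator choices before committing to linking mimalloc in directly.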

```cpp
auto input_data = (*column)->data();
auto cend = input_data.cend<typename type_info::TDT>();
for (auto input_it = input_data.cbegin<typename type_info::TDT>(); input_it != cend; ++input_it, ++row_to_segment_it) {
    if (ARCTICDB_LIKELY(*row_to_segment_it != 255)) {
```
Collaborator commented:

numeric_limits?

@alexowens90 alexowens90 merged commit 5af6003 into master May 31, 2024
114 checks passed
@alexowens90 alexowens90 deleted the brc-speedup branch May 31, 2024 15:48
grusev pushed a commit that referenced this pull request Nov 25, 2024