Billion row challenge speedup #1584

Merged
merged 3 commits into master from brc-speedup on May 31, 2024
Conversation

@alexowens90 (Collaborator) commented May 28, 2024

After establishing that deallocating segments was a bottleneck when scaling the billion row challenge out to many cores, we've decided to move to using mimalloc everywhere (#1577).
Using `LD_PRELOAD` with mimalloc, these optimisations further speed up running the billion row challenge (run on a 64 core machine with hyperthreading):

```
Cores master brc-speedup
1     76.47  61.39
2     40.10  33.79
4     18.70  16.58
8     10.11   8.68
16     6.83   6.44
32     4.78   5.17
64     5.41   5.15
```

This shows that scaling is good out to 8 cores, and drops off after that. Logging timings shows an obvious bottleneck in `gather_entities` within `AggregationClause::process`, which will be addressed in a future ticket (#1586) to avoid conflicts with #1495.
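For reference, the `LD_PRELOAD` setup mentioned above can be sketched as follows. The library path and benchmark command are illustrative assumptions, not taken from this PR:

```shell
# Illustrative only: adjust the path to wherever libmimalloc is installed.
# LD_PRELOAD makes the dynamic loader resolve malloc/free from mimalloc
# before the default system allocator, so the whole process uses it.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libmimalloc.so.2
python run_brc_benchmark.py   # hypothetical benchmark driver
```

Preloading avoids rebuilding anything: the allocator swap happens at process start-up, which is why it is a convenient way to test allocator choices before committing to linking mimalloc in directly.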

```cpp
auto input_data = (*column)->data();
auto cend = input_data.cend<typename type_info::TDT>();
for (auto input_it = input_data.cbegin<typename type_info::TDT>(); input_it != cend; ++input_it, ++row_to_segment_it) {
    if (ARCTICDB_LIKELY(*row_to_segment_it != 255)) {
```
Collaborator commented:

numeric_limits?

@alexowens90 alexowens90 merged commit 5af6003 into master May 31, 2024
114 checks passed
@alexowens90 alexowens90 deleted the brc-speedup branch May 31, 2024 15:48
grusev pushed a commit that referenced this pull request Nov 25, 2024