refactor(query): improve hash join #12928

Merged on Oct 8, 2023 (37 commits)

@Dousir9 Dousir9 commented Sep 19, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Improve the kernels (take.rs, take_chunks.rs, take_compact.rs, concat.rs, filter.rs) and reduce memory usage:

  1. Use low-level operations wherever possible.
  2. Before building a StringColumn, first scan the iterator and calculate the space it requires. This reduces the memory usage of StringColumn and avoids the resize and grow operations of Vec.
  3. When the output_schema of the hash join includes a StringColumn, add string_items_buf to avoid frequent memory allocation in the kernels.
  4. ctx.get_function_context() and ctx.get_settings() carry a certain amount of overhead, so call them as few times as possible when building pipelines.
  5. For concat, merging two DataBlocks previously required num_rows Vec pushes; now it needs only one copy_nonoverlapping.
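
Items 2 and 5 can be illustrated with a minimal sketch. This is not the code from the PR: the `StringColumn` below is a simplified stand-in (one byte buffer plus offsets), and `from_strs`, `get`, `take_strings`, and `concat_primitives` are hypothetical names chosen for the example. It only shows the general idea of sizing the output in a first pass and then filling it with `copy_nonoverlapping` instead of growing a Vec row by row.

```rust
use std::ptr;

/// Simplified, hypothetical string column: all bytes live in `data`,
/// and row `i` spans `offsets[i]..offsets[i + 1]`.
#[derive(Debug, PartialEq)]
struct StringColumn {
    data: Vec<u8>,
    offsets: Vec<u64>,
}

impl StringColumn {
    fn from_strs(items: &[&str]) -> Self {
        let mut data = Vec::new();
        let mut offsets = vec![0u64];
        for s in items {
            data.extend_from_slice(s.as_bytes());
            offsets.push(data.len() as u64);
        }
        StringColumn { data, offsets }
    }

    fn get(&self, row: usize) -> &[u8] {
        &self.data[self.offsets[row] as usize..self.offsets[row + 1] as usize]
    }
}

/// Gather rows by index. Pass 1 computes the exact output size so `data`
/// is allocated once; pass 2 copies each string with a raw memcpy.
fn take_strings(col: &StringColumn, indices: &[usize]) -> StringColumn {
    // Pass 1: bounds-checked scan that builds the offsets and the total size.
    let mut offsets = Vec::with_capacity(indices.len() + 1);
    offsets.push(0u64);
    let mut total = 0usize;
    for &i in indices {
        total += col.get(i).len();
        offsets.push(total as u64);
    }

    // Pass 2: a single allocation that never needs to grow.
    let mut data: Vec<u8> = Vec::with_capacity(total);
    let mut written = 0usize;
    for &i in indices {
        let s = col.get(i);
        // SAFETY: pass 1 guarantees `written + s.len() <= total <= capacity`,
        // and `s` borrows from `col`, which cannot overlap the new buffer.
        unsafe {
            ptr::copy_nonoverlapping(s.as_ptr(), data.as_mut_ptr().add(written), s.len());
        }
        written += s.len();
    }
    // SAFETY: exactly `total` bytes were initialized by the loop above.
    unsafe { data.set_len(total) };
    StringColumn { data, offsets }
}

/// Concatenate primitive columns with one bulk copy per input column
/// instead of one push per row.
fn concat_primitives<T: Copy>(cols: &[&[T]]) -> Vec<T> {
    let total: usize = cols.iter().map(|c| c.len()).sum();
    let mut out: Vec<T> = Vec::with_capacity(total);
    let mut written = 0usize;
    for c in cols {
        // SAFETY: `written + c.len() <= total <= capacity`, and `c` is an
        // immutably borrowed slice that cannot alias the freshly allocated `out`.
        unsafe {
            ptr::copy_nonoverlapping(c.as_ptr(), out.as_mut_ptr().add(written), c.len());
        }
        written += c.len();
    }
    // SAFETY: exactly `total` elements were written.
    unsafe { out.set_len(total) };
    out
}
```

For example, `take_strings(&StringColumn::from_strs(&["ab", "", "cde"]), &[2, 0])` gathers rows 2 and 0 into a column whose byte buffer was allocated once at its final size. The trade-off in this sketch is an extra pass over the indices in exchange for avoiding reallocation.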

The ci-benchmark TPC-H standalone results:

  • Q9: 21.2s -> 19.9s
  • Q10: 12.3s -> 11.7s
  • Q18: 35.4s -> 30.9s
[Image: Screenshot 2023-09-23 10 09 55]

@Dousir9 added and removed the ci-benchmark (Benchmark: run all test) label several times on Sep 22, Sep 27, and Oct 2, 2023.

github-actions bot posted the following Docker images for the PR (the last comment is dated Oct 2, 2023):

  • tag: pr-12928-4576518
  • tag: pr-12928-53dbf42
  • tag: pr-12928-8c004ac
  • tag: pr-12928-24f1d81
  • tag: pr-12928-611527a

note: this image tag is only available for internal use,
please check the internal doc for more details.

@databendlabs deleted comments from github-actions bot on Sep 22 and Oct 2, 2023.
@BohuTANG BohuTANG merged commit 0111ccf into databendlabs:main Oct 8, 2023
58 checks passed
andylokandy pushed a commit to andylokandy/databend that referenced this pull request Nov 27, 2023
* improve hash join

* improve concat

* improve take_string and add take_boolean

* fix

* improve concat

* improve concat_string_types

* improve take

* improve filter

* update

* remove get_function_context

* improve settings

* allow too_many_arguments

* merge

* merge

* refine primitive comments

* refine

* refine

* refine take_compact

* fix take_compact

* add safety comment

* fix take_compact_string

* refine: use extend from iter and get_unchecked_mut

* refine concat_primitive_types

* reduce pr size

* reduce pr size
Labels

  • ci-benchmark: Benchmark: run all test
  • ci-cloud: Build docker image for cloud test
  • pr-refactor: this PR changes the code base without new features or bugfix
5 participants