feat(query): agg-hashtable-p2 #13548

Closed
wants to merge 42 commits into from

Conversation

sundy-li
Member

@sundy-li sundy-li commented Nov 2, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Introduce the aggregate hashtable into the query pipeline and support partitioned hashtables.

Enable it with: set enable_experimental_aggregate_hashtable = 1;

Note: this PR currently works only in singleton deployments.

Pros:

  1. Better performance for high-cardinality group aggregation, up to 2x.
  2. Cleaner code than the old implementation.

Cons:

  1. It may use more memory than the old implementation because it uses a two-part struct layout.
  2. Slightly worse performance for normal group aggregation, about 0.9x.

  • Closes #issue



@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Nov 2, 2023
@sundy-li
Member Author

sundy-li commented Nov 2, 2023

Performance tests:

Main     PR       Improvement ratio
1.081    0.784     0.274746
3.487    2.390     0.314597
5.136    3.306     0.356308
0.550    0.614    -0.116364
1.412    0.791     0.439802
0.090    0.051     0.433333
0.120    0.106     0.116667
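
The third column appears to be the relative reduction in time, (Main - PR) / Main. That formula is inferred from the numbers rather than stated in the PR, and `improvement_ratio` is a hypothetical helper name. A minimal sketch to recompute it:

```shell
# Hypothetical helper: relative improvement of PR over Main, (main - pr) / main.
# Assumption: this is how the third column was derived; it reproduces the rows above.
improvement_ratio() {
	awk -v main="$1" -v pr="$2" 'BEGIN { printf "%.6f\n", (main - pr) / main }'
}

improvement_ratio 1.081 0.784   # prints 0.274746
improvement_ratio 0.550 0.614   # prints -0.116364 (PR is slower on this query)
```

A negative value means the PR run took longer than the Main run.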

Benchmark script:

cat > queries.sql << EOF
SELECT SearchEngineID, SearchPhrase, COUNT(*) AS c FROM hits WHERE SearchPhrase <> '' GROUP BY SearchEngineID, SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT UserID, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, SearchPhrase ORDER BY COUNT(*) DESC LIMIT 10;
SELECT UserID, extract(minute FROM EventTime) AS m, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, m, SearchPhrase ORDER BY COUNT(*) DESC LIMIT 10;
SELECT SearchEngineID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits WHERE SearchPhrase <> '' GROUP BY SearchEngineID, ClientIP ORDER BY c DESC LIMIT 10;
SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh) , AVG(ResolutionWidth) d FROM hits WHERE SearchPhrase <> '' GROUP BY WatchID, ClientIP ORDER BY d desc, WatchID desc LIMIT 10;
SELECT URL, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND URL <> '' GROUP BY URL ORDER BY PageViews DESC LIMIT 10;
SELECT TraficSourceID, SearchEngineID, AdvEngineID, CASE WHEN (SearchEngineID = 0 AND AdvEngineID = 0) THEN Referer ELSE '' END AS Src, URL AS Dst, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 GROUP BY TraficSourceID, SearchEngineID, AdvEngineID, Src, Dst ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;
EOF

# Baseline: aggregate hashtable disabled (setting = 0)
while IFS= read -r line; do
	# first run; its output is captured in res and not printed
	res=$(echo "$line" | bendsql --set enable_experimental_aggregate_hashtable=0 --time --output null)

	## hot run
	echo "$line" | bendsql --set enable_experimental_aggregate_hashtable=0 --time --output null
done < queries.sql

# Same two runs with the aggregate hashtable enabled (setting = 1)
while IFS= read -r line; do
	res=$(echo "$line" | bendsql --set enable_experimental_aggregate_hashtable=1 --time --output null)

	## hot run
	echo "$line" | bendsql --set enable_experimental_aggregate_hashtable=1 --time --output null
done < queries.sql

@sundy-li sundy-li requested a review from zhang2014 November 15, 2023 08:00
@sundy-li sundy-li marked this pull request as ready for review November 15, 2023 08:00
@sundy-li sundy-li requested a review from ariesdevil November 15, 2023 08:01
@sundy-li sundy-li added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Nov 16, 2023
Contributor

Docker Image for PR

  • tag: pr-13548-01851d1

note: this image tag is only available for internal use,
please check the internal doc for more details.

@sundy-li sundy-li added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Nov 17, 2023
Contributor

Docker Image for PR

  • tag: pr-13548-930c246

note: this image tag is only available for internal use,
please check the internal doc for more details.

@sundy-li sundy-li marked this pull request as draft November 17, 2023 02:13
@sundy-li sundy-li added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Nov 17, 2023
Contributor

Docker Image for PR

  • tag: pr-13548-f50a566

note: this image tag is only available for internal use,
please check the internal doc for more details.

@sundy-li sundy-li added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Nov 18, 2023
Contributor

Docker Image for PR

  • tag: pr-13548-c9fe095

note: this image tag is only available for internal use,
please check the internal doc for more details.

@sundy-li
Member Author

hits benchmark:

[image: hits benchmark results]

[image: Q31, Q32 results]

Contributor

Pull request description must contain CLA like the following:

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

## Summary

Summary about this PR

- Close #issue

@sundy-li sundy-li removed the ci-cloud Build docker image for cloud test label Jan 25, 2024
@sundy-li sundy-li mentioned this pull request Feb 1, 2024
11 tasks
@sundy-li sundy-li closed this Feb 1, 2024
Labels
pr-feature this PR introduces a new feature to the codebase
3 participants