feat: Improve CometBroadcastHashJoin statistics #339

Merged

planga82
Contributor

Which issue does this PR close?

Closes #338 .

Rationale for this change

Add all the statistics that the DataFusion HashJoinExec node provides.

What changes are included in this PR?

All available metrics

/// Total time for collecting build-side of join
pub(crate) build_time: metrics::Time,
/// Number of batches consumed by build-side
pub(crate) build_input_batches: metrics::Count,
/// Number of rows consumed by build-side
pub(crate) build_input_rows: metrics::Count,
/// Memory used by build-side in bytes
pub(crate) build_mem_used: metrics::Gauge,
/// Total time for joining probe-side batches to the build-side batches
pub(crate) join_time: metrics::Time,
/// Number of batches consumed by probe-side of this operator
pub(crate) input_batches: metrics::Count,
/// Number of rows consumed by probe-side of this operator
pub(crate) input_rows: metrics::Count,
/// Number of batches produced by this operator
pub(crate) output_batches: metrics::Count,
/// Number of rows produced by this operator
pub(crate) output_rows: metrics::Count
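
To illustrate what each of these counters measures, here is a minimal, std-only Rust sketch of a toy hash join that updates a metric set with the same field names. It is not the DataFusion or Comet implementation: `JoinMetrics`, `hash_join`, `Row`, and `Batch` are hypothetical names introduced for this example, plain integers and `Duration` stand in for `metrics::Count`, `metrics::Gauge`, and `metrics::Time`, and the memory figure is a rough estimate that ignores hash table overhead.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the metric set listed above (not DataFusion's type).
#[derive(Debug, Default)]
struct JoinMetrics {
    build_time: Duration,
    build_input_batches: usize,
    build_input_rows: usize,
    build_mem_used: usize,
    join_time: Duration,
    input_batches: usize,
    input_rows: usize,
    output_batches: usize,
    output_rows: usize,
}

/// A row is (join key, payload); a batch is a group of rows.
type Row = (i32, &'static str);
type Batch = Vec<Row>;

/// Toy inner hash join that records metrics as it goes.
fn hash_join(
    build: &[Batch],
    probe: &[Batch],
    m: &mut JoinMetrics,
) -> Vec<Vec<(i32, &'static str, &'static str)>> {
    // Build phase: consume every build-side batch into a hash table.
    let start = Instant::now();
    let mut table: HashMap<i32, Vec<&'static str>> = HashMap::new();
    for batch in build {
        m.build_input_batches += 1;
        m.build_input_rows += batch.len();
        // Assumption: count only the raw row size, not hash table overhead.
        m.build_mem_used += batch.len() * std::mem::size_of::<Row>();
        for &(key, payload) in batch {
            table.entry(key).or_default().push(payload);
        }
    }
    m.build_time += start.elapsed();

    // Probe phase: look up each probe-side row and emit one output batch
    // per probe batch.
    let start = Instant::now();
    let mut out = Vec::new();
    for batch in probe {
        m.input_batches += 1;
        m.input_rows += batch.len();
        let mut joined = Vec::new();
        for &(key, probe_payload) in batch {
            if let Some(matches) = table.get(&key) {
                for &build_payload in matches {
                    joined.push((key, build_payload, probe_payload));
                }
            }
        }
        m.output_batches += 1;
        m.output_rows += joined.len();
        out.push(joined);
    }
    m.join_time += start.elapsed();
    out
}

fn main() {
    let build = vec![vec![(1, "a"), (2, "b")], vec![(2, "c")]];
    let probe = vec![vec![(2, "x"), (3, "y")], vec![(1, "z")]];
    let mut metrics = JoinMetrics::default();
    let joined = hash_join(&build, &probe, &mut metrics);
    println!("{:?}", joined);
    println!("{:?}", metrics);
}
```

In this sketch the build-side counters are updated once, before any probing starts, which mirrors why a broadcast hash join reports build and probe statistics separately.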

How are these changes tested?

Unit testing and manual testing

(cherry picked from commit 97a647a0757250f9feaea6571b8cb0738c6ec340)
(cherry picked from commit df418aeaf9f0923d17a69edf5829c8f77a1934c1)
@planga82
Contributor Author

The tests appear to fail with Spark 3.2 and Spark 3.3. I'm looking into it.

@planga82
Contributor Author

Fix tested in my repository with GitHub Actions.

Member

@viirya viirya left a comment

Thanks @planga82

@viirya viirya merged commit e49a796 into apache:main Apr 30, 2024
28 checks passed
@viirya
Member

viirya commented Apr 30, 2024

Merged. Thanks @planga82 @kazuyukitanimura

himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
* broadcast hash join metrics

(cherry picked from commit 97a647a0757250f9feaea6571b8cb0738c6ec340)

* broadcast hash join test

(cherry picked from commit df418aeaf9f0923d17a69edf5829c8f77a1934c1)

* format

* add assume