feat: Improve shuffle metrics (second attempt) #1175
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1175      +/-   ##
============================================
- Coverage     34.32%   34.32%   -0.01%
  Complexity      899      899
============================================
  Files           115      115
  Lines         43500    43506       +6
  Branches       9496     9498       +2
============================================
+ Hits          14931    14932       +1
- Misses        25659    25661       +2
- Partials       2910     2913       +3
Just for clarification: what's the relation between shuffle write time, encoding and compression total time, and native shuffle total time?
There is also evaluating the partition expressions (typically very fast if they are just column references) and then the time to actually split the batches into partitions.
From the above screenshot, [...] Nonetheless, the PR certainly improves on the current metrics.
There is also interaction with the memory pool, which makes JNI calls into synchronized code in the JVM. I will see if I can make the metrics more complete in this PR.
@parthchandra The numbers almost add up now.
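For readers following the thread, here is a minimal sketch of how one might check whether the component timings "add up" to the native shuffle total. It is not the PR's code: the `ShuffleTimings` class and all of its field names are hypothetical, and it assumes the components discussed above (partition expression evaluation, splitting batches, encoding and compression, writing, memory pool interaction) do not overlap and all fall inside the native total, which the thread does not confirm.

```python
from dataclasses import dataclass


@dataclass
class ShuffleTimings:
    """Hypothetical per-task timings in nanoseconds.

    Field names are illustrative only and do not correspond to
    actual metric names in the project.
    """
    native_total_ns: int      # total time spent in native shuffle
    partition_expr_ns: int    # evaluating the partition expressions
    repartition_ns: int       # splitting batches into partitions
    encode_compress_ns: int   # encoding and compressing batches
    write_ns: int             # writing the encoded bytes out
    mempool_ns: int           # memory pool interaction (JNI calls into the JVM)

    def accounted_ns(self) -> int:
        # Sum of the components, assuming they do not overlap.
        return (
            self.partition_expr_ns
            + self.repartition_ns
            + self.encode_compress_ns
            + self.write_ns
            + self.mempool_ns
        )

    def unaccounted_ns(self) -> int:
        # Time inside the native total that no component explains.
        return self.native_total_ns - self.accounted_ns()


# Made-up numbers purely for illustration.
timings = ShuffleTimings(
    native_total_ns=1000,
    partition_expr_ns=10,
    repartition_ns=200,
    encode_compress_ns=500,
    write_ns=250,
    mempool_ns=30,
)
print(timings.accounted_ns(), timings.unaccounted_ns())
```

A small positive remainder would correspond to "almost add up"; a large one would suggest a phase that is not yet being timed.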
Brilliant!
Which issue does this PR close?
N/A
This PR replaces #1173
Rationale for this change
Changes:
Before
After
What changes are included in this PR?
How are these changes tested?