fix: Reduce RowPartition memory allocation #244

Merged
merged 1 commit into from
Apr 7, 2024

viirya
Member

@viirya viirya commented Apr 6, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

RowPartition allocates memory for storing row addresses and sizes. We call ArrayBuffer.clear in the RowPartition.reset method to clean up that allocation. But ArrayBuffer.clear doesn't actually deallocate the internal array; it only resets its contents.

When we insert elements into the array, it ensures capacity by doubling the array size whenever there is not enough space, so the array grows. That said, once the array has grown, clear cannot deallocate it after spilling, and the partition keeps holding an unnecessarily large array.

This patch fixes it.

Besides, since we limit the columnar batch size in the native writer, it doesn't make sense for the JVM row buffer to be larger than that size. This patch uses COMET_COLUMNAR_SHUFFLE_BATCH_SIZE as the initial size of RowPartition.
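
To illustrate the behavior described above, here is a minimal, self-contained sketch. It is not Comet's RowPartition code: `GrowableLongBuffer`, its methods, and the initial capacity of 4 are hypothetical names and values chosen for illustration only. It shows a buffer that doubles on demand, a `clear` that only resets the element count, and a `reset` that drops the grown array and reallocates at the initial capacity.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a growable buffer of row addresses; not Comet's RowPartition. */
final class GrowableLongBuffer {
    private final int initialCapacity;
    private long[] data;
    private int size;

    GrowableLongBuffer(int initialCapacity) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("initialCapacity must be positive");
        }
        this.initialCapacity = initialCapacity;
        this.data = new long[initialCapacity];
    }

    /** Appends a value, doubling the backing array when it is full. */
    void add(long value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    /** Clears the contents but keeps the (possibly large) backing array. */
    void clear() {
        size = 0;
    }

    /** Drops the grown array and starts over at the initial capacity. */
    void reset() {
        data = new long[initialCapacity];
        size = 0;
    }

    int size() {
        return size;
    }

    int capacity() {
        return data.length;
    }

    public static void main(String[] args) {
        GrowableLongBuffer buf = new GrowableLongBuffer(4);
        for (long i = 0; i < 100; i++) {
            buf.add(i);
        }
        System.out.println(buf.capacity()); // prints 128 (4 doubled five times)

        buf.clear();
        System.out.println(buf.capacity()); // prints 128: clear() kept the grown array

        buf.reset();
        System.out.println(buf.capacity()); // prints 4: reset() reallocated at the initial size
    }
}
```

In this sketch the initial capacity plays the role that the description assigns to COMET_COLUMNAR_SHUFFLE_BATCH_SIZE: the buffer returns to that size after each reset instead of staying at its largest historical size.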

What changes are included in this PR?

How are these changes tested?

@viirya
Member Author

viirya commented Apr 6, 2024

cc @sunchao @parthchandra

Member

@sunchao sunchao left a comment

LGTM

@codecov-commenter

Codecov Report

Attention: Patch coverage is 88.88889%, with 1 line in your changes missing coverage. Please review.

Project coverage is 33.58%. Comparing base (d76c113) to head (63f5554).

| Files | Patch % | Lines |
|---|---|---|
| .../comet/execution/shuffle/CometDiskBlockWriter.java | 80.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #244      +/-   ##
============================================
- Coverage     33.58%   33.58%   -0.01%     
  Complexity      780      780              
============================================
  Files           107      107              
  Lines         37211    37212       +1     
  Branches       8160     8161       +1     
============================================
  Hits          12496    12496              
  Misses        22076    22076              
- Partials       2639     2640       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@viirya viirya merged commit a86626a into apache:main Apr 7, 2024
28 checks passed
@viirya
Member Author

viirya commented Apr 7, 2024

Merged. Thanks.

@viirya viirya deleted the row_partitions branch April 7, 2024 00:07
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
3 participants