fix: Reduce RowPartition memory allocation #244
Merged
Which issue does this PR close?
Closes #.
Rationale for this change
RowPartition allocates memory for storing row addresses and sizes. We call ArrayBuffer.clear in the RowPartition.reset method to clean up the memory allocation. But ArrayBuffer.clear doesn't actually deallocate the internal array; it only resets its contents. When we insert elements into the array, it ensures capacity by doubling the array size whenever space runs out, so the array grows. That said, once the array has grown, clear cannot deallocate it after spilling, and the buffer keeps holding an unnecessarily large array. This patch fixes it.
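The difference between clearing a grown buffer and replacing it can be sketched with a small stand-in class. This is only an illustrative model under stated assumptions: GrowableLongBuffer, its capacity accessor, and its reset method are hypothetical names invented for this example; it is neither Comet's RowPartition nor Scala's ArrayBuffer, it just mimics the doubling growth described above.

```scala
// Hypothetical model of a doubling buffer. Not Comet's RowPartition and not
// scala.collection.mutable.ArrayBuffer; all names here are illustrative.
final class GrowableLongBuffer(initialCapacity: Int) {
  require(initialCapacity > 0, "initialCapacity must be positive")

  private var data = new Array[Long](initialCapacity)
  private var count = 0

  def capacity: Int = data.length
  def size: Int = count

  def append(value: Long): Unit = {
    if (count == data.length) {
      // Double the backing array when it is full.
      data = java.util.Arrays.copyOf(data, data.length * 2)
    }
    data(count) = value
    count += 1
  }

  // Models the problem: the size is reset, but the grown array is kept.
  def clear(): Unit = {
    count = 0
  }

  // Models one possible fix: drop the grown array so it can be garbage
  // collected, and start again from the initial capacity.
  def reset(): Unit = {
    data = new Array[Long](initialCapacity)
    count = 0
  }
}

object GrowableLongBufferDemo {
  def main(args: Array[String]): Unit = {
    val buf = new GrowableLongBuffer(4)
    (0 until 100).foreach(i => buf.append(i.toLong))
    println(s"after appends: size=${buf.size}, capacity=${buf.capacity}")

    buf.clear()
    println(s"after clear:   size=${buf.size}, capacity=${buf.capacity}")

    buf.reset()
    println(s"after reset:   size=${buf.size}, capacity=${buf.capacity}")
  }
}
```

In this model, 100 appends starting from a capacity of 4 leave the backing array at 128 slots; clear keeps all 128, while reset returns to 4.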
Besides, as we limit the columnar batch size in the native writer, it doesn't make sense to have a JVM row buffer larger than that size. This patch uses COMET_COLUMNAR_SHUFFLE_BATCH_SIZE as the initial size of RowPartition.
What changes are included in this PR?
How are these changes tested?