Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Deallocate row addresses and size arrays after exporting #246

Merged
merged 3 commits into from
Apr 7, 2024

Conversation

viirya
Copy link
Member

@viirya viirya commented Apr 7, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

When we export row addresses and sizes to native, they are copied to separate arrays. We should deallocate array buffers after exporting them to reduce memory allocation.

What changes are included in this PR?

How are these changes tested?

@viirya
Copy link
Member Author

viirya commented Apr 7, 2024

cc @sunchao @parthchandra

Comment on lines +35 to +45
def getRowAddresses: Array[Long] = {
val array = rowAddresses.toArray
rowAddresses = null
array
}

def getRowSizes: Array[Int] = {
val array = rowSizes.toArray
rowSizes = null
array
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, these are writing operations, I don't think it's suitable to put them in the getters? Besides, the getNumRows may still access rowAddresses.

How about adding a new method?

def destructiveRowAddressAndSizes: (Array[Long], Array[Int])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They won't be accessed again after getRowAddresses and getRowSizes get called. Once they are called, they are exported to native side for spilling. After that, RowPartition will be reset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e.,

long addrs[] = getRowAddresses();
long sizes[] = getRowSizes;
nativeSpilling(adds, sizes);
rowPartition.reset();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They won't be accessed again after getRowAddresses and getRowSizes get called

I see. Maybe I'm being nitpicking. I think caller would be surprise to see the state change of RowPartition after calling a getter method. A more appropriate method name or some comments would help.

viirya added 2 commits April 7, 2024 10:58
This reverts commit 4cf7a3d.
@viirya viirya merged commit 8a512ba into apache:main Apr 7, 2024
28 checks passed
@viirya
Copy link
Member Author

viirya commented Apr 7, 2024

Merged. Thanks.

@viirya viirya deleted the row_partitions2 branch April 7, 2024 20:23
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
…#246)

When we export row addresses and sizes to native, they are copied to separate arrays. We should deallocate array buffers after exporting them to reduce memory allocation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants