-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CoW: Use exponential backoff when clearing dead references #55518
Changes from 2 commits
79c663f
a2f81a4
4423d19
d4c159b
da639c8
4d8c8fb
dd202a6
4664c52
4227598
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -890,17 +890,25 @@ cdef class BlockValuesRefs: | |
""" | ||
cdef: | ||
public list referenced_blocks | ||
public int clear_counter | ||
|
||
def __cinit__(self, blk: Block | None = None) -> None: | ||
if blk is not None: | ||
self.referenced_blocks = [weakref.ref(blk)] | ||
else: | ||
self.referenced_blocks = [] | ||
self.clear_counter = 500 # set reasonably high | ||
|
||
def _clear_dead_references(self) -> None: | ||
self.referenced_blocks = [ | ||
ref for ref in self.referenced_blocks if ref() is not None | ||
] | ||
def _clear_dead_references(self, force=False) -> None: | ||
jreback marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if force or len(self.referenced_blocks) > self.clear_counter: | ||
self.referenced_blocks = [ | ||
ref for ref in self.referenced_blocks if ref() is not None | ||
] | ||
nr_of_refs = len(self.referenced_blocks) | ||
if nr_of_refs < self.clear_counter // 2: | ||
self.clear_counter = self.clear_counter // 2 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we think it is needed to also reduce this? Or, I assume this is mostly to reduce the counter again in case it has become very large, not necessarily to let it become smaller than 500 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Very good point, I intended to add a max here, e.g. |
||
elif nr_of_refs > self.clear_counter: | ||
self.clear_counter = min(self.clear_counter * 2, nr_of_refs) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could also just do x2 instead of the I am wondering that if you repeatedly add a reference (for an object that doesn't go out of scope), doesn't that end up increasing the counter only with +1 every time? For example, you have 501 refs, hitting the threshold, at that moment you clear the refs, but There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah you are correct, already added a max here because I came to the same conclusion |
||
|
||
def add_reference(self, blk: Block) -> None: | ||
"""Adds a new reference to our reference collection. | ||
|
@@ -934,6 +942,6 @@ cdef class BlockValuesRefs: | |
------- | ||
bool | ||
""" | ||
self._clear_dead_references() | ||
self._clear_dead_references(force=True) | ||
# Checking for more references than block pointing to itself | ||
return len(self.referenced_blocks) > 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regression occurs without Copy-on-Write too. I think we should mention that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I struggled a bit with the wording, any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
?
I think it's better to leave the copy-on-write part out - I personally couldn't find a way to word it without making it seem like the issue was with Copy-on-Write on only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I am not really happy with listing methods, since this affects all kinds of things with wide data frames