
CoW: Use weakref callbacks to track dead references #55539

Closed
wants to merge 1 commit

Conversation

@wangwillian0 (Contributor) commented Oct 16, 2023

This is an alternative solution to #55518. What is happening here:

  • Uses the weakref.ref callback to count the number of dead references present in the referenced_blocks list.
  • The constructor takes ~20% more time because of the function declaration!
    • The function declaration could be moved to add_reference and add_index_reference, but that just moves the performance hit to those methods.
  • Everything is O(1) now, including has_reference. (_clear_dead_references is amortized O(1).)
  • If the number of dead references exceeds 256 and more than 50% of the array, they are pruned.
    • The implication is that the linear dead-reference cleanup happens close to the GC calls, which I think is more intuitive behavior.

(see also other possibilities in #55008)

Let me know if I should continue working on this solution.

def _weakref_cb(item: weakref.ref, selfref: weakref.ref = weakref.ref(self)) -> None:
    self = selfref()
    if self is not None:
        self.dead_counter += 1
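Putting the pieces from the description together, the whole scheme could look roughly like this (a minimal sketch under stated assumptions, not the actual PR code; the class shape and method names beyond the snippet above are assumptions):

```python
import weakref


class BlockValuesRefs:
    # Illustrative sketch of the scheme described above: dead references
    # are counted by a weakref callback, and the list is pruned lazily.
    def __init__(self) -> None:
        self.referenced_blocks: list[weakref.ref] = []
        self.dead_counter = 0

        # The closure holds only a weakref to self, so the callback does
        # not keep the tracker object alive (same trick as the snippet).
        def _weakref_cb(item: weakref.ref,
                        selfref: weakref.ref = weakref.ref(self)) -> None:
            tracker = selfref()
            if tracker is not None:
                tracker.dead_counter += 1

        self._weakref_cb = _weakref_cb

    def add_reference(self, obj) -> None:
        # O(1): append a weakref whose callback bumps the dead counter
        # when the referent is garbage collected.
        self.referenced_blocks.append(weakref.ref(obj, self._weakref_cb))
        self._clear_dead_references()

    def _clear_dead_references(self) -> None:
        # Amortized O(1): the linear rebuild only runs once dead entries
        # exceed 256 and make up more than half of the list.
        if (self.dead_counter > 256
                and self.dead_counter * 2 > len(self.referenced_blocks)):
            self.referenced_blocks = [
                ref for ref in self.referenced_blocks if ref() is not None
            ]
            self.dead_counter = 0

    def has_reference(self) -> bool:
        # O(1): live references = total entries minus dead entries.
        # (The exact threshold in pandas differs; this is illustrative.)
        return len(self.referenced_blocks) - self.dead_counter > 0
```

Note that on CPython the callback fires as soon as the referent's refcount drops to zero, so the counter stays current without any scanning.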
Member

Is this ever reachable without the GIL? This doesn't appear thread-safe, so I'm thinking through what impacts that might have.

Contributor Author

Sorry, can you explain a bit more? Is pandas preparing to be thread-safe and use the no-GIL feature of the upcoming Python version?

Member

We release the GIL in quite a few places in the Cython code today, mostly for downstream libraries like Dask to take advantage of. Though I guess this in its current state wouldn't work without the GIL since it is already using weakref on main

@WillAyd (Member) commented Oct 16, 2023

Assuming this is an alternate solution to @phofl's PR in #55518

@phofl (Member) commented Oct 16, 2023

#55518 needs to be backported. This is something we can consider for main and 2.2, but I want to test it more before we rely on it.

@wangwillian0 wangwillian0 force-pushed the weakref-callback branch 4 times, most recently from 6eb63ce to c0f401d Compare November 5, 2023 19:24
@wangwillian0 (Contributor Author)

I just fixed the tests. Any thoughts about it @phofl @WillAyd?

@WillAyd (Member) left a comment

Generally looks good, though I will defer to @phofl, who is more familiar with this. I think he is traveling this week, so thanks in advance for your patience.


def __cinit__(self, blk: Block | None = None) -> None:
    def _weakref_cb(item: weakref.ref, selfref: weakref.ref = weakref.ref(self)) -> None:
Member

Not sure of all of the impacts, but I think __cinit__ is mostly used for C-level structures. I think setting this callback should instead be done in __init__

Contributor Author

I just updated it

Contributor Author

Actually, it doesn't work, because __cinit__ is called before __init__, so we need the callback ready there, unless we move everything to __init__. I changed it back 😅

Member

Hmm OK. And this definitely doesn't cause any memory leaks or issues with reference counting then right?

Contributor Author

It should be correct as long as the reference list is not modified manually outside the class's methods and is not run under concurrency.

Another thing is that taking the length of the list of refs directly won't give the correct length either (the actual value is len(referenced_blocks) - dead_counter). From what I saw, the exact length is never used anywhere else in the code right now, but I can implement __len__ if needed.

If someone modified the list manually outside the class's methods, the counter would be wrong, the memory consumption could grow a lot (just like the current solution on main), and has_reference() would be plain wrong.
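For reference, a __len__ following that formula could look like this (a hypothetical sketch, not something the PR implements; the class and method names are placeholders):

```python
import weakref


class RefsWithLen:
    # Hypothetical sketch: a tracker whose len() reports the number of
    # live references using len(referenced_blocks) - dead_counter.
    def __init__(self) -> None:
        self.referenced_blocks: list[weakref.ref] = []
        self.dead_counter = 0

    def add_reference(self, obj) -> None:
        # Callback holds only a weakref to the tracker, as in the PR.
        def _cb(item: weakref.ref, selfref=weakref.ref(self)) -> None:
            tracker = selfref()
            if tracker is not None:
                tracker.dead_counter += 1

        self.referenced_blocks.append(weakref.ref(obj, _cb))

    def __len__(self) -> int:
        # Live references = total entries minus entries whose referent
        # has already been garbage collected.
        return len(self.referenced_blocks) - self.dead_counter
```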

Contributor Author

Sorry, I meant multi threading

Member

Yes, I understood this, but again, this is a realistic scenario

Contributor Author

The problem would be that self.dead_counter += 1 isn't atomic. I can add a lock to it, but, correct me if I'm wrong, the rest of the code isn't thread-safe either; e.g. the list comprehension in _clear_dead_references doesn't seem to be thread-safe.
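If atomicity of the increment were required, one option is to guard the counter with a lock (a hypothetical sketch, not part of the PR; the class and method names are placeholders):

```python
import threading
import weakref


class LockedRefs:
    # Hypothetical sketch: serialize counter updates with a lock so the
    # read-modify-write in the weakref callback cannot interleave.
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.referenced_blocks: list[weakref.ref] = []
        self.dead_counter = 0

    def _on_dead(self, item) -> None:
        # The lock makes the += atomic across threads; without it, two
        # callbacks can both read the old value and lose one increment.
        with self._lock:
            self.dead_counter += 1

    def add_reference(self, obj) -> None:
        # Note: a bound method as the callback creates a reference cycle
        # (self -> list -> weakref -> bound method -> self); it is still
        # collectible by the cyclic GC, but the selfref closure trick
        # used in the PR avoids the cycle entirely.
        self.referenced_blocks.append(weakref.ref(obj, self._on_dead))
```

As the discussion notes, this only protects the counter itself; the list rebuild in a _clear_dead_references-style method would need the same lock to be safe.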

Contributor Author

Here is a sample of how _clear_dead_references is not really thread-safe:

import threading

class Test:
    def __init__(self):
        self.v = list(range(1000))

    def add(self, x):
        self.v.append(x)

    def rebuild(self):
        # Rebinds self.v to a fresh list; appends made by other threads
        # between the read of self.v and the rebind are silently lost.
        self.v = [x for x in self.v]

    def simulate(self):
        for _ in range(1000):
            self.add(1)
            self.rebuild()

    def race(self):
        threads = [threading.Thread(target=self.simulate) for _ in range(10)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Expected 1000 + 10 * 1000 = 11000; lost updates make it smaller.
        print(len(self.v))

test1 = Test()
test2 = Test()
test1.race()
test2.race()

I couldn't reproduce the same with pandas code, but I think it's more about timing than anything else. The point is: is this class supposed to be thread-safe?

Member

Nope that works, was just not sure if I was missing something obvious

@wangwillian0 wangwillian0 force-pushed the weakref-callback branch 2 times, most recently from 583b908 to c1b9b83 Compare November 17, 2023 22:16
@wangwillian0 (Contributor Author)

Any news?

@mroeschke (Member)

Personally, I would be more partial to exploring #55631 (IMO the mental model of using a set would be simpler)

@phofl (Member) commented Dec 1, 2023

Can you run the ctors.py benchmarks?

@wangwillian0 (Contributor Author)

Sorry for the delay. Here are the benchmarks:

Change  Before [593fa85]  After [98addad]  Ratio  Benchmark (Parameter)
+       5.74±0.2ms        7.37±0.2ms       1.29   frame_ctor.FromArrays.time_frame_from_arrays_int
+       7.65±0.08ms       8.63±0.2ms       1.13   frame_ctor.FromArrays.time_frame_from_arrays_sparse
+       1.43±0.07ms       1.47±0.06ms      1.03   frame_ctor.FromRecords.time_frame_from_records_generator(1000)

@phofl (Member) commented Dec 5, 2023

How does this correspond to the other timings in the issue?

@wangwillian0 (Contributor Author)

> How does this correspond to the other timings in the issue?

Not sure if I fully understand what is being asked, but this ~20% increase is from the overhead in the constructor. Where this PR improves things is mainly has_reference, which is O(1) independently of the number of references. Insertions should also be a little faster because there are no periodic calls to the cleaning method, but this is probably not meaningful in any way.

@wangwillian0 (Contributor Author)

Updates?

github-actions (bot)

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jan 13, 2024
@wangwillian0 (Contributor Author)

Hey @phofl, what is the current status of this PR?

@wangwillian0 (Contributor Author)

@phofl?

@simonjayhawkins simonjayhawkins added the Regression Functionality that used to work in a prior pandas version label Feb 8, 2024
@wangwillian0 (Contributor Author)

Hi, @phofl, can you give an update?

@mroeschke (Member)

Thanks for the PR, but it doesn't seem there is much bandwidth or interest from the core team to pursue this approach, so closing this out for now. We can reopen if there's renewed interest.

@mroeschke mroeschke closed this Aug 19, 2024
Labels
Regression (Functionality that used to work in a prior pandas version), Stale
Development

Successfully merging this pull request may close these issues.

PERF: hash_pandas_object performance regression between 2.1.0 and 2.1.1
5 participants