
CoW: Use exponential backoff when clearing dead references #55518

Merged (9 commits) Oct 22, 2023

Conversation

**phofl** (Member) commented Oct 14, 2023:

I'd rather keep @rhshadrach's issue open; we might be able to find a better fix there.

The runtime on this branch is back to the initial numbers, e.g. 3 ms on Joris's examples and 1 s on Richard's issue.
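For context, the mechanism under discussion can be sketched standalone. The class below is a hypothetical stand-in (here called `ReferenceTracker`) for pandas' `BlockValuesRefs`, not the actual implementation; only the starting threshold of 500 and the shrink/grow rules mirror the snippets in this thread, everything else is illustrative.

```python
import weakref

class ReferenceTracker:
    """Minimal sketch of exponential backoff for clearing dead weakrefs.

    Hypothetical stand-in for pandas' BlockValuesRefs: the starting
    threshold (500) and the update rules are taken from the PR,
    the rest is illustrative.
    """

    def __init__(self):
        self.referenced_blocks = []
        self.clear_counter = 500  # adaptive threshold

    def _clear_dead_references(self, force=False):
        # Pay the O(n) scan only when forced or past the threshold.
        if force or len(self.referenced_blocks) > self.clear_counter:
            self.referenced_blocks = [
                ref for ref in self.referenced_blocks if ref() is not None
            ]
            nr_of_refs = len(self.referenced_blocks)
            if nr_of_refs < self.clear_counter // 2:
                # Mostly dead: shrink the threshold, but never below 500.
                self.clear_counter = max(self.clear_counter // 2, 500)
            elif nr_of_refs > self.clear_counter:
                # Mostly alive: double, so the next scan comes much later.
                self.clear_counter = max(self.clear_counter * 2, nr_of_refs)

    def add_reference(self, obj):
        self._clear_dead_references()
        self.referenced_blocks.append(weakref.ref(obj))
```

When all referenced objects stay alive (the `df.items()` case), each scan doubles the threshold, so n insertions trigger only O(log n) scans and the total work drops from O(n²) to roughly O(n).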

@phofl phofl requested a review from WillAyd as a code owner October 14, 2023 15:14
Comment on lines 908 to 909:

```python
if nr_of_refs < self.clear_counter // 2:
    self.clear_counter = self.clear_counter // 2
```
**Member**:
Do we think it is needed to also reduce this? Or, I assume this is mostly to reduce the counter again in case it has become very large, not necessarily to let it become smaller than 500

**phofl** (Member, Author):

Very good point. I intended to add a max here, e.g. `max(..., 500)`.

```python
if nr_of_refs < self.clear_counter // 2:
    self.clear_counter = self.clear_counter // 2
elif nr_of_refs > self.clear_counter:
    self.clear_counter = min(self.clear_counter * 2, nr_of_refs)
```
**jorisvandenbossche** (Member) commented Oct 14, 2023:

Could we also just do ×2 instead of the `min(..)` calculation?

I am wondering: if you repeatedly add a reference (for an object that doesn't go out of scope), doesn't that end up increasing the counter by only +1 every time? For example, you have 501 refs, hitting the threshold; at that moment you clear the refs, but `nr_of_refs` is still 501 after doing that, and then here we set the new threshold to `min(500 * 2, 501)`, i.e. 501.
I must be missing something, because otherwise I don't understand how this fixes the perf issue of the `list(df.items())` example.
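Joris's arithmetic can be checked with a toy simulation (illustrative only, not pandas code): with `min(counter * 2, nr_of_refs)` and references that never die, the threshold creeps ahead by one per insertion, so nearly every insertion after the 500th triggers a full scan; with `max(...)` the threshold doubles and only a handful of scans occur.

```python
def count_scans(n, update):
    """Count full clearing scans over n insertions where every
    reference stays alive, for a given threshold-update rule."""
    refs = 0          # live references (none ever die)
    counter = 500     # starting threshold, as in the PR
    scans = 0
    for _ in range(n):
        if refs > counter:
            scans += 1                     # full O(refs) scan
            counter = update(counter, refs)
        refs += 1
    return scans

# Threshold update with min(), as first proposed.
scans_min = count_scans(5000, lambda c, n: min(c * 2, n))
# Threshold update with max(), as in the final fix.
scans_max = count_scans(5000, lambda c, n: max(c * 2, n))
```

Over 5000 insertions the `min()` rule scans 4499 times (once per insertion past the threshold), while the `max()` rule scans only 4 times.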

**phofl** (Member, Author):

Yeah, you are correct; I already added a `max` here because I came to the same conclusion.

```diff
@@ -17,6 +17,7 @@ Fixed regressions
 - Fixed regression in :meth:`DataFrame.join` where result has missing values and dtype is arrow backed string (:issue:`55348`)
 - Fixed regression in :meth:`DataFrame.resample` which was extrapolating back to ``origin`` when ``origin`` was outside its bounds (:issue:`55064`)
 - Fixed regression in :meth:`DataFrame.sort_index` which was not sorting correctly when the index was a sliced :class:`MultiIndex` (:issue:`55379`)
+- Fixed performance regression in Copy-on-Write mechanism (:issue:`55256`, :issue:`55245`)
```
**Member**:
The regression occurs without Copy-on-Write too. I think we should mention that here.

**phofl** (Member, Author):

Yeah I struggled a bit with the wording, any suggestions?

**lithomas1** (Member) commented Oct 14, 2023:

Maybe:

> Fixed performance regression in DataFrame copying, DataFrame iteration, and groupby methods taking user defined functions.

I think it's better to leave the copy-on-write part out; I personally couldn't find a way to word it without making it seem like the issue was with Copy-on-Write only.

**phofl** (Member, Author):

Yeah, I am not really happy with listing methods, since this affects all kinds of things with wide DataFrames.

**lithomas1** (Member):
Can you add a test that just adds blocks to a `BlockValuesRefs` to make sure the exponential backoff is working as expected?
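The requested test might look something like the sketch below. It is written against a minimal stand-in class rather than the real `pandas._libs.internals.BlockValuesRefs` (whose exact attributes may differ), so `FakeRefs` and its `clear_counter` / `referenced_blocks` attributes are assumptions mirroring this thread. The behaviors checked are the ones from the PR: the threshold doubles while references stay alive, and halves (with a floor of 500) once most are dead.

```python
import weakref

class FakeRefs:
    """Stand-in for BlockValuesRefs with the PR's backoff rules."""

    def __init__(self):
        self.referenced_blocks = []
        self.clear_counter = 500

    def _clear(self, force=False):
        if force or len(self.referenced_blocks) > self.clear_counter:
            self.referenced_blocks = [
                r for r in self.referenced_blocks if r() is not None
            ]
            n = len(self.referenced_blocks)
            if n < self.clear_counter // 2:
                self.clear_counter = max(self.clear_counter // 2, 500)
            elif n > self.clear_counter:
                self.clear_counter = max(self.clear_counter * 2, n)

    def add_reference(self, obj):
        self._clear()
        self.referenced_blocks.append(weakref.ref(obj))


class Block:
    """Stand-in for a pandas Block."""


def test_counter_doubles_while_alive():
    refs = FakeRefs()
    blocks = [Block() for _ in range(1500)]
    for blk in blocks:
        refs.add_reference(blk)
    # threshold doubled twice (500 -> 1000 -> 2000) as the list grew
    assert refs.clear_counter == 2000


def test_counter_shrinks_with_floor():
    refs = FakeRefs()
    blocks = [Block() for _ in range(1500)]
    for blk in blocks:
        refs.add_reference(blk)
    del blocks, blk
    refs._clear(force=True)
    assert len(refs.referenced_blocks) == 0
    assert refs.clear_counter == 1000  # 2000 // 2, still above the 500 floor
```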

@lithomas1 lithomas1 added this to the 2.1.2 milestone Oct 14, 2023
@lithomas1 lithomas1 added the Performance Memory or execution speed performance label Oct 14, 2023
**phofl** (Member, Author) commented Oct 14, 2023:

Tests are a good idea, added the relevant cases

Co-authored-by: Joris Van den Bossche <[email protected]>
```python
# Use exponential backoff to decide when we want to clear references
# if force=False. Clearing for every insertion causes slowdowns if
# all these objects stay alive, e.g. df.items() for wide DataFrames
# see GH#55245 and GH#55008
```
**Member**:

Do we have a reference issue to eventually change this to a WeakSet-like implementation? IIRC I saw that discussed somewhere

**phofl** (Member, Author):

I'll open an issue about that as a follow-up when this is in.

**rhshadrach** (Member) commented Oct 15, 2023:

Edit: Lots of false positives on the first run; reran individual ASVs. No perf regressions.

ASVs for d4c159b
| Change   | Before [0021d241] <cow_dead_regf~4>   | After [d4c159b5] <cow_dead_regf>   |   Ratio | Benchmark (Parameter)                                                                                                                                              |
|----------|---------------------------------------|------------------------------------|---------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| -        | 541±20ns                              | 491±20ns                           |    0.91 | index_cached_properties.IndexCache.time_values('UInt64Index')                                                                                                      |
| -        | 612±20ns                              | 551±30ns                           |    0.9  | index_cached_properties.IndexCache.time_inferred_type('DatetimeIndex')                                                                                             |
| -        | 237±20ns                              | 214±3ns                            |    0.9  | indexing_engines.NumericEngineIndexing.time_get_loc_near_middle((<class 'pandas._libs.index.Int64Engine'>, <class 'numpy.int64'>), 'non_monotonic', True, 100000)  |
| -        | 59.3±1ms                              | 53.1±1ms                           |    0.9  | io.json.ToJSONWide.time_to_json('values', 'df_int_float_str')                                                                                                      |
| -        | 1.34±0.1ms                            | 1.21±0.01ms                        |    0.9  | io.parsers.ConcatDateCols.time_check_concat(1234567890, 2)                                                                                                         |
| -        | 26.6±0.7ms                            | 23.9±0.8ms                         |    0.9  | io.stata.Stata.time_read_stata('ty')                                                                                                                               |
| -        | 47.1±2ms                              | 42.6±0.3ms                         |    0.9  | strings.Methods.time_findall('string[python]')                                                                                                                     |
| -        | 9.52±1μs                              | 8.60±0.06μs                        |    0.9  | timeseries.TzLocalize.time_infer_dst(tzutc())                                                                                                                      |
| -        | 859±20ns                              | 777±30ns                           |    0.9  | tslibs.tslib.TimeIntsToPydatetime.time_ints_to_pydatetime('timestamp', 0, tzlocal())                                                                               |
| -        | 252±2μs                               | 224±20μs                           |    0.89 | algos.isin.IsIn.time_isin_categorical('string[python]')                                                                                                            |
| -        | 2.27±0.03ms                           | 2.03±0.02ms                        |    0.89 | frame_methods.Apply.time_apply_pass_thru                                                                                                                           |
| -        | 23.9±2μs                              | 21.4±0.4μs                         |    0.89 | groupby.GroupByMethods.time_dtype_as_group('uint', 'count', 'direct', 1, 'cython')                                                                                 |
| -        | 242±20ns                              | 215±4ns                            |    0.89 | indexing_engines.NumericEngineIndexing.time_get_loc_near_middle((<class 'pandas._libs.index.Int64Engine'>, <class 'numpy.int64'>), 'monotonic_decr', True, 100000) |
| -        | 128±2μs                               | 111±2μs                            |    0.87 | algos.isin.IsIn.time_isin_categorical('uint64')                                                                                                                    |
| -        | 111±2ms                               | 96.2±2ms                           |    0.87 | groupby.Transform.time_transform_lambda_max                                                                                                                        |
| -        | 321±20ns                              | 280±20ns                           |    0.87 | index_cached_properties.IndexCache.time_inferred_type('Int64Index')                                                                                                |
| -        | 671±80ns                              | 587±30ns                           |    0.87 | index_cached_properties.IndexCache.time_is_monotonic_decreasing('Int64Index')                                                                                      |
| -        | 613±60μs                              | 526±4μs                            |    0.86 | arithmetic.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function mul>)                                                         |
| -        | 13.2±0.1ms                            | 11.4±0.07ms                        |    0.86 | indexing.DataFrameNumericIndexing.time_loc_dups(<class 'numpy.float64'>, 'unique_monotonic_inc')                                                                   |
| -        | 21.0±0.4ms                            | 17.9±0.6ms                         |    0.85 | frame_methods.Fillna.time_bfill(True, 'object')                                                                                                                    |
| -        | 13.7±0.1ms                            | 11.5±0.09ms                        |    0.84 | indexing.DataFrameNumericIndexing.time_loc_dups(<class 'numpy.float64'>, 'nonunique_monotonic_inc')                                                                |
| -        | 1.98±0.01ms                           | 1.64±0.03ms                        |    0.83 | join_merge.Concat.time_concat_series(0)                                                                                                                            |
| -        | 24.1±0.2ms                            | 19.8±0.1ms                         |    0.82 | tslibs.normalize.Normalize.time_is_date_array_normalized(1000000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)                                                |
| -        | 44.9±0.3ms                            | 36.1±0.4ms                         |    0.8  | frame_methods.Duplicated.time_frame_duplicated_wide                                                                                                                |
| -        | 3.17±0.04μs                           | 2.54±0.01μs                        |    0.8  | tslibs.normalize.Normalize.time_is_date_array_normalized(100, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)                                                    |
| -        | 250±0.4μs                             | 201±3μs                            |    0.8  | tslibs.normalize.Normalize.time_is_date_array_normalized(10000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)                                                  |
| -        | 6.24±2ms                              | 4.67±0.02ms                        |    0.75 | sparse.GetItemMask.time_mask(nan)                                                                                                                                  |
| -        | 6.39±0.03ms                           | 4.75±0.03ms                        |    0.74 | frame_methods.AsType.time_astype(('float64', 'Float64'), False)                                                                                                    |
| -        | 6.72±0.1ms                            | 4.94±0.01ms                        |    0.74 | frame_methods.AsType.time_astype(('float64', 'float64[pyarrow]'), False)                                                                                           |
| -        | 10.0±0.3μs                            | 7.48±0.08μs                        |    0.74 | timeseries.TzLocalize.time_infer_dst(None)                                                                                                                         |
| -        | 24.9±1ms                              | 18.1±0.5ms                         |    0.73 | reindex.Reindex.time_reindex_multiindex_with_cache                                                                                                                 |
| -        | 2.43±0.01ms                           | 1.72±0.02ms                        |    0.71 | reshape.ReshapeExtensionDtype.time_stack('datetime64[ns, US/Pacific]')                                                                                             |
| -        | 2.25±0.02ms                           | 1.58±0.01ms                        |    0.7  | reshape.ReshapeExtensionDtype.time_stack('Period[s]')                                                                                                              |
| -        | 622±10μs                              | 405±2μs                            |    0.65 | frame_methods.Iteration.time_items_cached                                                                                                                          |
| -        | 16.7±0.2ms                            | 7.19±0.05ms                        |    0.43 | frame_methods.Iteration.time_items                                                                                                                                 |
| -        | 187±0.5ms                             | 74.0±1ms                           |    0.39 | groupby.Apply.time_copy_overhead_single_col(4)                                                                                                                     |
| -        | 995±6ms                               | 350±3ms                            |    0.35 | groupby.Apply.time_copy_function_multi_col(4)                                                                                                                      |
| -        | 965±5ms                               | 107±0.8ms                          |    0.11 | frame_methods.GetDtypeCounts.time_info                                                                                                                             |
| -        | 265±1ms                               | 25.7±0.03ms                        |    0.1  | frame_methods.Iteration.time_iteritems_indexing                                                                                                                    |

From #55256 I'm now getting 1.34s with no CoW, 1.63s with.

**phofl** (Member, Author) commented Oct 15, 2023:

Thx @rhshadrach, appreciate you running the ASVs.

So are we good here? I want to look into the groupby problem independently to see if there is something we can do.

```python
nr_of_refs = len(self.referenced_blocks)
if nr_of_refs < self.clear_counter // 2:
    self.clear_counter = max(self.clear_counter // 2, 500)
```
**wangwillian0** (Contributor) commented Oct 16, 2023:

I would suggest a shrink factor of 4 or more. If it's the same as the growth factor, it can create corner cases that are still O(n²), e.g. the length going back and forth between (500·2^n) − 1 and (500·2^n) + 1.
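One way to read this concern, as a toy model (illustrative only, not pandas code): if the live count oscillates just around a threshold boundary and the shrink condition uses the same factor as the growth, every grow/shrink cycle re-triggers a full scan; a stricter shrink condition (roughly the "factor of 4" suggestion, though the exact form intended may differ) leaves the threshold alone and the scans stop after the first cycle.

```python
def cycle_scans(shrink_divisor, cycles=100, boundary=1000):
    """Live count oscillates between boundary - 1 and boundary + 1.

    A non-forced scan fires when the count exceeds the threshold; a
    forced clear (from an in-place modification) may then shrink the
    threshold when the count drops below threshold // shrink_divisor.
    """
    counter = boundary
    scans = 0
    for _ in range(cycles):
        n = boundary + 1              # grew just past the boundary
        if n > counter:
            scans += 1                # non-forced full O(n) scan
            counter = max(counter * 2, n)
        n = boundary - 1              # refs died; force-clear runs
        if n < counter // shrink_divisor:
            counter = max(counter // 2, 500)
    return scans
```

With `shrink_divisor=2` (matching the growth factor) this scans on every one of the 100 cycles; with `shrink_divisor=4` it scans once and then the doubled threshold sticks.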

**phofl** (Member, Author):

Couldn't this happen as well for a shrink factor of 4? And this would only happen if we have this interleaved with in-place modifications, e.g. if force=True, correct? Merging for now, but happy to follow up.

**Member**:

> Merging for now, but happy to follow up

+1

**wangwillian0** (Contributor) commented Oct 22, 2023:

For a factor of 4 you would need to move the length across the extremes of the range [500·2^(n−1), 500·2^n], which is at least 500 apart (and more for larger n); this is much better than triggering the slow operation by just adding and removing 3 references.

**mroeschke** (Member) left a comment:

Looks good enough to address the regression for now

**rhshadrach** (Member) left a comment:

lgtm

@rhshadrach rhshadrach merged commit 3a90faa into pandas-dev:main Oct 22, 2023
37 checks passed
**rhshadrach** (Member):

Thanks @phofl

@rhshadrach rhshadrach added the Regression Functionality that used to work in a prior pandas version label Oct 22, 2023
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Oct 22, 2023
@phofl phofl deleted the cow_dead_regf branch October 22, 2023 14:47
mroeschke pushed a commit that referenced this pull request Oct 22, 2023
… clearing dead references) (#55625)

Backport PR #55518: CoW: Use exponential backoff when clearing dead references

Co-authored-by: Patrick Hoefler <[email protected]>
Labels: Copy / view semantics; Performance (memory or execution speed performance); Regression (functionality that used to work in a prior pandas version)

Successfully merging this pull request may close these issues:

- PERF: hash_pandas_object performance regression between 2.1.0 and 2.1.1

7 participants