
CoW: Use exponential backoff when clearing dead references #55518

Merged (9 commits) Oct 22, 2023

Conversation

**phofl** (Member) commented Oct 14, 2023:

I'd rather keep @rhshadrach's issue open; we might be able to find a better fix there.

The runtime on this branch is back to the initial numbers, e.g. 3 ms on Joris's examples and 1 s on Richard's issue.
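For context, the mechanism under discussion can be sketched standalone. The class below is a hypothetical stand-in (here called `ReferenceTracker`) for pandas' `BlockValuesRefs`, not the actual implementation; only the starting threshold of 500 and the shrink/grow rules mirror the snippets in this thread, everything else is illustrative.

```python
import weakref

class ReferenceTracker:
    """Minimal sketch of exponential backoff for clearing dead weakrefs.

    Hypothetical stand-in for pandas' BlockValuesRefs: the starting
    threshold (500) and the update rules are taken from the PR,
    the rest is illustrative.
    """

    def __init__(self):
        self.referenced_blocks = []
        self.clear_counter = 500  # adaptive threshold

    def _clear_dead_references(self, force=False):
        # Pay the O(n) scan only when forced or past the threshold.
        if force or len(self.referenced_blocks) > self.clear_counter:
            self.referenced_blocks = [
                ref for ref in self.referenced_blocks if ref() is not None
            ]
            nr_of_refs = len(self.referenced_blocks)
            if nr_of_refs < self.clear_counter // 2:
                # Mostly dead: shrink the threshold, but never below 500.
                self.clear_counter = max(self.clear_counter // 2, 500)
            elif nr_of_refs > self.clear_counter:
                # Mostly alive: double, so the next scan comes much later.
                self.clear_counter = max(self.clear_counter * 2, nr_of_refs)

    def add_reference(self, obj):
        self._clear_dead_references()
        self.referenced_blocks.append(weakref.ref(obj))
```

When all referenced objects stay alive (the `df.items()` case), each scan doubles the threshold, so n insertions trigger only O(log n) scans and the total work drops from O(n²) to roughly O(n).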

@phofl phofl requested a review from WillAyd as a code owner October 14, 2023 15:14
Comment on lines 908 to 909:

```python
if nr_of_refs < self.clear_counter // 2:
    self.clear_counter = self.clear_counter // 2
```
**Member**:
Do we think it is needed to also reduce this? Or, I assume this is mostly to reduce the counter again in case it has become very large, not necessarily to let it become smaller than 500

**phofl** (Member, Author):

Very good point. I intended to add a max here, e.g. `max(..., 500)`.

```python
if nr_of_refs < self.clear_counter // 2:
    self.clear_counter = self.clear_counter // 2
elif nr_of_refs > self.clear_counter:
    self.clear_counter = min(self.clear_counter * 2, nr_of_refs)
```
**jorisvandenbossche** (Member) commented Oct 14, 2023:

Could we also just do ×2 instead of the `min(..)` calculation?

I am wondering: if you repeatedly add a reference (for an object that doesn't go out of scope), doesn't that end up increasing the counter by only +1 every time? For example, you have 501 refs, hitting the threshold; at that moment you clear the refs, but `nr_of_refs` is still 501 after doing that, and then here we set the new threshold to `min(500 * 2, 501)`, i.e. 501.
I must be missing something, because otherwise I don't understand how this fixes the perf issue of the `list(df.items())` example.
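Joris's arithmetic can be checked with a toy simulation (illustrative only, not pandas code): with `min(counter * 2, nr_of_refs)` and references that never die, the threshold creeps ahead by one per insertion, so nearly every insertion after the 500th triggers a full scan; with `max(...)` the threshold doubles and only a handful of scans occur.

```python
def count_scans(n, update):
    """Count full clearing scans over n insertions where every
    reference stays alive, for a given threshold-update rule."""
    refs = 0          # live references (none ever die)
    counter = 500     # starting threshold, as in the PR
    scans = 0
    for _ in range(n):
        if refs > counter:
            scans += 1                     # full O(refs) scan
            counter = update(counter, refs)
        refs += 1
    return scans

# Threshold update with min(), as first proposed.
scans_min = count_scans(5000, lambda c, n: min(c * 2, n))
# Threshold update with max(), as in the final fix.
scans_max = count_scans(5000, lambda c, n: max(c * 2, n))
```

Over 5000 insertions the `min()` rule scans 4499 times (once per insertion past the threshold), while the `max()` rule scans only 4 times.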

**phofl** (Member, Author):

Yeah, you are correct; I already added a `max` here because I came to the same conclusion.

```diff
@@ -17,6 +17,7 @@ Fixed regressions
 - Fixed regression in :meth:`DataFrame.join` where result has missing values and dtype is arrow backed string (:issue:`55348`)
 - Fixed regression in :meth:`DataFrame.resample` which was extrapolating back to ``origin`` when ``origin`` was outside its bounds (:issue:`55064`)
 - Fixed regression in :meth:`DataFrame.sort_index` which was not sorting correctly when the index was a sliced :class:`MultiIndex` (:issue:`55379`)
+- Fixed performance regression in Copy-on-Write mechanism (:issue:`55256`, :issue:`55245`)
```
**Member**:
The regression occurs without Copy-on-Write too. I think we should mention that here.

**phofl** (Member, Author):

Yeah I struggled a bit with the wording, any suggestions?

**lithomas1** (Member) commented Oct 14, 2023:

Maybe:

> Fixed performance regression in DataFrame copying, DataFrame iteration, and groupby methods taking user defined functions.

I think it's better to leave the copy-on-write part out; I personally couldn't find a way to word it without making it seem like the issue was with Copy-on-Write only.

**phofl** (Member, Author):

Yeah, I am not really happy with listing methods, since this affects all kinds of things with wide DataFrames.

**lithomas1** (Member):
Can you add a test that just adds blocks to a `BlockValuesRefs` to make sure the exponential backoff is working as expected?
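The requested test might look something like the sketch below. It is written against a minimal stand-in class rather than the real `pandas._libs.internals.BlockValuesRefs` (whose exact attributes may differ), so `FakeRefs` and its `clear_counter` / `referenced_blocks` attributes are assumptions mirroring this thread. The behaviors checked are the ones from the PR: the threshold doubles while references stay alive, and halves (with a floor of 500) once most are dead.

```python
import weakref

class FakeRefs:
    """Stand-in for BlockValuesRefs with the PR's backoff rules."""

    def __init__(self):
        self.referenced_blocks = []
        self.clear_counter = 500

    def _clear(self, force=False):
        if force or len(self.referenced_blocks) > self.clear_counter:
            self.referenced_blocks = [
                r for r in self.referenced_blocks if r() is not None
            ]
            n = len(self.referenced_blocks)
            if n < self.clear_counter // 2:
                self.clear_counter = max(self.clear_counter // 2, 500)
            elif n > self.clear_counter:
                self.clear_counter = max(self.clear_counter * 2, n)

    def add_reference(self, obj):
        self._clear()
        self.referenced_blocks.append(weakref.ref(obj))


class Block:
    """Stand-in for a pandas Block."""


def test_counter_doubles_while_alive():
    refs = FakeRefs()
    blocks = [Block() for _ in range(1500)]
    for blk in blocks:
        refs.add_reference(blk)
    # threshold doubled twice (500 -> 1000 -> 2000) as the list grew
    assert refs.clear_counter == 2000


def test_counter_shrinks_with_floor():
    refs = FakeRefs()
    blocks = [Block() for _ in range(1500)]
    for blk in blocks:
        refs.add_reference(blk)
    del blocks, blk
    refs._clear(force=True)
    assert len(refs.referenced_blocks) == 0
    assert refs.clear_counter == 1000  # 2000 // 2, still above the 500 floor
```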

@lithomas1 lithomas1 added this to the 2.1.2 milestone Oct 14, 2023
@lithomas1 lithomas1 added the Performance Memory or execution speed performance label Oct 14, 2023
**phofl** (Member, Author) commented Oct 14, 2023:

Tests are a good idea, added the relevant cases

Co-authored-by: Joris Van den Bossche <[email protected]>
```python
# Use exponential backoff to decide when we want to clear references
# if force=False. Clearing for every insertion causes slowdowns if
# all these objects stay alive, e.g. df.items() for wide DataFrames
# see GH#55245 and GH#55008
```
**Member**:

Do we have a reference issue to eventually change this to a WeakSet-like implementation? IIRC I saw that discussed somewhere

**phofl** (Member, Author):

I'll open an issue about that as a follow-up when this is in.

**rhshadrach** (Member) commented Oct 15, 2023:

Edit: Lots of false positives on the first run; reran individual ASVs. No perf regressions.

ASVs for d4c159b
| Change   | Before [0021d241] <cow_dead_regf~4>   | After [d4c159b5] <cow_dead_regf>   |   Ratio | Benchmark (Parameter)                                                                                                                                              |
|----------|---------------------------------------|------------------------------------|---------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| -        | 541±20ns                              | 491±20ns                           |    0.91 | index_cached_properties.IndexCache.time_values('UInt64Index')                                                                                                      |
| -        | 612±20ns                              | 551±30ns                           |    0.9  | index_cached_properties.IndexCache.time_inferred_type('DatetimeIndex')                                                                                             |
| -        | 237±20ns                              | 214±3ns                            |    0.9  | indexing_engines.NumericEngineIndexing.time_get_loc_near_middle((<class 'pandas._libs.index.Int64Engine'>, <class 'numpy.int64'>), 'non_monotonic', True, 100000)  |
| -        | 59.3±1ms                              | 53.1±1ms                           |    0.9  | io.json.ToJSONWide.time_to_json('values', 'df_int_float_str')                                                                                                      |
| -        | 1.34±0.1ms                            | 1.21±0.01ms                        |    0.9  | io.parsers.ConcatDateCols.time_check_concat(1234567890, 2)                                                                                                         |
| -        | 26.6±0.7ms                            | 23.9±0.8ms                         |    0.9  | io.stata.Stata.time_read_stata('ty')                                                                                                                               |
| -        | 47.1±2ms                              | 42.6±0.3ms                         |    0.9  | strings.Methods.time_findall('string[python]')                                                                                                                     |
| -        | 9.52±1μs                              | 8.60±0.06μs                        |    0.9  | timeseries.TzLocalize.time_infer_dst(tzutc())                                                                                                                      |
| -        | 859±20ns                              | 777±30ns                           |    0.9  | tslibs.tslib.TimeIntsToPydatetime.time_ints_to_pydatetime('timestamp', 0, tzlocal())                                                                               |
| -        | 252±2μs                               | 224±20μs                           |    0.89 | algos.isin.IsIn.time_isin_categorical('string[python]')                                                                                                            |
| -        | 2.27±0.03ms                           | 2.03±0.02ms                        |    0.89 | frame_methods.Apply.time_apply_pass_thru                                                                                                                           |
| -        | 23.9±2μs                              | 21.4±0.4μs                         |    0.89 | groupby.GroupByMethods.time_dtype_as_group('uint', 'count', 'direct', 1, 'cython')                                                                                 |
| -        | 242±20ns                              | 215±4ns                            |    0.89 | indexing_engines.NumericEngineIndexing.time_get_loc_near_middle((<class 'pandas._libs.index.Int64Engine'>, <class 'numpy.int64'>), 'monotonic_decr', True, 100000) |
| -        | 128±2μs                               | 111±2μs                            |    0.87 | algos.isin.IsIn.time_isin_categorical('uint64')                                                                                                                    |
| -        | 111±2ms                               | 96.2±2ms                           |    0.87 | groupby.Transform.time_transform_lambda_max                                                                                                                        |
| -        | 321±20ns                              | 280±20ns                           |    0.87 | index_cached_properties.IndexCache.time_inferred_type('Int64Index')                                                                                                |
| -        | 671±80ns                              | 587±30ns                           |    0.87 | index_cached_properties.IndexCache.time_is_monotonic_decreasing('Int64Index')                                                                                      |
| -        | 613±60μs                              | 526±4μs                            |    0.86 | arithmetic.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function mul>)                                                         |
| -        | 13.2±0.1ms                            | 11.4±0.07ms                        |    0.86 | indexing.DataFrameNumericIndexing.time_loc_dups(<class 'numpy.float64'>, 'unique_monotonic_inc')                                                                   |
| -        | 21.0±0.4ms                            | 17.9±0.6ms                         |    0.85 | frame_methods.Fillna.time_bfill(True, 'object')                                                                                                                    |
| -        | 13.7±0.1ms                            | 11.5±0.09ms                        |    0.84 | indexing.DataFrameNumericIndexing.time_loc_dups(<class 'numpy.float64'>, 'nonunique_monotonic_inc')                                                                |
| -        | 1.98±0.01ms                           | 1.64±0.03ms                        |    0.83 | join_merge.Concat.time_concat_series(0)                                                                                                                            |
| -        | 24.1±0.2ms                            | 19.8±0.1ms                         |    0.82 | tslibs.normalize.Normalize.time_is_date_array_normalized(1000000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)                                                |
| -        | 44.9±0.3ms                            | 36.1±0.4ms                         |    0.8  | frame_methods.Duplicated.time_frame_duplicated_wide                                                                                                                |
| -        | 3.17±0.04μs                           | 2.54±0.01μs                        |    0.8  | tslibs.normalize.Normalize.time_is_date_array_normalized(100, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)                                                    |
| -        | 250±0.4μs                             | 201±3μs                            |    0.8  | tslibs.normalize.Normalize.time_is_date_array_normalized(10000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)                                                  |
| -        | 6.24±2ms                              | 4.67±0.02ms                        |    0.75 | sparse.GetItemMask.time_mask(nan)                                                                                                                                  |
| -        | 6.39±0.03ms                           | 4.75±0.03ms                        |    0.74 | frame_methods.AsType.time_astype(('float64', 'Float64'), False)                                                                                                    |
| -        | 6.72±0.1ms                            | 4.94±0.01ms                        |    0.74 | frame_methods.AsType.time_astype(('float64', 'float64[pyarrow]'), False)                                                                                           |
| -        | 10.0±0.3μs                            | 7.48±0.08μs                        |    0.74 | timeseries.TzLocalize.time_infer_dst(None)                                                                                                                         |
| -        | 24.9±1ms                              | 18.1±0.5ms                         |    0.73 | reindex.Reindex.time_reindex_multiindex_with_cache                                                                                                                 |
| -        | 2.43±0.01ms                           | 1.72±0.02ms                        |    0.71 | reshape.ReshapeExtensionDtype.time_stack('datetime64[ns, US/Pacific]')                                                                                             |
| -        | 2.25±0.02ms                           | 1.58±0.01ms                        |    0.7  | reshape.ReshapeExtensionDtype.time_stack('Period[s]')                                                                                                              |
| -        | 622±10μs                              | 405±2μs                            |    0.65 | frame_methods.Iteration.time_items_cached                                                                                                                          |
| -        | 16.7±0.2ms                            | 7.19±0.05ms                        |    0.43 | frame_methods.Iteration.time_items                                                                                                                                 |
| -        | 187±0.5ms                             | 74.0±1ms                           |    0.39 | groupby.Apply.time_copy_overhead_single_col(4)                                                                                                                     |
| -        | 995±6ms                               | 350±3ms                            |    0.35 | groupby.Apply.time_copy_function_multi_col(4)                                                                                                                      |
| -        | 965±5ms                               | 107±0.8ms                          |    0.11 | frame_methods.GetDtypeCounts.time_info                                                                                                                             |
| -        | 265±1ms                               | 25.7±0.03ms                        |    0.1  | frame_methods.Iteration.time_iteritems_indexing                                                                                                                    |

From #55256 I'm now getting 1.34s with no CoW, 1.63s with.

**phofl** (Member, Author) commented Oct 15, 2023:

Thx @rhshadrach, appreciate you running the ASVs.

So are we good here? I want to look into the groupby problem independently to see if there is something we can do.

```python
nr_of_refs = len(self.referenced_blocks)
if nr_of_refs < self.clear_counter // 2:
    self.clear_counter = max(self.clear_counter // 2, 500)
```
**wangwillian0** (Contributor) commented Oct 16, 2023:

I would suggest a shrink factor of 4 or more. If it's the same as the growth factor, it can create corner cases that are still O(n²), e.g. the length going back and forth between (500·2^n) − 1 and (500·2^n) + 1.
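One way to read this concern, as a toy model (illustrative only, not pandas code): if the live count oscillates just around a threshold boundary and the shrink condition uses the same factor as the growth, every grow/shrink cycle re-triggers a full scan; a stricter shrink condition (roughly the "factor of 4" suggestion, though the exact form intended may differ) leaves the threshold alone and the scans stop after the first cycle.

```python
def cycle_scans(shrink_divisor, cycles=100, boundary=1000):
    """Live count oscillates between boundary - 1 and boundary + 1.

    A non-forced scan fires when the count exceeds the threshold; a
    forced clear (from an in-place modification) may then shrink the
    threshold when the count drops below threshold // shrink_divisor.
    """
    counter = boundary
    scans = 0
    for _ in range(cycles):
        n = boundary + 1              # grew just past the boundary
        if n > counter:
            scans += 1                # non-forced full O(n) scan
            counter = max(counter * 2, n)
        n = boundary - 1              # refs died; force-clear runs
        if n < counter // shrink_divisor:
            counter = max(counter // 2, 500)
    return scans
```

With `shrink_divisor=2` (matching the growth factor) this scans on every one of the 100 cycles; with `shrink_divisor=4` it scans once and then the doubled threshold sticks.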

**phofl** (Member, Author):

Couldn't this happen as well for a shrink factor of 4? And this would only happen if we have this interleaved with in-place modifications, e.g. if force=True, correct? Merging for now, but happy to follow up.

**Member**:

> Merging for now, but happy to follow up

+1

**wangwillian0** (Contributor) commented Oct 22, 2023:

For a factor of 4 you would need to move the length across the extremes of the range [500·2^(n−1), 500·2^n], which is at least 500 apart (and more for larger n); this is much better than triggering the slow operation by just adding and removing 3 references.

**mroeschke** (Member) left a comment:

Looks good enough to address the regression for now

**rhshadrach** (Member) left a comment:

lgtm

@rhshadrach rhshadrach merged commit 3a90faa into pandas-dev:main Oct 22, 2023
37 checks passed
**rhshadrach** (Member):

Thanks @phofl

@rhshadrach rhshadrach added the Regression Functionality that used to work in a prior pandas version label Oct 22, 2023
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Oct 22, 2023
@phofl phofl deleted the cow_dead_regf branch October 22, 2023 14:47
mroeschke pushed a commit that referenced this pull request Oct 22, 2023
… clearing dead references) (#55625)

Backport PR #55518: CoW: Use exponential backoff when clearing dead references

Co-authored-by: Patrick Hoefler <[email protected]>
Labels: Copy / view semantics; Performance (memory or execution speed performance); Regression (functionality that used to work in a prior pandas version)

Successfully merging this pull request may close these issues:

- PERF: hash_pandas_object performance regression between 2.1.0 and 2.1.1

7 participants