PERF: datetimelike addition #56373

jbrockmendel · 2023-12-07T02:55:40Z

In [1]: import pandas as pd

In [2]: dti = pd.date_range("2016-01-01", periods=10_000)

In [3]: td = pd.Timedelta(days=1)

In [4]: %timeit dti + td
108 µs ± 1.85 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)  # <- main
94.3 µs ± 3.27 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)  # <- PR

WillAyd

Interesting idea!

WillAyd · 2023-12-07T05:48:09Z

pandas/_libs/tslibs/np_datetime.pyx

+
+
+@cython.overflowcheck(True)
+cpdef cnp.ndarray add_overflowsafe(cnp.ndarray left, cnp.ndarray right):


Are these possible to specialize to int64?

Not without specifying an ndim.

WillAyd · 2023-12-07T05:48:21Z

pandas/_libs/tslibs/np_datetime.pyx

+        ndarray iresult = cnp.PyArray_EMPTY(
+            left.ndim, left.shape, cnp.NPY_INT64, 0
+        )
+        cnp.broadcast mi = cnp.PyArray_MultiIterNew3(iresult, left, right)


Pretty cool
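
For readers less familiar with the C-level multi-iterator, here is a rough Python-level analogue using `np.broadcast`. It is only a sketch: the helper name `add_broadcast_sketch` is hypothetical, the range check with Python integers stands in for the Cython overflow check, and it is far slower than the loop in the PR.

```python
import numpy as np

INT64_MIN = np.iinfo(np.int64).min
INT64_MAX = np.iinfo(np.int64).max


def add_broadcast_sketch(left, right):
    """Element-wise int64 addition over broadcast operands (hypothetical helper)."""
    left = np.asarray(left, dtype=np.int64)
    right = np.asarray(right, dtype=np.int64)

    # np.broadcast pairs up elements according to NumPy's broadcasting rules,
    # similar in spirit to what a C-level multi-iterator does.
    it = np.broadcast(left, right)
    result = np.empty(it.shape, dtype=np.int64)

    for i, (li, ri) in enumerate(it):
        total = int(li) + int(ri)  # exact arithmetic with Python integers
        if not INT64_MIN <= total <= INT64_MAX:
            raise OverflowError("Overflow in int64 addition")
        result.flat[i] = total  # the iterator walks elements in C order
    return result
```

For example, `add_broadcast_sketch([[1, 2], [3, 4]], [10, 20])` broadcasts the second operand across both rows.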

WillAyd · 2023-12-07T05:48:38Z

asv_bench/benchmarks/arithmetic.py

@@ -413,17 +413,6 @@ def setup(self):
    def time_add_overflow_arr_rev(self):
        checked_add_with_arr(self.arr, self.arr_rev)

-    def time_add_overflow_arr_mask_nan(self):


Are these considered superfluous?

They use a no-longer-used mask argument. Actually, these are all pretty superfluous; will update to remove.

WillAyd · 2023-12-07T07:14:57Z

pandas/_libs/tslibs/np_datetime.pyx

+
+    # Note: doing this try/except outside the loop improves performance over
+    #  doing it inside the loop.
+    try:


By the way, I think there's another way of doing this without a loop that could also be more performant. The inspiration comes from the book Hacker's Delight by Henry S. Warren, section 4-2.

The code would look something like:

uleft = left.view("uint64") uright = right.view("uint64") summed = uleft + uright neg_overflow = uleft & uright & ~summed pos_overflow = ~uleft & ~uright & summed overflows = (neg_overflow | pos_overflow) >= 0x8000000000000000

The essence of the algorithm is that you use unsigned arithmetic (which has defined wraparound behavior) to do the summation, and then inspect the sign bit to see whether "overflow" would have happened.

With this, you would need neither the cython.overflowcheck decorator nor the try...except block.

Here's a quick demo

In [42]: left = np.array([2 ** 63 - 1, 2 ** 63 - 1, 2 ** 63 - 1, -(2 ** 63), -(2 ** 63), -(2 ** 63)])
    ...: right = np.array([42, 1, 0, -42, -1, 0])
    ...: uleft = left.view("uint64")
    ...: uright = right.view("uint64")
    ...: summed = uleft + uright
    ...: neg_overflow = uleft & uright & ~summed
    ...: pos_overflow = ~uleft & ~uright & summed
    ...: overflows = (neg_overflow | pos_overflow) >= 0x8000000000000000
    ...: overflows
Out[42]: array([ True,  True, False,  True,  True, False])
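
To make the bit logic easier to follow, here is a minimal scalar sketch in pure Python. It emulates 64-bit wraparound by masking and cross-checks the trick against Python's arbitrary-precision integers. The names `add_overflows`, `MASK64`, and `SIGN_BIT` are illustrative and do not come from the PR.

```python
MASK64 = (1 << 64) - 1  # keep only the low 64 bits, like uint64 wraparound
SIGN_BIT = 1 << 63      # the bit that is the sign bit when read as int64
INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1


def add_overflows(a: int, b: int) -> bool:
    """Return True if a + b would overflow a signed 64-bit integer."""
    ua, ub = a & MASK64, b & MASK64  # reinterpret as unsigned bit patterns
    summed = (ua + ub) & MASK64      # wrapping unsigned addition

    # Both operands negative (sign bits set) but the sum looks non-negative.
    neg_overflow = ua & ub & ~summed
    # Both operands non-negative but the sum has its sign bit set.
    pos_overflow = ~ua & ~ub & summed
    return bool((neg_overflow | pos_overflow) & SIGN_BIT)


# Same inputs as the demo above, checked against exact integer arithmetic.
cases = [(INT64_MAX, 42), (INT64_MAX, 1), (INT64_MAX, 0),
         (INT64_MIN, -42), (INT64_MIN, -1), (INT64_MIN, 0)]
results = [add_overflows(a, b) for a, b in cases]
expected = [not (INT64_MIN <= a + b <= INT64_MAX) for a, b in cases]
```

Overflow is only possible when both operands share a sign, which is why each mask requires the two input sign bits to agree and the result's sign bit to differ.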

Neat. Does this play nicely with the NPY_NAT check?

Good callout. I suppose you would just mask that as well. So something like:

In [5]: left = np.array([2 ** 63 - 1, 2 ** 63 - 1, 2 ** 63 - 1, -(2 ** 63), -(2 ** 63), -(2 ** 63)])
   ...: right = np.array([42, 1, 0, -42, -1, 0])
   ...: is_nat = (left == 2 ** 63 - 1) | (right == 2 ** 63 - 1)
   ...: uleft = left.view("uint64")
   ...: uright = right.view("uint64")
   ...: summed = uleft + uright
   ...: neg_overflow = uleft & uright & ~summed
   ...: pos_overflow = ~uleft & ~uright & summed
   ...: overflows = ((neg_overflow | pos_overflow) >= 0x8000000000000000) & ~is_nat
   ...: overflows
Out[5]: array([False, False, False,  True,  True, False])

I just hardcoded the above to 2 ** 63 - 1, but that can be replaced with a macro or Python variable, whichever is available.
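
Putting the pieces together, a vectorized version with NaT masking might look like the sketch below. It is not the PR's implementation. The function name `add_checked_sketch` is hypothetical, and the sketch assumes the sentinel is the minimum int64 value, which is what NumPy uses for NaT, rather than the 2 ** 63 - 1 hardcoded in the demo above. Propagating NaT into the result is likewise an assumption about the desired behavior.

```python
import numpy as np

# Assumption: NaT is represented by the minimum int64 value (NumPy's convention).
NAT = np.iinfo(np.int64).min
SIGN_BIT = np.uint64(1 << 63)


def add_checked_sketch(left, right):
    """Add int64 arrays, propagating NAT and raising on genuine overflow."""
    left = np.asarray(left, dtype=np.int64)
    right = np.asarray(right, dtype=np.int64)
    is_nat = (left == NAT) | (right == NAT)

    uleft = left.view(np.uint64)
    uright = right.view(np.uint64)
    summed = uleft + uright  # unsigned array addition wraps around

    neg_overflow = uleft & uright & ~summed
    pos_overflow = ~uleft & ~uright & summed
    # Ignore apparent overflow at positions where either operand is NAT.
    overflows = ((neg_overflow | pos_overflow) & SIGN_BIT).astype(bool) & ~is_nat
    if overflows.any():
        raise OverflowError("Overflow in int64 addition")

    result = summed.view(np.int64)
    result[is_nat] = NAT  # writes through the view, so NAT propagates
    return result


out = add_checked_sketch(np.array([1, NAT, 10]), np.array([2, 5, NAT]))
```

Each intermediate (`is_nat`, `summed`, the two masks) is a full-size temporary array, which is the memory cost raised in the reply that follows.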

Does that actually outperform? (Maybe SIMD or something?) It seems like a lot of additional memory allocations.

Hard to say. In theory, yes: this would leverage SIMD better if NumPy uses it under the hood (I'm not well versed in whether it does). Also, the overflow checks on gcc/clang are pretty fast because they use the compiler builtin, but I think MSVC falls back to a slow implementation (there is a builtin for MSVC too, but I don't recall Cython using it; I would have to double check).

It requires more research, so it's not something that needs to be done in this PR. Just food for future thought.

mroeschke · 2023-12-09T18:36:35Z

Thanks @jbrockmendel

PERF: datetimelike addition

5e576fd

jbrockmendel requested a review from MarcoGorelli as a code owner December 7, 2023 02:55

WillAyd reviewed Dec 7, 2023

mroeschke added the Performance and Numeric Operations labels Dec 7, 2023

jbrockmendel added 4 commits December 7, 2023 15:06

re-add import

9535407

remove no-longer-used

4a896e9

mypy fixup

5bee11c

troubleshoot 32bit builds

8aa9653

mroeschke approved these changes Dec 9, 2023

mroeschke added this to the 2.2 milestone Dec 9, 2023

mroeschke merged commit a6c0ae4 into pandas-dev:main Dec 9, 2023
44 checks passed

jbrockmendel deleted the ref-checked_add branch December 9, 2023 18:37
