PERF: Use fused types for map_infer_mask #55736

rhshadrach · 2023-10-27T20:10:06Z

Just a hack at this point, to see whether this is a terrible idea. This takes us from a 40% perf hit using Cython 3.0.4 (from #55179) to a 35% perf improvement for strings.Methods.time_len('string[python]'). I haven't run a full ASV yet, but can if this seems at all viable. I'm not certain whether the fused types really cover all uses of map_infer_mask (the test suite seems to think so).

cc @jbrockmendel @WillAyd

ASVs

| Change   | Before [984d7554] <main>   | After [c4bdb1aa] <perf_map_infer_mask_fused>   |   Ratio | Benchmark (Parameter)                                                                                         |
|----------|----------------------------|------------------------------------------------|---------|---------------------------------------------------------------------------------------------------------------|
| -        | 197±4μs                    | 178±5μs                                        |    0.91 | algos.isin.IsInForObjects.time_isin('nans', 'short')                                                          |
| -        | 12.9±0.05ms                | 11.7±0.04ms                                    |    0.91 | strings.Methods.time_endswith('str')                                                                          |
| -        | 15.8±0.2ms                 | 14.3±0.07ms                                    |    0.91 | strings.Methods.time_title('str')                                                                             |
| -        | 146±10μs                   | 132±0.8μs                                      |    0.9  | algos.isin.IsIn.time_isin_mismatched_dtype('object')                                                          |
| -        | 8.28±0.2ms                 | 7.44±0.07ms                                    |    0.9  | frame_methods.Dropna.time_dropna_axis_mixed_dtypes('any', 1)                                                  |
| -        | 10.4±0.07ms                | 9.42±0.1ms                                     |    0.9  | frame_methods.Equals.time_frame_object_unequal                                                                |
| -        | 11.9±0.5μs                 | 10.7±0.2μs                                     |    0.9  | frame_methods.GetNumericData.time_frame_get_numeric_data                                                      |
| -        | 630±2μs                    | 569±2μs                                        |    0.9  | libs.InferDtype.time_infer_dtype('np-object')                                                                 |
| -        | 6.03±0.6μs                 | 5.40±0.02μs                                    |    0.9  | series_methods.Any.time_any(1000000, 'fast', 'bool')                                                          |
| -        | 9.41±0.03ms                | 8.50±0.1ms                                     |    0.9  | strings.Contains.time_contains('str', False)                                                                  |
| -        | 16.4±0.04ms                | 14.9±0.2ms                                     |    0.9  | strings.Methods.time_get('string[python]')                                                                    |
| -        | 18.0±0.4ms                 | 16.3±0.2ms                                     |    0.9  | strings.Methods.time_match('str')                                                                             |
| -        | 12.6±0.1ms                 | 11.3±0.09ms                                    |    0.9  | strings.Methods.time_upper('string[python]')                                                                  |
| -        | 8.36±0.1ms                 | 7.51±0.05ms                                    |    0.9  | strings.Methods.time_zfill('str')                                                                             |
| -        | 9.06±0.4μs                 | 8.05±0.03μs                                    |    0.89 | frame_ctor.FromNDArray.time_frame_from_ndarray                                                                |
| -        | 15.4±0.08ms                | 13.6±0.09ms                                    |    0.89 | strings.Methods.time_find('string[python]')                                                                   |
| -        | 8.59±0.2ms                 | 7.66±0.05ms                                    |    0.89 | strings.Methods.time_isupper('str')                                                                           |
| -        | 13.0±0.02ms                | 11.5±0.02ms                                    |    0.89 | strings.Methods.time_slice('string[python]')                                                                  |
| -        | 11.3±0.04ms                | 9.94±0.2ms                                     |    0.88 | frame_methods.Equals.time_frame_object_equal                                                                  |
| -        | 13.2±0.2ms                 | 11.7±0.07ms                                    |    0.88 | indexing.DataFrameNumericIndexing.time_loc_dups(<class 'numpy.float64'>, 'nonunique_monotonic_inc')           |
| -        | 13.3±0.2ms                 | 11.7±0.07ms                                    |    0.88 | indexing.DataFrameNumericIndexing.time_loc_dups(<class 'numpy.float64'>, 'unique_monotonic_inc')              |
| -        | 28.5±0.7ms                 | 25.0±1ms                                       |    0.88 | join_merge.ConcatDataFrames.time_c_ordered(0, False)                                                          |
| -        | 762±0.7μs                  | 674±3μs                                        |    0.88 | libs.InferDtype.time_infer_dtype_skipna('np-object')                                                          |
| -        | 10.3±0.01ms                | 9.11±0.05ms                                    |    0.88 | strings.Methods.time_isalnum('str')                                                                           |
| -        | 9.06±0.04ms                | 7.97±0.01ms                                    |    0.88 | strings.Methods.time_istitle('str')                                                                           |
| -        | 8.76±0.1ms                 | 7.67±0.03ms                                    |    0.88 | strings.Methods.time_lstrip('str')                                                                            |
| -        | 9.11±0.02ms                | 8.00±0.04ms                                    |    0.88 | strings.Methods.time_lstrip('string[python]')                                                                 |
| -        | 8.72±0.02ms                | 7.69±0.03ms                                    |    0.88 | strings.Methods.time_rstrip('str')                                                                            |
| -        | 8.83±0.06ms                | 7.79±0.02ms                                    |    0.88 | strings.Methods.time_zfill('string[python]')                                                                  |
| -        | 297±8μs                    | 258±9μs                                        |    0.87 | arithmetic.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 5.0, <built-in function gt>) |
| -        | 284±20μs                   | 246±20μs                                       |    0.87 | arithmetic.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function lt>)     |
| -        | 17.5±0.6ms                 | 15.2±0.06ms                                    |    0.87 | strings.Methods.time_fullmatch('string[python]')                                                              |
| -        | 8.77±0.03ms                | 7.66±0.06ms                                    |    0.87 | strings.Methods.time_isalpha('str')                                                                           |
| -        | 8.69±0.04ms                | 7.60±0.07ms                                    |    0.87 | strings.Methods.time_islower('str')                                                                           |
| -        | 7.28±0.09ms                | 6.34±0.03ms                                    |    0.87 | strings.Methods.time_isnumeric('str')                                                                         |
| -        | 12.1±0.1ms                 | 10.5±0.04ms                                    |    0.87 | strings.Methods.time_normalize('string[python]')                                                              |
| -        | 16.1±0.4ms                 | 14.0±0.4ms                                     |    0.87 | strings.Methods.time_rfind('string[pyarrow]')                                                                 |
| -        | 9.12±0.05ms                | 7.96±0.02ms                                    |    0.87 | strings.Methods.time_rstrip('string[python]')                                                                 |
| -        | 9.06±0.05ms                | 7.85±0.2ms                                     |    0.87 | strings.Methods.time_strip('str')                                                                             |
| -        | 1.23±0.05ms                | 1.06±0.04ms                                    |    0.86 | frame_methods.Fillna.time_bfill(True, 'datetime64[ns, tz]')                                                   |
| -        | 7.36±0.04ms                | 6.32±0.05ms                                    |    0.86 | strings.Methods.time_isdecimal('str')                                                                         |
| -        | 8.87±0.2ms                 | 7.61±0.06ms                                    |    0.86 | strings.Methods.time_lower('str')                                                                             |
| -        | 9.31±0.09ms                | 8.00±0.02ms                                    |    0.86 | strings.Methods.time_lower('string[python]')                                                                  |
| -        | 12.7±0.3ms                 | 10.8±0.3ms                                     |    0.86 | strings.Methods.time_slice('str')                                                                             |
| -        | 76.8±0.7ms                 | 65.8±0.3ms                                     |    0.86 | strings.Slice.time_vector_slice                                                                               |
| -        | 13.7±0.2ms                 | 11.6±0.03ms                                    |    0.85 | strings.Contains.time_contains('string[python]', True)                                                        |
| -        | 11.5±0.2ms                 | 9.72±0.06ms                                    |    0.85 | strings.Methods.time_endswith('string[python]')                                                               |
| -        | 7.40±0.01ms                | 6.32±0.05ms                                    |    0.85 | strings.Methods.time_isdigit('str')                                                                           |
| -        | 7.02±0.1ms                 | 5.95±0.02ms                                    |    0.85 | strings.Methods.time_isspace('str')                                                                           |
| -        | 11.9±0.2ms                 | 10.1±0.02ms                                    |    0.85 | strings.Methods.time_normalize('str')                                                                         |
| -        | 11.9±0.1ms                 | 10.1±0.02ms                                    |    0.85 | strings.Methods.time_replace('string[python]')                                                                |
| -        | 16.5±0.2ms                 | 14.1±0.1ms                                     |    0.85 | strings.Methods.time_rfind('string[python]')                                                                  |
| -        | 9.51±0.07ms                | 8.11±0.02ms                                    |    0.85 | strings.Methods.time_strip('string[python]')                                                                  |
| -        | 12.4±0.1ms                 | 10.5±0.2ms                                     |    0.85 | strings.Methods.time_upper('str')                                                                             |
| -        | 16.9±0.2ms                 | 14.2±0.08ms                                    |    0.84 | strings.Methods.time_match('string[python]')                                                                  |
| -        | 11.9±0.1ms                 | 9.80±0.03ms                                    |    0.82 | strings.Methods.time_replace('str')                                                                           |
| -        | 11.1±0.04ms                | 9.07±0.08ms                                    |    0.81 | strings.Methods.time_startswith('string[python]')                                                             |
| -        | 9.37±0.01ms                | 7.52±0.03ms                                    |    0.8  | strings.Methods.time_isalnum('string[python]')                                                                |
| -        | 8.32±0.05ms                | 6.58±0.04ms                                    |    0.79 | strings.Contains.time_contains('string[python]', False)                                                       |
| -        | 8.08±0.1ms                 | 6.36±0.01ms                                    |    0.79 | strings.Methods.time_istitle('string[python]')                                                                |
| -        | 4.89±0.2ms                 | 3.83±0.2ms                                     |    0.78 | libs.FastZip.time_lib_fast_zip                                                                                |
| -        | 16.5±0.2ms                 | 12.9±0.1ms                                     |    0.78 | strings.Methods.time_count('string[python]')                                                                  |
| -        | 22.5±0.1ms                 | 17.3±0.1ms                                     |    0.77 | frame_methods.Fillna.time_bfill(True, 'object')                                                               |
| -        | 7.73±0.03ms                | 5.93±0.04ms                                    |    0.77 | strings.Methods.time_isalpha('string[python]')                                                                |
| -        | 7.79±0.04ms                | 6.04±0.02ms                                    |    0.77 | strings.Methods.time_islower('string[python]')                                                                |
| -        | 7.87±0.03ms                | 6.05±0.01ms                                    |    0.77 | strings.Methods.time_isupper('string[python]')                                                                |
| -        | 6.56±0.03ms                | 4.80±0.03ms                                    |    0.73 | strings.Methods.time_isdecimal('string[python]')                                                              |
| -        | 6.62±0.03ms                | 4.80±0ms                                       |    0.73 | strings.Methods.time_isnumeric('string[python]')                                                              |
| -        | 6.59±0.02ms                | 4.76±0.03ms                                    |    0.72 | strings.Methods.time_isdigit('string[python]')                                                                |
| -        | 6.19±0.02ms                | 4.43±0.03ms                                    |    0.71 | strings.Methods.time_isspace('string[python]')                                                                |
| -        | 4.83±0.04ms                | 3.00±0.02ms                                    |    0.62 | strings.Methods.time_len('string[python]')                                                                    |
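For reading the table: the Ratio column is the "After" time divided by the "Before" time, so values below 1 are speedups. A minimal sketch that recomputes it for three rows, using the mean timings as displayed (the helper names `ratio` and `percent_reduction` are illustrative, not part of asv):

```python
# Mean timings in milliseconds (before, after), copied from three table rows.
timings = {
    "strings.Methods.time_len('string[python]')": (4.83, 3.00),
    "strings.Methods.time_isdigit('string[python]')": (6.59, 4.76),
    "strings.Methods.time_isdecimal('string[python]')": (6.56, 4.80),
}


def ratio(before, after):
    """Ratio as reported in the table: after / before (below 1 is faster)."""
    return after / before


def percent_reduction(before, after):
    """Reduction in run time, as a percentage of the 'before' time."""
    return (1 - after / before) * 100


for name, (before, after) in timings.items():
    print(
        f"{name}: ratio={ratio(before, after):.2f}, "
        f"run time reduced by {percent_reduction(before, after):.0f}%"
    )
```

Because the displayed means are rounded, recomputing other rows this way can differ from the table in the last digit.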

WillAyd · 2023-10-27T20:42:37Z

pandas/_libs/lib.pyx

        ndarray[object] arr,
        object f,
        const uint8_t[:] mask,
+        numeric_object_t[:] dummy,


Does changing cnp.dtype type=np.dtype(object) to numeric_object_t type = np.dtype(object) work?

I think this is a nice change but would be nice to get rid of some of the indirection

might be cleaner to create result in the calling function?

@WillAyd: I tried this first, and it compiles but then fails when called with TypeError: an integer is required. In other places where we do this trick, e.g.

pandas/pandas/_libs/groupby.pyx, lines 1341 to 1345 in 984d755:

    cdef numeric_object_t _get_min_or_max(
        numeric_object_t val,
        bint compute_max,
        bint is_datetimelike,
    ):

we pass a value. That's the reason I went with an array here - we can create an empty one without needing to produce a value.

@jbrockmendel - will update.

WillAyd · 2023-10-28T14:54:15Z

pandas/_libs/lib.pyx

+    np.ndarray
+    """
+    # Passed so we can use fused types depending on the result dtype
+    dummy = np.empty(0, dtype=dtype)


I think you can also just pass a pointer to avoid python runtime interaction

I'm not familiar - can you spell out the syntax here?

See if this works

http://docs.cython.org/en/latest/src/userguide/fusedtypes.html#fused-types-and-arrays

Cannot take address of Python variable 'dummy'

probably need to cdef it as ndarray

Yea shouldn't even need an array. Typing from phone so sorry for grammar but should be

cdef numeric_object_t dummy

Then when calling function use address of operator &dummy

The callee should have numeric_object_t *dummy as an argument

cdef numeric_object_t dummy

I don't believe we can use fused types unless they are utilized in the args. Doing either one of these:

    cdef numeric_object_t dummy
    cdef ndarray[numeric_object_t] dummy

leads to compilation error Type is not specialized. Also doing

    cdef ndarray dummy
    cython.address(dummy)

gives Cannot take address of Python variable 'dummy' as before.

Ah OK. That's unfortunate - I think @jbrockmendel has the best suggestion then to make result caller allocated

I think @jbrockmendel has the best suggestion then to make result caller allocated

I misunderstood that suggestion before - I think this would only be doable if we are okay with also moving convert out of the function. So instead of something like:

result = lib.map_infer_mask(arr, f, mask.view(np.uint8), map_convert)

we are now doing

    result = np.empty_like(arr, dtype="object")
    lib.map_infer_mask(result, arr, f, mask.view(np.uint8))
    if map_convert:
        result = maybe_convert_objects(result)

This seems worse to me (duplicating the convert code), but happy to make the change if that's what's preferred by others.

this would only be doable if we are okay with also moving convert out of the function

Yes, I'm fine with this. Though I think if you called the result something other than result you might not have the same TypeError issues, and since we're not iterating over it, it's no big deal perf-wise.
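To make the "caller-allocated result" idea concrete, here is a minimal pure-Python analogy, not the pandas implementation. It assumes the standard-library `array` module as a stand-in for a typed NumPy buffer, and `map_into` is a hypothetical name. The caller picks the output type by allocating the buffer; the loop only writes into it.

```python
from array import array


def map_into(result, values, func, mask):
    """Write func(values[i]) into result[i] wherever mask[i] is 0.

    The caller owns `result`, so its element type is decided before the
    loop runs; masked positions keep whatever the caller put there.
    """
    for i, value in enumerate(values):
        if not mask[i]:
            result[i] = func(value)
    return result


values = ["a", "bb", None, "dddd"]
mask = array("B", [0, 0, 1, 0])  # uint8 mask, 1 marks a missing entry

# int64-like output: the caller allocates a typed buffer up front.
int_result = map_into(array("q", [0] * len(values)), values, len, mask)

# object-like output: a plain list can hold arbitrary Python objects.
obj_result = map_into([None] * len(values), values, str.upper, mask)

print(list(int_result))
print(obj_result)
```

In this shape any conversion step (the `maybe_convert_objects` call in the comment above) has to happen in the caller, which is the duplication concern raised in the thread.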

jbrockmendel · 2023-10-28T17:52:02Z

pandas/_libs/lib.pyx

@@ -2888,7 +2936,7 @@ def map_infer_mask(
    """
    cdef:
        Py_ssize_t i, n
-        ndarray result
+        ndarray[numeric_object_t] result


can we narrow down what types are actually needed here? is there a danger of types that aren't part of numeric_object_t?

The test suite only uses bool, int64, and object. I'm going to check calls to make sure that's guaranteed. Assuming it is, does something like

    ctypedef fused bool_int64_object_t:
        uint8_t
        int64_t
        object

in dtypes.pxd work?

All calls are either hard coded as bool, int64, or object. The only part that was a little tricky was the use in the various _str_map methods. Here, specifying the argument dtype=np.dtype(dtype) in map_infer_mask is guarded by if is_integer_dtype(dtype) or is_bool_dtype(dtype). All calls to the _str_map method are either a string dtype, int64, or bool.
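The concern about types outside the fused type can be pictured with a small pure-Python analogy. This is a sketch under assumptions: `pick_specialization` is a hypothetical helper, and `array` typecodes stand in for NumPy dtypes. A buffer whose element type is not one of the three members has no specialization to dispatch to, so the call fails rather than silently running a generic path.

```python
from array import array

# Typecodes standing in for the two non-object members of the fused type.
SUPPORTED_TYPECODES = {"B": "uint8_t", "q": "int64_t"}


def pick_specialization(result):
    """Return the name of the specialization matching the result buffer."""
    if isinstance(result, list):
        return "object"
    if isinstance(result, array) and result.typecode in SUPPORTED_TYPECODES:
        return SUPPORTED_TYPECODES[result.typecode]
    raise TypeError(f"no specialization for {type(result).__name__} buffer")


print(pick_specialization(array("B")))  # bool results stored as uint8
print(pick_specialization(array("q")))  # int64 results
print(pick_specialization([]))          # object results

try:
    pick_specialization(array("d"))     # float64 is not a member
except TypeError as exc:
    print("rejected:", exc)
```

This is why the audit of call sites matters: narrowing the fused type to three members is only safe if every caller passes one of those three dtypes.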

If we create the type here, what would we name it? I agree it's good to specialize instead of using numeric_object_t, but it would be nice if the name could speak to the concept of what the type supports instead of being a literal declaration of its member types.

I agree with the sentiment but have no recommendations for a "natural" name. If we are not happy with the current name uint8_int64_object_t, then the only alternative that comes to my mind is to move it from dtypes.pxd to lib.pyx just above the map_infer_mask function and name it map_infer_mask_t to highlight its local use.

current name uint8_int64_object_t [...] map_infer_mask_t to highlight its local use.

I'm fine with either of these options.

@WillAyd - friendly ping on uint8_int64_object_t vs map_infer_mask_t vs something else

Let's go with the first option for now. Can always rename later

WillAyd

lgtm

lithomas1 · 2023-11-02T21:20:58Z

pandas/_libs/lib.pyx

+
+@cython.boundscheck(False)
+@cython.wraparound(False)
+def _map_infer_mask(


make this a cdef function?

I don't believe so. When I try, I get:

    Error compiling Cython file:
    ------------------------------------------------------------
    ...
        np.ndarray
        """
        cdef Py_ssize_t n = len(arr)
        result = np.empty(n, dtype=dtype)
        _map_infer_mask(
                       ^
    ------------------------------------------------------------
    /home/richard/dev/pandas/pandas/_libs/lib.pyx:2892:19: no suitable method found

I believe this is cython/cython#2462

I'm not certain, but I believe it happens when calling a cdef function from a def function where the def function does not use the fused types.

lithomas1 · 2023-11-02T21:23:42Z

Can you add a whatsnew? Otherwise, LGTM.

jbrockmendel · 2023-11-02T22:16:55Z

pandas/_libs/lib.pyx

+        const uint8_t[:] mask,
+        bint convert=True,
+        object na_value=no_default,
+        cnp.dtype dtype=np.dtype(object)


I think dtype and convert are not needed here?

Doh - this is why you don't refactor late at night 😆 Removed, thanks.

@lithomas1 were you once looking at adding Wextra or increasing the meson warning setting? I know we'd have to do a bit more work to get there but I think that would catch issues like this in the future (via -Wunused-argument, which could alternately be added directly)

rhshadrach · 2023-11-03T20:07:00Z

If there are no objections, I'd like to try to get this in tonight so I can rerun ASVs in #55179

lithomas1

LGTM. Let's wait for #55817 to clear out the Windows failures, just to make sure everything is OK here with CI, though.

mroeschke · 2023-11-07T01:14:12Z

Thanks @rhshadrach

PERF: Use fused types for map_infer_mask

5921bf1

rhshadrach added the labels Performance (Memory or execution speed performance) and Dtype Conversions (Unexpected or buggy dtype conversions) Oct 27, 2023

Simplify

5c4d8d9

WillAyd reviewed Oct 27, 2023

Refactor and docstrings

c4bdb1a

WillAyd reviewed Oct 28, 2023

jbrockmendel reviewed Oct 28, 2023

jbrockmendel reviewed Oct 28, 2023

rhshadrach marked this pull request as ready for review October 29, 2023 18:01

rhshadrach added 3 commits November 1, 2023 21:24

Rework

c58e682

Merge branch 'main' of https://github.com/pandas-dev/pandas into perf_map_infer_mask_fused

8ff9047

Remove docstring

4634a24

rhshadrach requested review from WillAyd and jbrockmendel November 2, 2023 16:09

WillAyd approved these changes Nov 2, 2023

lithomas1 reviewed Nov 2, 2023

jbrockmendel reviewed Nov 2, 2023

Cleanup, whatsnew

59c836d

lithomas1 approved these changes Nov 3, 2023

Merge branch 'main' into perf_map_infer_mask_fused

da55ccc

mroeschke added this to the 2.2 milestone Nov 7, 2023

mroeschke approved these changes Nov 7, 2023

mroeschke merged commit 0d761ef into pandas-dev:main Nov 7, 2023
29 of 34 checks passed

rhshadrach deleted the perf_map_infer_mask_fused branch November 7, 2023 02:26

lithomas1 mentioned this pull request Nov 8, 2023

DEP: Use Cython 3.0 #55179

Merged

5 tasks

jbrockmendel mentioned this pull request Aug 8, 2024

REF (string dtype): de-duplicate _str_map (2) #59451

Merged
