ENH: Move factorizing for merge to EA interface #54975

phofl · 2023-09-02T23:40:39Z

cc @jbrockmendel thoughts here? This would still need docs obviously

jbrockmendel · 2023-09-06T15:24:42Z

pandas/core/arrays/datetimelike.py

@@ -1703,6 +1704,15 @@ def _groupby_op(
        res_values = res_values.view(self._ndarray.dtype)
        return self._from_backing_data(res_values)

+    def _factorize_with_other(
+        self,
+        other: DatetimeLikeArrayMixin,  # type: ignore[override]


is other Self here?

Oh yes, that's convenient

jbrockmendel · 2023-09-06T15:25:12Z

pandas/core/arrays/base.py

+    ) -> tuple[npt.NDArray[np.intp], npt.NDArray[np.intp], int]:
+        lk, _ = self._values_for_factorize()
+        rk, _ = other._values_for_factorize()
+        return factorize_with_rizer(lk, rk, sort)


rizer is an awful name. can we improve that while we're here?

renamed to factorizer and renamed the method to factorize_arrays
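
For readers unfamiliar with what "factorizing for merge" means here, a plain-Python sketch of the idea follows: both key arrays are encoded against one shared table of unique values, so equal keys on either side receive the same integer label. The name `factorize_pair` is hypothetical and this is not the pandas implementation (which uses hash-table factorizers on the values returned by `_values_for_factorize`); only the return shape `(left labels, right labels, count)` mirrors the signature in the diff above.

```python
from typing import Hashable, Sequence


def factorize_pair(
    left: Sequence[Hashable], right: Sequence[Hashable], sort: bool = False
) -> tuple[list[int], list[int], int]:
    """Encode two key sequences against one shared table of unique values.

    Hypothetical helper for illustration; not part of pandas.
    """
    table: dict[Hashable, int] = {}

    def encode(values: Sequence[Hashable]) -> list[int]:
        # setdefault assigns the next free code the first time a value is seen
        return [table.setdefault(v, len(table)) for v in values]

    llab = encode(left)
    rlab = encode(right)

    if sort:
        # Renumber so that codes follow the sorted order of the unique values
        order = {old: new for new, old in enumerate(table[v] for v in sorted(table))}
        llab = [order[c] for c in llab]
        rlab = [order[c] for c in rlab]

    return llab, rlab, len(table)


llab, rlab, count = factorize_pair(["b", "a", "b"], ["c", "a"])
# "a" gets the same code on both sides, which is what lets merge match rows
```

With `sort=True` the labels are renumbered to follow the sorted unique values, which is roughly the role the `sort` argument plays in the signature under review.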

jbrockmendel · 2023-09-06T15:26:08Z

pandas/core/reshape/merge.py

@@ -2399,133 +2374,27 @@ def _factorize_keys(
    if (
        isinstance(lk.dtype, DatetimeTZDtype) and isinstance(rk.dtype, DatetimeTZDtype)
    ) or (lib.is_np_dtype(lk.dtype, "M") and lib.is_np_dtype(rk.dtype, "M")):
-        # Extract the ndarray (UTC-localized) values
        # Note: we dont need the dtypes to match, as these can still be compared
        lk, rk = cast("DatetimeArray", lk)._ensure_matching_resos(rk)


should the ensure_matching_resos be done in the EA method similar to how the Categorical one does encode_with_whatever?

i see, never mind

do we have tests with timedelta64 and mismatched resos?
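
The question about mismatched resolutions matters because factorizing works on the integer backing values, and those are only comparable when both sides use the same unit. The sketch below illustrates that with plain integers; `to_common_unit` and `NS_PER_UNIT` are hypothetical names, and the assumption that both sides are cast to the finer unit is mine rather than something stated in this thread.

```python
# Nanoseconds per tick for the units used in this illustration
NS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


def to_common_unit(left, left_unit, right, right_unit):
    """Rescale two integer sequences to the finer of the two resolutions.

    Hypothetical helper; stands in for the resolution-matching step only.
    """
    # The finer unit is the one with fewer nanoseconds per tick
    unit = min(left_unit, right_unit, key=NS_PER_UNIT.__getitem__)
    lfactor = NS_PER_UNIT[left_unit] // NS_PER_UNIT[unit]
    rfactor = NS_PER_UNIT[right_unit] // NS_PER_UNIT[unit]
    return [v * lfactor for v in left], [v * rfactor for v in right], unit


left_s = [1, 2]          # 1 s and 2 s
right_ms = [1000, 3000]  # 1000 ms and 3000 ms

# Comparing the raw integers misses the match between 1 s and 1000 ms
assert set(left_s) & set(right_ms) == set()

lk, rk, unit = to_common_unit(left_s, "s", right_ms, "ms")
# After rescaling, the shared key is visible
assert set(lk) & set(rk) == {1000}
```

A test along these lines for `timedelta64` keys would check that a merge finds the 1 s / 1000 ms match rather than treating the keys as distinct.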

jbrockmendel · 2023-09-06T15:29:16Z

pandas/core/arrays/categorical.py

+    def _factorize_with_other(
+        self, other: Categorical, sort: bool = False  # type: ignore[override]
+    ) -> tuple[npt.NDArray[np.intp], npt.NDArray[np.intp], int]:
+        other = self._encode_with_my_categories(other)


i think that factorize_with_other assumes matching dtypes. if correct, comment/docstring/assert to that effect?

Yes that's correct, have to add docs in general if we want to move forward here
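
To make the matching-dtype assumption concrete, here is a small plain-Python sketch of why the other side is re-encoded first: integer codes are only meaningful relative to one list of categories. `encode_with_categories` is a hypothetical stand-in, not the `_encode_with_my_categories` method from the diff.

```python
def encode_with_categories(values, categories):
    """Return integer codes for values relative to categories; -1 if absent.

    Hypothetical helper for illustration only.
    """
    lookup = {cat: i for i, cat in enumerate(categories)}
    return [lookup.get(v, -1) for v in values]


left_categories = ["a", "b", "c"]
left_codes = encode_with_categories(["a", "c", "a"], left_categories)

# The right side holds the same kind of values, but its categories are ordered
# differently, so its own codes are not comparable with the left codes
right_values = ["c", "b"]
right_own_codes = encode_with_categories(right_values, ["c", "b", "a"])

# Recoding against the left categories puts both sides in one code space
right_recoded = encode_with_categories(right_values, left_categories)
```

Here `"c"` has code 2 on the left but code 0 in the right side's own encoding; only after recoding do equal values share a code. An assert or docstring note on the real method would document that callers must supply compatible dtypes.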

jbrockmendel · 2023-09-06T15:33:35Z

Generally looks good, mostly bikeshedding names

jbrockmendel · 2023-09-14T15:18:00Z

pandas/core/arrays/arrow/array.py

+        )
+        length = len(dc.dictionary)
+
+        llab, rlab, count = (


nitpick: can this be split into three lines. it takes effort to see which lines correspond to which output
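
The expressions on the right-hand side are elided in the hunk above, so the following only illustrates the readability point with made-up values: the packed and split forms bind the same results, but the split form pairs each name with its expression directly.

```python
# Made-up stand-in data; the real expressions are not shown in the diff
combined = [0, 1, 0, 2, 1]
n_left = 3

# Packed form: the reader has to count positions to pair names with values
llab, rlab, count = (
    combined[:n_left],
    combined[n_left:],
    len(set(combined)),
)

# Split form: one output per statement
llab_split = combined[:n_left]
rlab_split = combined[n_left:]
count_split = len(set(combined))
```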

jbrockmendel · 2023-09-14T15:21:25Z

a handful of nitpicks, otherwise im very happy to see this refactor

github-actions · 2023-10-15T00:05:38Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

phofl · 2023-10-15T13:54:10Z

Sorry this fell off my radar, updated

phofl · 2023-10-15T21:42:35Z

@MarcoGorelli is there a way to shut up the build about the missing example? There are a bunch of ea methods without examples...

jbrockmendel · 2023-10-16T16:23:35Z

"is there a way to shut up the build about the missing example? There are a bunch of ea methods without examples..."

+1. Many of the examples are not particularly useful (I know because I wrote them to get the CI to shut up)

jbrockmendel · 2023-10-16T16:24:55Z

ci/code_checks.sh

@@ -1,3 +1,6 @@
+
+
+


whitespace here intentional?

MarcoGorelli · 2023-10-16T17:24:59Z

yeah I'd expect pasting into

pandas/scripts/validate_docstrings.py

Lines 44 to 58 in 13e034a

    IGNORE_VALIDATION = {
        "Styler.env",
        "Styler.template_html",
        "Styler.template_html_style",
        "Styler.template_html_table",
        "Styler.template_latex",
        "Styler.template_string",
        "Styler.loader",
        "errors.InvalidComparison",
        "errors.LossySetitemError",
        "errors.NoBufferPresent",
        "errors.IncompatibilityWarning",
        "errors.PyperclipException",
        "errors.PyperclipWindowsException",
    }

to work
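
As a rough sketch of the suggestion, assuming the script skips any name found in the set before validating it (the thread does not show how `IGNORE_VALIDATION` is consulted): the added entry and the helper `names_to_validate` below are hypothetical, and the set is abbreviated.

```python
# Abbreviated copy of the set quoted above, plus one hypothetical entry
IGNORE_VALIDATION = {
    "Styler.env",
    "errors.InvalidComparison",
    "api.extensions.ExtensionArray.some_new_method",  # hypothetical name
}


def names_to_validate(all_names):
    """Drop names listed in IGNORE_VALIDATION; hypothetical helper."""
    return [name for name in all_names if name not in IGNORE_VALIDATION]


candidates = [
    "Styler.env",
    "Series.sum",
    "api.extensions.ExtensionArray.some_new_method",
]
remaining = names_to_validate(candidates)
```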

lithomas1 · 2023-12-22T01:02:07Z

Looks like this has gone stale, so I'm bumping off the milestone

mroeschke · 2024-04-23T18:33:41Z

Looks close to merge but has gone stale. Going to close to clear the queue but feel free to reopen when you have time to circle back

phofl and others added 6 commits September 3, 2023 01:30

ENH: Move factorizing for merge to EA interface

917e50b

ENH: Move factorizing for merge to EA interface

26b9db4

Add sort

56043fe

Fix json tests

58c3183

Update for timestamps and duration

9fd6c66

Fix typing

164d9c3

mroeschke requested a review from jbrockmendel September 5, 2023 18:17

mroeschke added the Reshaping and ExtensionArray labels Sep 5, 2023

phofl added 2 commits September 7, 2023 23:01

Merge remote-tracking branch 'upstream/main' into merge_ea_interface

33170bb

# Conflicts: # pandas/core/reshape/merge.py

Update names

5d7151d

phofl added this to the 2.2 milestone Sep 7, 2023

phofl added 3 commits September 7, 2023 23:24

Add docs

de02cb0

Fix

db84cfc

Fix typing

84ff6c1

github-actions bot added the Stale label Oct 15, 2023

phofl added 3 commits October 15, 2023 15:47

Merge remote-tracking branch 'upstream/main' into merge_ea_interface

54a9664

# Conflicts: # doc/source/reference/extensions.rst # pandas/core/reshape/merge.py

Add null count

0e71264

Update

4238a8c

Fix docstring

dea9add

phofl requested a review from mroeschke as a code owner October 15, 2023 13:58

Add whatsnew

8f93283

Update code_checks.sh

eb65303

jbrockmendel mentioned this pull request Oct 16, 2023

BUG: merge_asof does not work with datetime by column on Pandas 2.1.x #55453

Closed

3 tasks

phofl added 2 commits November 21, 2023 21:29

Merge remote-tracking branch 'upstream/main' into merge_ea_interface

b81e4e3

# Conflicts: # pandas/core/reshape/merge.py

Update docs

dc5f2ca

lithomas1 removed this from the 2.2 milestone Dec 22, 2023

mroeschke closed this Apr 23, 2024
