
BUG: Index.difference not always returning a unique set of values #55113

Merged: 5 commits merged on Sep 18, 2023


lukemanley (Member):

Index.difference has two short-circuit paths that, unlike the main path, do not return a unique set of values when the calling Index contains duplicates:

In [1]: import pandas as pd

In [2]: idx = pd.Index([1, 1])

In [3]: idx.difference([2])
Out[3]: Index([1], dtype='int64')      <- main path: unique

In [4]: idx.difference([])
Out[4]: Index([1, 1], dtype='int64')   <- short circuit path when right is empty: non-unique

In [5]: idx.difference([True])
Out[5]: Index([1, 1], dtype='int64')   <- short circuit path when right is considered "non-comparable": non-unique
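The behavior the fix aims for (a unique result on every path) can be modeled in a few lines of plain Python. This is a hypothetical sketch, not the pandas implementation: difference_unique and its comparable flag are invented for illustration, the flag stands in for pandas' internal check for "non-comparable" inputs, and the sort parameter of Index.difference is ignored (first-appearance order is kept). The point is that the left side is deduplicated before any path returns, so neither short circuit can leak duplicates.

```python
from typing import Hashable, Iterable, List


def difference_unique(
    left: Iterable[Hashable],
    right: Iterable[Hashable],
    comparable: bool = True,
) -> List[Hashable]:
    """Hypothetical model of Index.difference returning unique values on every path.

    `comparable` is an illustrative stand-in (an assumption, not a pandas
    parameter) for the internal check deciding whether `right` can be
    compared with `left` at all.
    """
    # Deduplicate first, keeping the order of first appearance.
    unique_left = list(dict.fromkeys(left))
    right_set = set(right)

    # Short-circuit paths (empty `right` or non-comparable `right`):
    # nothing can be removed, but we still return the deduplicated
    # left side rather than `left` as-is.
    if not right_set or not comparable:
        return unique_left

    # Main path: drop every label that also appears in `right`.
    return [label for label in unique_left if label not in right_set]


idx = [1, 1]
print(difference_unique(idx, [2]))                       # [1]
print(difference_unique(idx, []))                        # [1]
print(difference_unique(idx, [True], comparable=False))  # [1]
```

Without comparable=False, the plain-Python call difference_unique([1, 1], [True]) would return [] because True == 1 in Python; pandas instead treats booleans passed to a numeric Index as not comparable, which is what routes In [5] to the short-circuit path.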

lukemanley added the Bug and setops (union, intersection, difference, symmetric_difference) labels on Sep 12, 2023
lukemanley added this to the 2.2 milestone on Sep 12, 2023
@@ -799,8 +799,16 @@ def test_difference_empty_arg(self, index, sort):
first = index[5:20]
first.name = "name"
result = first.difference([], sort)
expected = first.unique()
Review comment (Member):

Could you construct a completely new Index for `expected` instead (and below)?

Member Author:

Updated

mroeschke merged commit e0c5b87 into pandas-dev:main on Sep 18, 2023
mroeschke (Member):

Thanks @lukemanley

hedeershowk pushed a commit to hedeershowk/pandas that referenced this pull request Sep 20, 2023
…ndas-dev#55113)

* BUG: Index.difference not always returning a unique set of values

* whatsnew

* update test

* update test
lukemanley deleted the bug-index-difference-unique branch on November 16, 2023 12:56