BUG: get_indexer_non_unique() does not find NaNs. #44481

Closed
3 tasks done
johannes-mueller opened this issue Nov 16, 2021 · 1 comment
Labels
Bug; Index (Related to the Index class or subclasses); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

Comments

@johannes-mueller
Contributor

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

idx = pd.Index([1, 2, np.nan])
target = pd.Index([1, np.nan])

print(idx.get_indexer(target))
print(idx.get_indexer_non_unique(target))

Issue Description

The script outputs:

[0 2]
(array([ 0, -1]), array([1]))

So get_indexer() correctly finds the NaN at position 2, but get_indexer_non_unique() does not find it at all: it returns -1 for the NaN and reports target position 1 as missing.
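
For readers less familiar with the tuple that get_indexer_non_unique() returns, the sketch below unpacks it for the example above. The variable names indexer, missing and not_found are only illustrative, and the printed values depend on whether the installed pandas still has this bug.

```python
import numpy as np
import pandas as pd

idx = pd.Index([1, 2, np.nan])
target = pd.Index([1, np.nan])

# First array: positions in idx for each target label (-1 where nothing matched).
# Second array: positions in target of the labels that were not found.
indexer, missing = idx.get_indexer_non_unique(target)

# Labels of target that the lookup reported as absent from idx.
not_found = target[missing]
print(indexer, missing, not_found)
```

With the buggy behavior, not_found contains the NaN; with the expected behavior it is empty.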

This is one aspect of #44465, which needs to be split because it has two different root causes.

Expected Behavior

Expected output:

[0 2]
(array([ 0, 2]), array([], dtype=int64))
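
The expected semantics can be spelled out with a small pure-Python reference. This is only a sketch of the behavior requested here, not pandas's implementation; reference_indexer_non_unique and _same_label are hypothetical names, and the assumption is that a NaN in the target should match every NaN in the index.

```python
import math


def _same_label(a, b):
    """Treat two NaNs as the same label; otherwise use ordinary equality."""
    a_nan = isinstance(a, float) and math.isnan(a)
    b_nan = isinstance(b, float) and math.isnan(b)
    if a_nan or b_nan:
        return a_nan and b_nan
    return a == b


def reference_indexer_non_unique(values, target):
    """Return (indexer, missing) in the style of get_indexer_non_unique()."""
    indexer, missing = [], []
    for t_pos, label in enumerate(target):
        # Collect every position in values whose label matches this target label.
        hits = [i for i, v in enumerate(values) if _same_label(v, label)]
        if hits:
            indexer.extend(hits)
        else:
            indexer.append(-1)
            missing.append(t_pos)
    return indexer, missing


nan = float("nan")
print(reference_indexer_non_unique([1, 2, nan], [1, nan]))  # ([0, 2], [])
```

The quadratic scan is deliberate: it keeps the matching rule visible and makes no claim about how pandas looks labels up internally.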

Installed Versions

INSTALLED VERSIONS
------------------
commit : 700be61
python : 3.8.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-90-lowlatency
Version : #101-Ubuntu SMP PREEMPT Fri Oct 15 20:57:56 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8

pandas : 1.4.0.dev0+1132.g700be617eb
numpy : 1.21.4
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 58.5.3
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.24.2
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.6.4
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.23.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.11.0
fastparquet : 0.7.1
gcsfs : 2021.11.0
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 6.0.0
pyxlsb : None
s3fs : 2021.11.0
scipy : 1.7.2
sqlalchemy : 1.4.26
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1

@johannes-mueller added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on Nov 16, 2021
@johannes-mueller changed the title from "BUG:" to "BUG: get_indexer_non_unique() does not find NaNs." on Nov 16, 2021
@mroeschke added the Index (Related to the Index class or subclasses) and Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label on Nov 21, 2021
@lukemanley
Member

This looks to be fixed on main and is producing the expected behavior above. It appears to be tested as well in test_get_indexer_non_unique_multiple_nans.
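
To check a particular installation against the expected output from the report, something along these lines may help. check_issue_44481 is a hypothetical helper written for this thread; it is not part of pandas and is not the test named above.

```python
import numpy as np
import pandas as pd


def check_issue_44481():
    """Hypothetical helper: rerun the example and compare with the expected output."""
    idx = pd.Index([1, 2, np.nan])
    target = pd.Index([1, np.nan])
    indexer, missing = idx.get_indexer_non_unique(target)
    # Expected: NaN found at position 2 and no target positions reported missing.
    fixed = np.array_equal(indexer, [0, 2]) and len(missing) == 0
    return indexer, missing, fixed


indexer, missing, fixed = check_issue_44481()
print(pd.__version__, "fixed" if fixed else "still affected", indexer, missing)
```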


3 participants