This returns -1 for the new arrow string dtype because we cast None to np.nan when creating the array. We wanted to be as close to object dtype as possible, but patching this does not seem ideal. Thoughts?
In general (for non-object dtypes), it seems we are strict in which NA-like values to accept here:
In [7]: idx = pd.Index([0.1, 0.2, None], dtype="float64")
In [8]: idx.get_indexer([np.nan])
Out[8]: array([2])
In [9]: idx.get_indexer([None])
Out[9]: array([-1])
But I agree that for the migration from object dtype, we should maybe be more flexible here, and also find None values.
Even then, though, it will not exactly preserve behaviour: right now an object dtype Index can hold both None and NaN (and get_indexer finds them separately), whereas with the new dtype both become NaN, so get_indexer would find both. There is no way around that.
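To make the object dtype case concrete, here is a minimal sketch of the lookup described above. It only uses the public pd.Index constructor and get_indexer; the variable names are illustrative, and the printed positions may depend on the pandas version, so no output is shown.

```python
import numpy as np
import pandas as pd

# An object dtype Index can store None and NaN as two distinct entries.
obj_idx = pd.Index([None, np.nan], dtype=object)

# Look up each NA-like value on its own; per the comment above,
# object dtype is expected to report them at separate positions.
pos_none = obj_idx.get_indexer([None])
pos_nan = obj_idx.get_indexer([np.nan])

print("None ->", pos_none)
print("NaN  ->", pos_nan)
```

Once both values are stored as NaN, this distinction can no longer be recovered, which is why the object dtype behaviour cannot be preserved exactly.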
If None values are converted to np.nan when the data is created, then get_indexer should also convert None to np.nan in the target before searching.
In base.py we have the get_indexer function; adding the following lines of code might help:
# Replace None entries in the lookup target with NaN before searching
for i, v in enumerate(target):
    if v is None:
        target[i] = np.nan
That needs a fix at a different place. This is probably not a good issue to get started with; I would recommend that you look for issues labeled "Good first issue".
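Until that is fixed inside pandas, a caller could normalize the target on their side. The sketch below is only a user-level workaround, not the proposed library change: normalize_na_targets is a hypothetical helper name, and it builds a new list instead of mutating the target in place. It reuses the float64 example from earlier in the thread.

```python
import numpy as np
import pandas as pd


def normalize_na_targets(target):
    """Return a new list in which every None is replaced by np.nan.

    Hypothetical user-side helper; it is not part of pandas.
    """
    return [np.nan if v is None else v for v in target]


idx = pd.Index([0.1, 0.2, None], dtype="float64")

# Looking up None directly is the strict case shown above.
raw = idx.get_indexer([None])

# After normalization the lookup uses np.nan, which matches the stored NaN.
normalized = idx.get_indexer(normalize_na_targets([None]))

print("raw:", raw)
print("normalized:", normalized)
```

Note that this maps every None to NaN, so it has the same limitation as discussed above: None and NaN can no longer be told apart in the lookup.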
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
This returns -1 for the new arrow string dtype because we cast None to np.nan when creating the array. We wanted to be as close to object dtype as possible, but patching this does not seem ideal. Thoughts?
cc @jorisvandenbossche
Expected Behavior
see above
Installed Versions
Replace this line with the output of pd.show_versions()