
BUG: Float64Dtype and isnull() functions unexpected behavior #53887

Open · 3 tasks done
jayantsahewal opened this issue Jun 27, 2023 · 7 comments
Labels
Bug · Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) · NA - MaskedArrays (Related to pd.NA and nullable extension arrays) · PDEP missing values (Issues that would be addressed by the Ice Cream Agreement from the Aug 2023 sprint)

Comments


jayantsahewal commented Jun 27, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

def weird_behavior(df, cols):
    print(df.dtypes.tolist())
    for col in cols:
        print(df[col].isna().sum(), df[col].apply(pd.isna).sum())
        print(df[col].isnull().sum(), df[col].apply(pd.isnull).sum())
    print()

test_1_df = pd.DataFrame(data=[[1.0, 0.0], [1.0, 1], [1.0, 1], [4.0, 0], [0.0, 0]], columns=['a', 'b'], dtype=pd.Float64Dtype())
test_1_df['c'] = test_1_df['a'] / test_1_df['b']
test_1_df['d'] = float('inf')
test_1_df['e'] = np.nan
test_2_df = test_1_df.astype(pd.Float64Dtype())
test_3_df = test_1_df.astype(np.float64)
cols = ['c', 'd', 'e']
weird_behavior(test_1_df, cols)
weird_behavior(test_2_df, cols)
weird_behavior(test_3_df, cols)

test_4_df = pd.DataFrame(data=[[1.0, 0.0], [1.0, 1], [1.0, 1], [4.0, 0], [0.0, 0]], columns=['a', 'b'])
test_4_df['c'] = test_4_df['a'] / test_4_df['b']
test_4_df['d'] = float('inf')
test_4_df['e'] = np.nan
test_5_df = test_4_df.astype(pd.Float64Dtype())
test_6_df = test_4_df.astype(np.float64)
cols = ['c', 'd', 'e']
weird_behavior(test_4_df, cols)
weird_behavior(test_5_df, cols)
weird_behavior(test_6_df, cols)

[Float64Dtype(), Float64Dtype(), Float64Dtype(), dtype('float64'), dtype('float64')]
0 1
0 1
0 0
0 0
5 5
5 5

[Float64Dtype(), Float64Dtype(), Float64Dtype(), Float64Dtype(), Float64Dtype()]
0 1
0 1
0 0
0 0
5 5
5 5

[dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64')]
1 1
1 1
0 0
0 0
5 5
5 5

[dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64')]
1 1
1 1
0 0
0 0
5 5
5 5

[Float64Dtype(), Float64Dtype(), Float64Dtype(), Float64Dtype(), Float64Dtype()]
1 1
1 1
0 0
0 0
5 5
5 5

[dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64')]
1 1
1 1
0 0
0 0
5 5
5 5


### Issue Description

Specifying `dtype=pd.Float64Dtype()` when creating a pandas DataFrame leads to inconsistent results between `series.isnull()` and `series.apply(pd.isnull)`.

If I don't specify the dtype when creating the DataFrame, `series.isnull()` and `series.apply(pd.isnull)` agree as expected.
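A minimal sketch of the same pattern, assuming scalar division by zero behaves like the column division in the reproduction above (the counts follow the output reported in this issue):

import pandas as pd

s = pd.Series([1.0, 0.0], dtype="Float64")
r = s / 0  # 1/0 -> inf, 0/0 -> a float NaN stored in the data; the NA mask stays False

print(r.isna().sum())          # 0 -- the masked Float64 dtype only consults its NA mask
print(r.apply(pd.isna).sum())  # 1 -- element-wise pd.isna sees the float NaN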

### Expected Behavior

1 1
1 1
0 0
0 0
5 5
5 5


### Installed Versions

<details>

commit           : 965ceca9fd796940050d6fc817707bba1c4f9bff
python           : 3.10.8.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.10.157-139.675.amzn2.x86_64
Version          : #1 SMP Thu Dec 8 01:29:11 UTC 2022
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 2.0.2
numpy            : 1.23.4
pytz             : 2022.7
dateutil         : 2.8.2
setuptools       : 65.6.3
pip              : 23.1
Cython           : 0.29.33
pytest           : 7.2.0
hypothesis       : None
sphinx           : 6.1.2
blosc            : None
feather          : None
xlsxwriter       : 3.0.6
lxml.etree       : 4.9.2
html5lib         : None
pymysql          : 1.0.3
psycopg2         : 2.9.3
jinja2           : 3.1.2
IPython          : 7.34.0
pandas_datareader: None
bs4              : 4.11.1
bottleneck       : 1.3.5
brotli           : 
fastparquet      : None
fsspec           : 2022.11.0
gcsfs            : None
matplotlib       : 3.6.2
numba            : 0.56.4
numexpr          : 2.7.3
odfpy            : None
openpyxl         : 3.0.10
pandas_gbq       : None
pyarrow          : 8.0.0
pyreadstat       : None
pyxlsb           : None
s3fs             : 0.4.2
scipy            : 1.10.0
snappy           : None
sqlalchemy       : 1.4.46
tables           : 3.7.0
tabulate         : None
xarray           : None
xlrd             : None
zstandard        : None
tzdata           : 2023.3
qtpy             : 2.3.0
pyqt5            : None

</details>
jayantsahewal added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on Jun 27, 2023
@jbrockmendel
Member

xref #32265

@cmillani

I just ran into a related issue while using isna on a DataFrame.

With the float64 dtype, NaN is reported as missing (True):

import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [5, 0], 'b': [np.nan, 12]})
df = df.astype('float64')
df['c'] = df['a'] * np.inf
print(df.isna())

Results:

       a      b      c
0  False   True  False
1  False  False   True

Simply changing types:

df = df.astype('Float64')
df['c'] = df['a'] * np.inf
print(df.isna())

changes the result for column c, row 1:

       a      b      c
0  False   True  False
1  False  False  False

We solved this by using numpy.isnan, but now that pandas supports Arrow as a backend this approach no longer seems like a good fit, and I believe consistency between Float64 and float64 would be the better choice.
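For illustration, a rough sketch of that numpy.isnan fallback, assuming the masked-dtype behavior shown above:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0], 'b': [np.nan, 12]}).astype('Float64')
df['c'] = df['a'] * np.inf            # 0 * inf -> a float NaN in the data, NA mask stays False

print(df['c'].isna().tolist())        # [False, False] -- the mask-based check misses the NaN
print(np.isnan(df['c']).tolist())     # [False, True]  -- the ufunc sees the float NaN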

I know little about the internals, but I would be happy to help if this is not the desired behavior! If it is desired, it would be nice to document it and guide users :)

@rachtsingh

@cmillani

You can fix this in your own codebase at the moment by making the following change:

import numpy as np
from pandas.core.arrays.floating import FloatingArray

# keep a reference to the original implementation
FloatingArray.oldisna = FloatingArray.isna

def newisna(self: FloatingArray) -> np.ndarray:
    # treat both the NA mask and any float NaN stored in the data as missing
    return np.isnan(self._data) | self._mask.copy()

FloatingArray.isna = newisna

For us this is part of our default pandas monkey patch module that is loaded in all code.
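A quick, hedged check of the patched behavior (the expected values follow the counts reported earlier in this thread):

import pandas as pd

s = pd.Series([1.0, 0.0], dtype="Float64") / 0   # inf and a float NaN; the NA mask stays False
print(s.isna().tolist())   # unpatched: [False, False]; with the patch applied: [False, True]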

@rachtsingh

For anyone else seeing this, it's also worth patching fillna:

import numpy as np
from pandas.core.arrays.floating import FloatingArray  # continues the patch from the previous comment
from pandas.core.arrays.masked import BaseMaskedArrayT, validate_fillna_kwargs, is_array_like, missing

FloatingArray.oldfillna = FloatingArray.fillna
def newfillna(
    self: BaseMaskedArrayT, value=None, method=None, limit=None
) -> BaseMaskedArrayT:
    value, method = validate_fillna_kwargs(value, method)

    # treat float NaN stored in the data as missing, in addition to the NA mask
    mask = (self._mask | np.isnan(self._data)).copy()

    if is_array_like(value):
        if len(value) != len(self):
            raise ValueError(
                f"Length of 'value' does not match. Got ({len(value)}) "
                f" expected {len(self)}"
            )
        value = value[mask]

    if mask.any():
        if method is not None:
            func = missing.get_fill_func(method, ndim=self.ndim)
            npvalues = self._data.copy().T
            new_mask = mask.T
            func(npvalues, limit=limit, mask=new_mask)
            return type(self)(npvalues.T, new_mask.T)
        else:
            # fill with value
            new_values = self.copy()
            new_values[mask] = value
    else:
        new_values = self.copy()
    return new_values
FloatingArray.fillna = newfillna

lithomas1 added the Missing-data and NA - MaskedArrays labels and removed the Needs Triage label on Aug 9, 2023
jbrockmendel added the PDEP missing values label on Oct 26, 2023
@DeGiorgiMarcello

Hi, I have the same problem when filling NaNs generated by a 0/0 division:

In [2]: df = pd.DataFrame({"A":[1.,0,0,pd.NA],"B":[0,0,2,1]}, dtype="Float32")

In [3]: df
Out[3]: 
      A    B
0   1.0  0.0
1   0.0  0.0
2   0.0  2.0
3  <NA>  1.0

In [4]: z = df.A /df.B

In [5]: z
Out[5]: 
0    inf
1    NaN
2    0.0
3   <NA>
dtype: Float32

In [6]: z.fillna(0.)
Out[6]: 
0    inf
1    NaN
2    0.0
3    0.0
dtype: Float32

When calling fillna, the NaN produced by 0/0 is not filled.

Casting to float32 and then back to Float32 works around the problem:

In [14]: z.astype("float32").astype("Float32").fillna(0.)
Out[14]: 
0    inf
1    0.0
2    0.0
3    0.0
dtype: Float32

@glaucouri

glaucouri commented Jan 8, 2024

Some more info on this behavior: #32265

The v2.0 changelog mentions this: https://pandas.pydata.org/docs/whatsnew/v2.0.0.html#

Division by zero with ArrowDtype dtypes returns -inf, nan, or inf depending on the numerator, instead of raising (GH 51541)
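A small sketch of that changelog entry, assuming pyarrow is installed:

import pandas as pd

s = pd.Series([-1.0, 0.0, 1.0], dtype="float64[pyarrow]")
print((s / 0).tolist())   # roughly [-inf, nan, inf], depending on the sign of the numerator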

To unify the NaN values I've found another solution:

z.where(z == z, pd.NA)

Since NaN != NaN, this replaces every float np.nan with pd.NA.
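For instance, applied to a series like the Float32 example above (a sketch; exact reprs may differ across versions):

import pandas as pd

num = pd.Series([1.0, 0.0, 0.0, pd.NA], dtype="Float32")
den = pd.Series([0, 0, 2, 1], dtype="Float32")
z = num / den                  # inf, NaN, 0.0, <NA>
z = z.where(z == z, pd.NA)     # NaN != NaN, so the float NaN becomes <NA>
print(z.fillna(0.0))           # now the 0/0 position is filled as well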

@arobrien

This has tripped me up too, and I consider it currently broken with the pyarrow backend.

My current workarounds are to use df.astype(float).isna() or np.isnan(df), which give the expected behaviour.

Simple reproduction:

import numpy as np
import pandas as pd

a = pd.DataFrame({'a': [1.0, 0.0, np.nan]})
b = a.convert_dtypes(dtype_backend='pyarrow')

# inconsistent behaviour for 0/0 but consistent for nan/0
display((a / 0).isna())
display((b / 0).isna())

# consistent behaviour 0/0 and nan/0 both yield True.
display(np.isnan(a / 0))
display(np.isnan(b / 0))
