-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Set check_exact to true if dtype is int #55934
Conversation
@MarcoGorelli not sure where to add test cases for the change. Edit: Got it (pandas/tests/util/test_assert_frame_equal.py) |
- If int dtype is different we are ignoring the difference - so added check to set check_exact to true only when dtype is same
Well I'm relieved that CI is still green 😄 Can you update the docs for this method to note that Regarding the check - not totally sure about the check, will take a closer look next week |
Sure. Updated the documentation. Added note under |
0231547
to
6d74ec3
Compare
I think it should always be exact, except for floats? You should also check for extensions arrays, and include a whatsnew I've added a commit anyway, hope it's ok - do the changes look alright to you? |
Yeah, make sense. Some testcases are failing in some builds. Is that alright? |
Looks like another bug was hiding because of the check_exact default 😄 #56136 For now let's xfail the relevant test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me - although I did add a commit, so leaving open a bit in case others have objections
Sure Marco |
Co-authored-by: Matthew Roeschke <[email protected]>
Thanks @parthi-siva and @MarcoGorelli |
data[1] = 1 | ||
result = Series(data, index=index) | ||
expected = Series([0, 1, 2], index=index, dtype=int) | ||
tm.assert_series_equal(result, expected) | ||
with pytest.raises(AssertionError, match="Series classes are different"): | ||
# TODO should this be raising at all? | ||
# https://github.com/pandas-dev/pandas/issues/56131 | ||
tm.assert_series_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a different issue than reported in #56131 (there is also no check_dtype=False
here)
This doesn't involve different data types (it's all numpy dtypes), and the two objects created here seemingly are the same:
data = np.ma.masked_all((3,), dtype=int)
data[0] = 0
data[1] = 1
data[2] = 2
index = ["a", "b", "c"]
result = Series(data, index=index)
expected = Series([0, 1, 2], index=index, dtype=int)
In [29]: result
Out[29]:
a 0
b 1
c 2
dtype: int64
In [30]: expected
Out[30]:
a 0
b 1
c 2
dtype: int64
In [31]: result.dtype == expected.dtype
Out[31]: True
But apparently we have a bug in the Series constructor that preserves the masked array as underlying value if it has no masked elements:
In [32]: result.values
Out[32]:
masked_array(data=[0, 1, 2],
mask=[False, False, False],
fill_value=999999)
In [33]: expected.values
Out[33]: array([0, 1, 2])
That seems like a separate, actual bug we should solve, regardless of the behaviour of check_dtype
in assert_series_equal (although being more strict here actually uncovered this bug ..)
@jorisvandenbossche yeah.. there was another issues because of this |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.