Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Set check_exact to true if dtype is int #55934

Merged
merged 9 commits into from
Nov 27, 2023

Conversation

parthi-siva
Copy link
Contributor

@parthi-siva parthi-siva commented Nov 13, 2023

@parthi-siva
Copy link
Contributor Author

parthi-siva commented Nov 13, 2023

@MarcoGorelli not sure where to add test cases for the change.

Edit: Got it (pandas/tests/util/test_assert_frame_equal.py)

@MarcoGorelli MarcoGorelli self-requested a review November 13, 2023 09:07
@parthi-siva parthi-siva marked this pull request as draft November 13, 2023 09:30
- If int dtype is different we are ignoring the difference
- so added check to set check_exact to true only when dtype is same
@parthi-siva parthi-siva marked this pull request as ready for review November 13, 2023 09:46
@mroeschke mroeschke added the Testing pandas testing functions or related to the test suite label Nov 13, 2023
@MarcoGorelli
Copy link
Member

Well I'm relieved that CI is still green 😄 Can you update the docs for this method to note that rtol, atol and check_exact don't take effect for int dtype?

Regarding the check - not totally sure about the check, will take a closer look next week

@parthi-siva
Copy link
Contributor Author

parthi-siva commented Nov 19, 2023

Sure. Updated the documentation. Added note under check_exact. Since we are setting it to True in case of int dtype. Also, documentation for rtol and atol states it is applicable only when check_exact is False so I assumed that it is sufficient to add note under check_exact

@MarcoGorelli
Copy link
Member

I think it should always be exact, except for floats?

You should also check for extensions arrays, and include a whatsnew

I've added a commit anyway, hope it's ok - do the changes look alright to you?

@parthi-siva
Copy link
Contributor Author

Yeah, make sense.

Some testcases are failing in some builds. Is that alright?

@MarcoGorelli
Copy link
Member

Looks like another bug was hiding because of the check_exact default 😄 #56136

For now let's xfail the relevant test

@MarcoGorelli MarcoGorelli added this to the 2.2 milestone Nov 27, 2023
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me - although I did add a commit, so leaving open a bit in case others have objections

@parthi-siva
Copy link
Contributor Author

Sure Marco

@mroeschke mroeschke merged commit 17eec96 into pandas-dev:main Nov 27, 2023
42 checks passed
@mroeschke
Copy link
Member

Thanks @parthi-siva and @MarcoGorelli

Comment on lines 572 to +578
data[1] = 1
result = Series(data, index=index)
expected = Series([0, 1, 2], index=index, dtype=int)
tm.assert_series_equal(result, expected)
with pytest.raises(AssertionError, match="Series classes are different"):
# TODO should this be raising at all?
# https://github.com/pandas-dev/pandas/issues/56131
tm.assert_series_equal(result, expected)
Copy link
Member

@jorisvandenbossche jorisvandenbossche Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a different issue than reported in #56131 (there is also no check_dtype=False here)

This doesn't involve different data types (it's all numpy dtypes), and the two objects created here seemingly are the same:

data = np.ma.masked_all((3,), dtype=int)
data[0] = 0
data[1] = 1
data[2] = 2
index = ["a", "b", "c"]
result = Series(data, index=index)
expected = Series([0, 1, 2], index=index, dtype=int)
In [29]: result
Out[29]: 
a    0
b    1
c    2
dtype: int64

In [30]: expected
Out[30]: 
a    0
b    1
c    2
dtype: int64

In [31]: result.dtype == expected.dtype
Out[31]: True

But apparently we have a bug in the Series constructor that preserves the masked array as underlying value if it has no masked elements:

In [32]: result.values
Out[32]: 
masked_array(data=[0, 1, 2],
             mask=[False, False, False],
       fill_value=999999)

In [33]: expected.values
Out[33]: array([0, 1, 2])

That seems like a separate, actual bug we should solve, regardless of the behaviour of check_dtype in assert_series_equal (although being more strict here actually uncovered this bug ..)

@parthi-siva
Copy link
Contributor Author

Looks like another bug was hiding because of the check_exact default 😄 #56136

For now let's xfail the relevant test

@jorisvandenbossche yeah.. there was another issues because of this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: assert_series_equal not raising on unequal series?
4 participants