BUG: pd.testing.assert_series_equal break in version 2.2.0rc0 #56646

Closed
2 of 3 tasks
shobsi opened this issue Dec 27, 2023 · 7 comments · Fixed by #56724
Labels
Blocker (Blocking issue or pull request for an upcoming release), Bug, Regression (Functionality that used to work in a prior pandas version), Testing (pandas testing functions or related to the test suite)

@shobsi

shobsi commented Dec 27, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> pd.__version__
'2.2.0rc0'
>>> left = pd.Series([81, 18, 121, 38, 74, 72, 81, 81, 146, 81, 81, 170, 74, 74])
>>> right = pd.Series([72,  9, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72])
>>> pd.testing.assert_series_equal(left, right, rtol=1.5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/google/home/shobs/code/bigframes/.nox/system_prerelease/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 951, in assert_series_equal
    assert_numpy_array_equal(
  File "/usr/local/google/home/shobs/code/bigframes/.nox/system_prerelease/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 688, in assert_numpy_array_equal
    _raise(left, right, err_msg)
  File "/usr/local/google/home/shobs/code/bigframes/.nox/system_prerelease/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 682, in _raise
    raise_assert_detail(obj, msg, left, right, index_values=index_values)
  File "/usr/local/google/home/shobs/code/bigframes/.nox/system_prerelease/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 612, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: Series are different

Series values are different (92.85714 %)
[index]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
[left]:  [81, 18, 121, 38, 74, 72, 81, 81, 146, 81, 81, 170, 74, 74]
[right]: [72, 9, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72]

Issue Description

bigframes has prerelease tests that test integration with pandas by installing the latest pandas from conda forge:

pip install --extra-index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple --prefer-binary --pre --upgrade pandas

The tests have been failing since yesterday (12/26/2023).

Expected Behavior

With pandas version 2.1.4, the same assertion passes:

>>> import pandas as pd
>>> pd.__version__
'2.1.4'
>>> left = pd.Series([81, 18, 121, 38, 74, 72, 81, 81, 146, 81, 81, 170, 74, 74])
>>> right = pd.Series([72,  9, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72])
>>> pd.testing.assert_series_equal(left, right, rtol=1.5)
>>>
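For reference, the tolerance arithmetic behind the expected result can be checked by hand. This is a minimal sketch that assumes a NumPy-style relative check, |left - right| <= rtol * |right|; the exact formula pandas uses internally may differ, and the variable names are illustrative only.

```python
import numpy as np

left = np.array([81, 18, 121, 38, 74, 72, 81, 81, 146, 81, 81, 170, 74, 74])
right = np.array([72, 9, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72])

rtol = 1.5

# Assumed tolerance rule (NumPy-style, relative to `right`); not confirmed
# to be the rule pandas applies.
abs_diff = np.abs(left - right)
allowed = rtol * np.abs(right)
within_tolerance = abs_diff <= allowed

# Share of positions that differ under an exact comparison.
exact_mismatch_pct = (left != right).mean() * 100

print(within_tolerance.all())
print(round(float(exact_mismatch_pct), 5))
```

Under that assumption every element is within tolerance, while an exact comparison flags 13 of the 14 positions, which lines up with the 92.85714 % reported by 2.2.0rc0.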

Installed Versions

INSTALLED VERSIONS

commit : d4c8d82
python : 3.11.4.final.0
python-bits : 64
OS : Linux
OS-release : 6.5.13-1rodete1-amd64
Version : #1 SMP PREEMPT_DYNAMIC Debian 6.5.13-1rodete1 (2023-12-06)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.0rc0
numpy : 1.26.2
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 69.0.2
pip : 23.3.1
Cython : None
pytest : 7.4.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.19.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2023.6.0
gcsfs : 2023.6.0
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : 0.20.0
pyarrow : 14.0.2
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : 2.0.23
tables : None
tabulate : 0.9.0
xarray : 2023.12.0
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

@shobsi added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on Dec 27, 2023
@mroeschke
Member

cc @MarcoGorelli: was it intentional that the tolerance arguments (rtol/atol) would be ignored along with check_exact?

@asishm
Contributor

asishm commented Dec 28, 2023

looks intentional - #55934 (comment)

@MarcoGorelli
Member

thanks @shobsi for trying out the release candidate

yup, that's right, as @asishm says, it was intentional

I think this can be worked around fairly easily in users' code, and I'd rather have the testing code be as simple as possible
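One possible user-side workaround, not taken from the thread, is to cast both integer Series to float64 before comparing. This is a sketch that assumes float dtypes still go through the tolerance-based comparison, so that rtol is honoured:

```python
import pandas as pd

left = pd.Series([81, 18, 121, 38, 74, 72, 81, 81, 146, 81, 81, 170, 74, 74])
right = pd.Series([72, 9, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72])

# Assumption: float inputs are compared with rtol/atol rather than exactly,
# so the cast sidesteps the integer-only exact check.
pd.testing.assert_series_equal(
    left.astype("float64"),
    right.astype("float64"),
    rtol=1.5,
)
```

The cast changes the dtype being asserted on, so this only suits tests that care about the values and not about the integer dtype itself.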

@rhshadrach added the Testing (pandas testing functions or related to the test suite) label on Dec 28, 2023
@phofl
Member

phofl commented Dec 28, 2023

Hm, there are actual use cases for this kind of behaviour if you are dealing with very large numbers and exact precision isn't super important

Defaulting check_exact to True seems fine, but ignoring other arguments seems suspect

We definitely have to document this better if we decide that we want to keep this kind of behaviour

@lithomas1 added the Regression (Functionality that used to work in a prior pandas version) and Blocker (Blocking issue or pull request for an upcoming release) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label on Dec 28, 2023
@lithomas1 added this to the 2.2 milestone on Dec 28, 2023
@lithomas1
Member

Yeah, I don't think we should silently ignore here - we should raise if we don't plan on supporting this.

@lithomas1
Member

@MarcoGorelli

Are you fine with reverting the change to ignore check_exact/rtol/atol arguments for int dtypes (and just have the default be True for integer dtypes)?

I don't think ignoring simplifies the testing code - I haven't worked with the testing code internals before, but I'm pretty sure ints/floats go through the same path in assert_almost_equal in pandas/_libs/testing.pyx anyway.

@MarcoGorelli
Member

to be honest I think the current behaviour is nice, and that inexact checking for integers is a rare-enough use-case that people can just implement it manually (like, assert (result - expected).abs().max() < a_tol)

no strong opinion though, so long as the integer case defaults to exact checking
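The manual check suggested above could be wrapped roughly as follows. This is only a sketch: assert_int_series_close is a hypothetical helper name, not a pandas function, and it applies an absolute tolerance only, assuming both Series share the same index.

```python
import pandas as pd


def assert_int_series_close(result, expected, atol):
    """Hypothetical helper: fail if any element differs by atol or more."""
    # Assumes identical indexes; subtraction aligns on index labels.
    max_diff = (result - expected).abs().max()
    assert max_diff < atol, f"max abs difference {max_diff} is not below atol={atol}"


left = pd.Series([81, 18, 121, 38, 74, 72, 81, 81, 146, 81, 81, 170, 74, 74])
right = pd.Series([72, 9, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72])

# The largest absolute difference here is 170 - 72 = 98, so atol=100 passes.
assert_int_series_close(left, right, atol=100)
```

Unlike rtol in assert_series_equal, this does not scale with the magnitude of the values, so a relative variant would need its own formula.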

7 participants