ENH: df.compare should have tolerances #48488

Open
1 of 3 tasks
tehunter opened this issue Sep 9, 2022 · 1 comment

@tehunter
Contributor

tehunter commented Sep 9, 2022

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I would like to have df.compare accept "tolerance" thresholds to allow for approximate comparisons. This feature already exists in the assert_frame_equal utility, and it would be beneficial in compare to help users identify the rows and columns that are causing their assertion to fail. It would be helpful in many cases to allow users to filter out differences that are sufficiently small.
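To make the gap concrete, here is a minimal sketch with made-up frames (the data and variable names are illustrative only). It relies on the existing `rtol`/`atol` arguments of `pandas.testing.assert_frame_equal` and on the current exact-match behavior of `DataFrame.compare`:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Illustrative data: one negligible difference (row 1, column "a")
# and one meaningful difference (row 2, column "b").
left = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
right = pd.DataFrame({"a": [1.0, 2.0 + 1e-9, 3.0], "b": [10.0, 20.0, 30.5]})

# compare() tests exact equality, so the 1e-9 noise is reported
# alongside the real difference.
exact_diff = left.compare(right)
print(exact_diff)

# assert_frame_equal() honors tolerances, but it only raises; it does not
# return a frame showing which cells failed.
try:
    assert_frame_equal(left, right, rtol=1e-5, atol=1e-8)
    frames_match = True
except AssertionError:
    frames_match = False
print(frames_match)
```

With tolerances on `compare`, only row 2 of column "b" would be expected to show up in the result.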

Feature Description

    def compare(
        self,
        other: DataFrame,
        align_axis: Axis = 1,
        keep_shape: bool = False,
        keep_equal: bool = False,
        rtol: float | None = None,
        atol: float | None = None,
    ) -> DataFrame:
        """
        ...
        rtol : float | None, default None
            Relative tolerance. Numeric differences below this value will not
            be considered differences for the purposes of "keep_shape" and
            will be shown as NaN if "keep_equal" is False.

        atol : float | None, default None
            Absolute tolerance. Numeric differences below this value will not
            be considered differences for the purposes of "keep_shape" and
            will be shown as NaN if "keep_equal" is False.
        """

For implementation, the current comparison is essentially the following check: mask = ~((self == other) | (self.isna() & other.isna())). From a quick glance at _testing.assert_almost_equal, it appears we could implement this by calling that function on each element of the DataFrame, though I'm not sure whether it's okay to reference the _testing module outside of testing functions.
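As a rough sketch of how the mask above could be relaxed without touching `_testing`, the following uses `numpy.isclose` on numeric columns. The helper name `tolerance_mask` and its defaults are hypothetical and not part of pandas. It assumes both frames are identically labeled (as `compare` already requires) and that numeric columns use plain NumPy dtypes:

```python
import numpy as np
import pandas as pd


def tolerance_mask(left, right, rtol=1e-5, atol=1e-8):
    """Hypothetical helper (not a pandas API): True where cells differ."""
    # Start from the exact-match check quoted in the issue.
    equal = (left == right) | (left.isna() & right.isna())

    # Relax the check only for columns that are numeric in both frames.
    numeric_cols = left.select_dtypes(include="number").columns.intersection(
        right.select_dtypes(include="number").columns
    )
    for col in numeric_cols:
        close = np.isclose(
            left[col].to_numpy(dtype=float),
            right[col].to_numpy(dtype=float),
            rtol=rtol,
            atol=atol,
            equal_nan=True,
        )
        equal[col] = equal[col] | close

    return ~equal


# Illustrative data only.
left = pd.DataFrame({"x": [1.0, 2.0, np.nan], "label": ["a", "b", "c"]})
right = pd.DataFrame({"x": [1.0 + 1e-9, 2.5, np.nan], "label": ["a", "B", "c"]})

mask = tolerance_mask(left, right)
print(left.where(mask))   # cells of `left` that differ; others become NaN
print(right.where(mask))  # the corresponding cells of `right`
```

Note that `numpy.isclose` is asymmetric: it tests `|a - b| <= atol + rtol * |b|`, so `right` acts as the reference value here. Whether that is the desired semantics for `compare` is an open design question.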

Alternative Solutions

This could also be implemented more directly with math.isclose calls, but those would need to be applied only to numeric columns.
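A minimal sketch of that per-element alternative, using only the standard library. The function name `values_differ` is hypothetical, and the scalar arguments stand in for individual DataFrame cells. Note that `math.isclose` names its parameters `rel_tol` and `abs_tol` rather than `rtol` and `atol`, and that its relative test is symmetric in its two arguments:

```python
import math
from numbers import Real


def values_differ(a, b, rel_tol=1e-9, abs_tol=0.0):
    """Hypothetical per-cell check: True if a and b should count as different."""
    # bool is a subclass of int, so exclude it from the numeric path.
    both_numeric = all(
        isinstance(v, Real) and not isinstance(v, bool) for v in (a, b)
    )
    if both_numeric:
        # Treat two NaNs as equal, mirroring the isna() term in the current mask.
        if math.isnan(a) and math.isnan(b):
            return False
        return not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    # Non-numeric values fall back to exact comparison.
    return a != b


# Illustrative column values only.
left_col = [1.0, 2.0, float("nan"), "a"]
right_col = [1.0 + 1e-12, 2.5, float("nan"), "b"]
diffs = [values_differ(a, b) for a, b in zip(left_col, right_col)]
print(diffs)
```

Calling a Python function per cell would likely be much slower than a vectorized check on large frames, so this is mainly useful for illustrating the intended semantics.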

Additional Context

No response

@CompRhys

BUMP

@mroeschke removed the Needs Triage label Jul 16, 2024