ENH: df.compare should have tolerances #48488

tehunter · 2022-09-09T19:46:34Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

I would like to have df.compare accept "tolerance" thresholds to allow for approximate comparisons. This feature already exists in the assert_frame_equal utility, and it would be beneficial in compare to help users identify the rows and columns that are causing their assertion to fail. It would be helpful in many cases to allow users to filter out differences that are sufficiently small.

Feature Description

    def compare(
        self,
        other: DataFrame,
        align_axis: Axis = 1,
        keep_shape: bool = False,
        keep_equal: bool = False,
        rtol: float | None = None,
        atol: float | None = None,
    ) -> DataFrame:
        """
        ...
        rtol : float | None, default None
            Relative tolerance. Numeric differences within this tolerance will not be considered differences for the purposes of "keep_shape" and will be shown as NaN if "keep_equal" is False.

        atol : float | None, default None
            Absolute tolerance. Numeric differences within this tolerance will not be considered differences for the purposes of "keep_shape" and will be shown as NaN if "keep_equal" is False.
        """

For implementation, the current comparison is essentially the following check: mask = ~((self == other) | (self.isna() & other.isna())). From a quick glance at _testing.assert_almost_equal, it appears the tolerances could be implemented by calling that function on each element of the DataFrame, though I'm not sure whether it's okay to reference the _testing module outside of testing functions.
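To make the idea concrete, here is a minimal sketch of how the mask quoted above could be relaxed with tolerances. It swaps in np.isclose rather than _testing.assert_almost_equal, and approx_diff_mask is a hypothetical helper name, not part of pandas. It assumes both frames are identically labeled (as df.compare already requires) and that tolerances apply only to columns that are numeric in both frames.

```python
import numpy as np
import pandas as pd


def approx_diff_mask(left, right, rtol=0.0, atol=0.0):
    """Hypothetical helper: True where identically labeled frames differ."""
    # Exact check, with NaN in the same position counted as equal.
    equal = (left == right) | (left.isna() & right.isna())

    # Assumption: tolerances apply only to columns numeric in both frames.
    numeric_cols = [
        col
        for col in left.select_dtypes(include="number").columns
        if pd.api.types.is_numeric_dtype(right[col])
    ]
    if numeric_cols:
        # np.isclose tests |a - b| <= atol + rtol * |b| element-wise.
        close = np.isclose(
            left[numeric_cols].to_numpy(dtype=float),
            right[numeric_cols].to_numpy(dtype=float),
            rtol=rtol,
            atol=atol,
            equal_nan=True,
        )
        equal[numeric_cols] = equal[numeric_cols].to_numpy() | close
    return ~equal


left = pd.DataFrame({"a": [1.0, 2.0, np.nan], "b": ["x", "y", "z"]})
right = pd.DataFrame({"a": [1.0 + 1e-9, 2.5, np.nan], "b": ["x", "q", "z"]})

# Only the second row differs once the 1e-9 gap is within tolerance.
mask = approx_diff_mask(left, right, atol=1e-6)
```

With both tolerances left at 0.0 the helper reduces to the exact check, so the first row of column "a" would be flagged as well.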

Alternative Solutions

This could also be implemented more directly with math.isclose calls, but those would need to be applied only to numeric columns.
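A rough sketch of this alternative, for illustration only: isclose_numeric is a hypothetical name, and the sketch assumes both frames share the same labels and simply skips non-numeric columns. Note that math.isclose never treats NaN as close to anything, including another NaN, so missing values would still need the separate isna() check.

```python
import math

import pandas as pd


def isclose_numeric(left, right, rel_tol=1e-9, abs_tol=0.0):
    """Hypothetical helper: element-wise math.isclose on numeric columns."""
    result = {}
    # Non-numeric columns are skipped; math.isclose only accepts numbers.
    for col in left.select_dtypes(include="number").columns:
        result[col] = [
            math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
            for a, b in zip(left[col], right[col])
        ]
    return pd.DataFrame(result, index=left.index)


left = pd.DataFrame({"a": [1.0, 2.0], "b": ["x", "y"]})
right = pd.DataFrame({"a": [1.0005, 2.1], "b": ["x", "q"]})

# Column "b" is text, so only column "a" appears in the result.
close = isclose_numeric(left, right, abs_tol=1e-3)
```

Because this runs a Python-level loop over every element, it is likely to be much slower on large frames than a vectorized check.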

Additional Context

No response

CompRhys · 2023-10-27T00:24:51Z

BUMP

tehunter added the Enhancement and Needs Triage labels Sep 9, 2022

it176131 mentioned this issue Nov 3, 2022: ENH: Parameter keep_equal in .compare(...) method should not rely on the keep_shape parameter #49510 (Open)

aeisenbarth mentioned this issue Jan 8, 2024: ENH: Improve assertion message for assert_frame_equal #39967 (Open)

mroeschke removed the Needs Triage label Jul 16, 2024
