Title: BUG: Fix Incorrect Logical Operation Between Pandas DataFrame and Series (#60204) #60278

Lavishgangwani

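This PR fixes issue #60204, where logical operations (&, |, ^) between a DataFrame and a Series do not broadcast the Series across the DataFrame's rows, leading to unexpected results. The issue shows up in expressions such as (df >= 10) | (df['A'] >= 10), where df is a DataFrame and df['A'] is a Series: pandas aligns the Series index with the DataFrame's columns, so the row labels end up as extra columns instead of being combined row by row.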
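A minimal reproduction, using made-up data: by default a DataFrame-with-Series operation aligns the Series index with the DataFrame's columns (outer join), so the result picks up the row labels 0, 1 and 2 as new columns. Combining column by column with apply gives the row-wise result the issue expects. This is a workaround sketch for illustration, not the change made in this PR.

```python
import pandas as pd

# Illustrative data: the row labels (0, 1, 2) differ from the column labels ("A", "B")
df = pd.DataFrame({"A": [5, 12, 8], "B": [20, 3, 9]})

left = df >= 10          # boolean DataFrame
right = df["A"] >= 10    # boolean Series indexed by the row labels

# Default behaviour: right.index is aligned with left.columns, so the
# result gains columns 0, 1, 2 instead of combining values row by row.
surprising = left | right
print(surprising)

# Row-wise intent: combine each column with the Series, aligned on the row index.
rowwise = left.apply(lambda col: col | right)
print(rowwise)
```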

Changes Made

  • Adjusted the logical_op function: Added handling for the case where left is a DataFrame and right is a Series. The function now converts right to a NumPy array so that it can be broadcast against the DataFrame.
  • Type Check for List-like Objects: Added error handling to raise a TypeError if right is a list-like object (e.g., list, tuple) without a defined dtype, since such objects are unsupported in logical operations with DataFrames.
  • Enhanced Boolean Casting with fill_bool: Modified fill_bool to cast results to boolean more robustly: non-boolean input types are accepted, and missing values are filled with False before the cast.
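The list-like check and the boolean casting described above can be sketched in plain Python. This is not the PR's code: check_logical_operand and fill_bool_sketch are hypothetical helper names, and the sketch only relies on the public pandas.api.types.is_list_like and pandas.isna functions.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


def check_logical_operand(right):
    """Hypothetical guard: reject list-likes that carry no dtype (list, tuple, ...)."""
    if is_list_like(right) and not hasattr(right, "dtype"):
        raise TypeError(
            "logical operations with a DataFrame need an operand with a dtype; "
            "wrap lists or tuples in np.array, Series or Index first"
        )
    return right


def fill_bool_sketch(values):
    """Hypothetical stand-in for fill_bool: missing values -> False, then cast to bool."""
    arr = np.asarray(values, dtype=object)
    mask = pd.isna(arr)
    if mask.any():
        arr = arr.copy()   # avoid mutating the caller's array
        arr[mask] = False
    return arr.astype(bool)
```

Scalars and dtype-carrying objects (NumPy arrays, Series, Index) pass through the guard unchanged, while a bare list or tuple is rejected.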

Code Adjustments

  • Imports and Compatibility: Added the imports needed for the DataFrame and Series type checks, and applied lib.item_from_zerodim so that zero-dimensional right operands are unwrapped into scalars.
  • Broadcasting Conversion: Converted right to right.values when it's a Series, enabling proper broadcasting across DataFrame columns in logical operations.
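How a NumPy operand broadcasts depends on its shape. In the sketch below (illustrative data; unwrap_zerodim is a hypothetical stand-in for pandas' internal lib.item_from_zerodim, not its actual code), a 1-D array is matched positionally against the DataFrame's columns, so with three rows and two columns pandas raises a ValueError; if the row and column counts happened to be equal, the values would instead be applied by column position without any error. Reshaping to an (n_rows, 1) column vector makes pandas broadcast each row's value across that row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [5, 12, 8], "B": [20, 3, 9]})
left = df >= 10
right = (df["A"] >= 10).to_numpy()   # 1-D, shape (3,)

# A 1-D array is matched against the columns, so its length must equal
# the number of columns (2 here); with 3 values pandas raises ValueError.
one_d_raised = False
try:
    left | right
except ValueError as exc:
    one_d_raised = True
    print("1-D array:", exc)

# A (n_rows, 1) column vector is broadcast across the columns,
# so each row is combined with that row's value.
rowwise = left | right.reshape(-1, 1)
print(rowwise)


def unwrap_zerodim(obj):
    """Hypothetical stand-in for lib.item_from_zerodim: 0-d arrays become scalars."""
    if isinstance(obj, np.ndarray) and obj.ndim == 0:
        return obj.item()
    return obj


print(unwrap_zerodim(np.array(True)))
```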

Testing and Validation

  • New Unit Tests: Added specific tests in pandas/tests/arithmetic/test_ops.py to verify:
    • Correct Broadcasting: Tested logical operations between DataFrame and Series to ensure correct broadcasting behavior.
    • Error Handling: Confirmed TypeError is raised for unsupported list-like objects without dtype.
    • Boolean Casting with NaNs: Verified the behavior when logical operations involve NaNs or non-boolean compatible types.
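A broadcasting test covering all three operators might look like the sketch below. It is hypothetical (the function name and data are made up, and it is not the test added in this PR). So that it passes against pandas as it stands, the operand under test is the (n_rows, 1) NumPy column vector, which pandas already broadcasts row-wise; with the proposed fix, the same expectation could be checked against the Series operand directly.

```python
import operator

import pandas as pd


def test_rowwise_logical_ops():
    # Hypothetical test; data and name are illustrative.
    df = pd.DataFrame({"A": [5, 12, 8], "B": [20, 3, 9]})
    left = df >= 10
    right = df["A"] >= 10

    for op in (operator.and_, operator.or_, operator.xor):
        # Expected: each column combined with the Series, aligned on the row index
        expected = pd.DataFrame({col: op(left[col], right) for col in left.columns})
        # Column-vector operand, which pandas broadcasts across the columns
        result = op(left, right.to_numpy().reshape(-1, 1))
        pd.testing.assert_frame_equal(result, expected)
```

In the pandas test suite the loop would more likely be a pytest.mark.parametrize over the operators.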

Additional Notes

  • Backward Compatibility: This fix is backward-compatible and applies only to scenarios involving DataFrame and Series logical operations.
  • Minimal Performance Impact: The change is isolated to broadcasting adjustments, so any performance impact should be minimal.

Checklist

  • Code changes are consistent with pandas guidelines.
  • New tests have been added and pass successfully.
  • Documentation and comments are added where necessary.
  • Changes are backward-compatible.

Related Issue

Closes #60204

Thank you for reviewing this PR! Please let me know if further adjustments are needed.


@mroeschke
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

mroeschke closed this Dec 2, 2024
Linked issue: BUG: Incorrect logical operation between pandas dataframe and series