Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: replace na_rep for pd.NA values in to_string #54959

Closed
wants to merge 19 commits into from

Conversation

rsm-23
Copy link
Contributor

@rsm-23 rsm-23 commented Sep 2, 2023

@rsm-23
Copy link
Contributor Author

rsm-23 commented Sep 2, 2023

@mroeschke Need some help here. This test will fail pandas/tests/arrays/string_/test_string.py::test_repr. Should I update the test or will that break some behavior?

@rsm-23
Copy link
Contributor Author

rsm-23 commented Sep 2, 2023

@mroeschke The documentation says, that the default value for na_rep is 'NaN' so in pandas/io/formats/format.py, the replace never happens.

    def _format(x):
            if self.na_rep is not None and is_scalar(x) and isna(x):
                try:
                    # try block for np.isnat specifically
                    # determine na_rep if x is None or NaT-like
                    if x is None:
                        return "None"
                    elif x is NA:
                        return str(NA)
                    elif x is NaT or np.isnat(x):
                        return "NaT"
                except (TypeError, ValueError):
                    # np.isnat only handles datetime or timedelta objects
                    pass
                return self.na_rep

and if I replace elif x is NA: return self.na_rep then it breaks tests because the expectation is to get <NA>

@mroeschke
Copy link
Member

Which tests need fixing?

@rsm-23
Copy link
Contributor Author

rsm-23 commented Sep 6, 2023

@mroeschke Some tests that are failing are below. This is not an exhaustive list. The expected value in these tests is <NA> but according to docs it should be NaN since that's the default value for na_rep. Somehow this behaviour is not in place.

  • pandas/tests/arrays/boolean/test_repr.py::test_repr
  • pandas/tests/arrays/floating/test_repr.py::test_frame_repr[Float32Dtype]
  • pandas/tests/arrays/integer/test_repr.py::test_frame_repr[Int8Dtype]
  • pandas/tests/arrays/string_/test_string.py::test_repr[pyarrow]
  • pandas/tests/io/formats/test_to_string.py::test_nullable_float_to_string[Float32]

@rsm-23
Copy link
Contributor Author

rsm-23 commented Sep 7, 2023

@mroeschke pinging for visibility. Please let me know if the above message was clear or not.

@rsm-23
Copy link
Contributor Author

rsm-23 commented Sep 12, 2023

@mroeschke need some inputs here :)

@mroeschke
Copy link
Member

Yeah I think the tests should be updated for this change

@rsm-23
Copy link
Contributor Author

rsm-23 commented Sep 19, 2023

@mroeschke can you please review this PR?

@rsm-23
Copy link
Contributor Author

rsm-23 commented Sep 25, 2023

@mroeschke can you please review this? I have updated the test cases.

@rsm-23
Copy link
Contributor Author

rsm-23 commented Sep 28, 2023

@mroeschke pinging on green.

@rsm-23
Copy link
Contributor Author

rsm-23 commented Oct 10, 2023

@mroeschke @MarcoGorelli can one of you please review this?

@MarcoGorelli
Copy link
Member

Thanks for your PR!

@mroeschke @MarcoGorelli can one of you please review this?

Just as a heads up, I'm not familiar with strings - I can take a look, but I'm afraid I'll need to prioritise other things first

@MarcoGorelli MarcoGorelli self-requested a review October 10, 2023 07:42
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay on my end

@@ -7340,8 +7340,8 @@ def value_counts(
>>> df
first_name middle_name
0 John Smith
1 Anne <NA>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the usages that were changed should still use <NA> rather than NaN. This should only change if na_rep was specified as something else

Copy link
Contributor Author

@rsm-23 rsm-23 Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expected value in these tests is <NA> but according to docs it should be NaN since that's the default value for na_rep. Somehow this behaviour is not in place.

@mroeschke

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think since the current behavior has been the case for a few releases now, the default of na_rep="NaN" should be changed to na_rep=lib.no_default so that NaN, NA, and None can continue to use their reprs here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke the problem is if I set na_rep=lib.no_default then the string representation will not contain either of NaN, NA or None. I have tried setting the default as <NA> and reverted most of the test cases, however, there's issue in a couple of them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using na_rep=lib.no_default would mean making NaN, NA and None return their string defaults like

if self.na_rep is lib.no_default:
    ...existing logic
else:
    ...use na_rap

@rsm-23 rsm-23 requested a review from mroeschke October 11, 2023 08:25
@mroeschke
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke mroeschke closed this Nov 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants