BUG: replace na_rep for pd.NA values in `to_string` #54959

rsm-23 · 2023-09-02T12:47:00Z

xref BUG: na_rep not respected in to_* methods with pd.NA #54872
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.


rsm-23 · 2023-09-02T12:49:57Z

@mroeschke I need some help here. This change makes pandas/tests/arrays/string_/test_string.py::test_repr fail. Should I update the test, or would that break some existing behavior?

rsm-23 · 2023-09-02T14:02:24Z

@mroeschke The documentation says that the default value for na_rep is 'NaN', yet in pandas/io/formats/format.py the replacement never happens for pd.NA, because the `NA` branch returns `str(NA)` before `self.na_rep` is reached:

    def _format(x):
        if self.na_rep is not None and is_scalar(x) and isna(x):
            try:
                # try block for np.isnat specifically
                # determine na_rep if x is None or NaT-like
                if x is None:
                    return "None"
                elif x is NA:
                    return str(NA)
                elif x is NaT or np.isnat(x):
                    return "NaT"
            except (TypeError, ValueError):
                # np.isnat only handles datetime or timedelta objects
                pass
            return self.na_rep

If I change the `elif x is NA:` branch to `return self.na_rep`, existing tests break because they expect `<NA>`.
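To make the problem concrete, here is a simplified, pandas-free model of the missing-value branch quoted above. `NAType`, `NaTType`, `NA`, `NaT`, `is_missing` and `format_missing` are hypothetical stand-ins written for illustration, not pandas objects; the `np.isnat` check is reduced to an identity test on the stand-in `NaT`, and non-missing values are simply passed through `str()`.

```python
import math


class NAType:
    """Hypothetical stand-in for pandas' NA singleton type."""

    def __repr__(self) -> str:
        return "<NA>"

    __str__ = __repr__


class NaTType:
    """Hypothetical stand-in for pandas' NaT."""

    def __repr__(self) -> str:
        return "NaT"

    __str__ = __repr__


NA = NAType()
NaT = NaTType()


def is_missing(x) -> bool:
    # Simplified isna(): covers None, the stand-ins, and float NaN only.
    return x is None or x is NA or x is NaT or (isinstance(x, float) and math.isnan(x))


def format_missing(x, na_rep="NaN") -> str:
    # Mirrors the current branch order: None, NA and NaT keep their own
    # representations, so na_rep is only ever reached for NaN.
    if na_rep is not None and is_missing(x):
        if x is None:
            return "None"
        elif x is NA:
            return str(NA)
        elif x is NaT:
            return "NaT"
        return na_rep
    return str(x)


print(format_missing(NA, na_rep="-"))            # NA ignores na_rep
print(format_missing(float("nan"), na_rep="-"))  # NaN honours na_rep
```

In this model, making the `NA` branch return `na_rep` instead would turn every `<NA>` into `NaN` under the documented default, which is exactly why the existing tests start failing.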

mroeschke · 2023-09-05T18:29:05Z

Which tests need fixing?

rsm-23 · 2023-09-06T06:27:55Z

@mroeschke Some of the failing tests are listed below (this is not an exhaustive list). These tests expect <NA>, but according to the docs the output should be NaN, since that is the default value for na_rep. The documented behaviour is not what is currently implemented.

pandas/tests/arrays/boolean/test_repr.py::test_repr
pandas/tests/arrays/floating/test_repr.py::test_frame_repr[Float32Dtype]
pandas/tests/arrays/integer/test_repr.py::test_frame_repr[Int8Dtype]
pandas/tests/arrays/string_/test_string.py::test_repr[pyarrow]
pandas/tests/io/formats/test_to_string.py::test_nullable_float_to_string[Float32]

rsm-23 · 2023-09-07T15:03:50Z

@mroeschke pinging for visibility. Please let me know whether the message above is clear.

rsm-23 · 2023-09-12T09:37:37Z

@mroeschke I need some input here :)

mroeschke · 2023-09-18T16:54:59Z

Yeah I think the tests should be updated for this change

rsm-23 · 2023-09-19T10:52:16Z

@mroeschke can you please review this PR?

rsm-23 · 2023-09-25T19:54:36Z

@mroeschke can you please review this? I have updated the test cases.

rsm-23 · 2023-09-28T09:17:27Z

@mroeschke pinging on green.

rsm-23 · 2023-10-10T05:12:55Z

@mroeschke @MarcoGorelli can one of you please review this?

MarcoGorelli · 2023-10-10T07:42:01Z

Thanks for your PR!

> @mroeschke @MarcoGorelli can one of you please review this?

Just as a heads up, I'm not familiar with strings - I can take a look, but I'm afraid I'll need to prioritise other things first

mroeschke

Sorry for the delay on my end

mroeschke · 2023-10-10T16:04:18Z

pandas/core/frame.py

@@ -7340,8 +7340,8 @@ def value_counts(
        >>> df
          first_name middle_name
        0       John       Smith
-        1       Anne        <NA>


All the usages that were changed should still show <NA> rather than NaN. This should only change if na_rep is explicitly set to something else.

> The expected value in these tests is <NA> but according to docs it should be NaN since that's the default value for na_rep. Somehow this behaviour is not in place.

@mroeschke

I think since the current behavior has been the case for a few releases now, the default of na_rep="NaN" should be changed to na_rep=lib.no_default so that NaN, NA, and None can continue to use their reprs here

@mroeschke the problem is that if I set na_rep=lib.no_default, the string representation will not contain any of NaN, NA or None. I have tried setting the default to <NA> and reverted most of the test cases; however, a couple of them still fail.

Using `na_rep=lib.no_default` would mean letting NaN, NA and None return their default string representations, e.g.:

    if self.na_rep is lib.no_default:
        ...  # existing logic
    else:
        ...  # use na_rep
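A minimal sketch of that sentinel idea in a simplified, pandas-free model. `_NO_DEFAULT` is a hypothetical stand-in for `lib.no_default`, and `NAType`, `is_missing`, `default_missing_repr` and `format_missing` are hypothetical helpers, not pandas code. Missing values keep their own representations unless the caller passes `na_rep` explicitly, and NaN is mapped to `"NaN"` directly so the sentinel itself never ends up in the output (the problem described in the comment above).

```python
import math


class _NoDefault:
    """Hypothetical stand-in for pandas' lib.no_default sentinel."""

    def __repr__(self) -> str:
        return "<no_default>"


_NO_DEFAULT = _NoDefault()


class NAType:
    """Hypothetical stand-in for pandas' NA singleton type."""

    def __repr__(self) -> str:
        return "<NA>"

    __str__ = __repr__


NA = NAType()


def is_missing(x) -> bool:
    # Simplified isna(): None, the NA stand-in, and float NaN only.
    return x is None or x is NA or (isinstance(x, float) and math.isnan(x))


def default_missing_repr(x) -> str:
    # The "existing logic" branch: each missing-value type keeps its own repr.
    if x is None:
        return "None"
    if x is NA:
        return str(NA)
    return "NaN"


def format_missing(x, na_rep=_NO_DEFAULT) -> str:
    if not is_missing(x):
        return str(x)
    if na_rep is _NO_DEFAULT:
        # Caller did not choose a representation: keep <NA>, NaN, None.
        return default_missing_repr(x)
    # Caller chose one explicitly: use it for every missing value.
    return na_rep


print(format_missing(NA))              # default repr of NA
print(format_missing(float("nan")))    # default repr of NaN
print(format_missing(NA, na_rep="-"))  # explicit na_rep wins
```

A dedicated sentinel, rather than a string default such as `"NaN"`, is what lets the formatter tell "na_rep was not passed" apart from an explicit `na_rep="NaN"`.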

mroeschke · 2023-11-07T00:55:23Z

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

rsm-23 added 2 commits September 2, 2023 12:04

replace na_rep for pd.NA values

3cc2518

Merge branch 'main' of https://github.com/pandas-dev/pandas into na_rep-bug

ea1c723

Merge branch 'main' into na_rep-bug

af532ba

rsm-23 added 4 commits September 18, 2023 17:32

Merge remote-tracking branch 'upstream/main' into na_rep-bug

4073623

Merge remote-tracking branch 'upstream/main' into na_rep-bug

1ef99eb

Fixed unit tests for na_rep

4b7dd63

fixed docstrings

17bed8a

Merge branch 'main' into na_rep-bug

c28ab1a

rsm-23 added 5 commits September 29, 2023 13:21

Merge branch 'main' into na_rep-bug

2dc5535

Merge branch 'main' into na_rep-bug

f2a42a7

Merge branch 'main' into na_rep-bug

21edb19

Merge branch 'main' into na_rep-bug

875f714

Merge branch 'main' into na_rep-bug

fb356b9

MarcoGorelli self-requested a review October 10, 2023 07:42

mroeschke requested changes Oct 10, 2023


rsm-23 requested a review from mroeschke October 11, 2023 08:25

rsm-23 added 2 commits October 11, 2023 13:56

Merge branch 'main' into na_rep-bug

2304fb7

Merge branch 'main' into na_rep-bug

44ba528

rsm-23 added 4 commits October 16, 2023 09:26

reverted tests and updated default value

4213cd8

removed print

6e2028c

reverted tests and changed default value for na_rep

7087110

Merge remote-tracking branch 'upstream/main' into na_rep-bug

7aad289

mroeschke closed this Nov 7, 2023
