fixed bug where pd.NA was being cast to NaN during formatting #55754

Merged: 13 commits merged into pandas-dev:main on Nov 7, 2023


dominiquegarmier (Contributor) commented Oct 29, 2023

Replaced pd.concat with DataFrame.drop so that pd.NA is preserved during formatting instead of being cast to NaN (see linked issue).
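A minimal sketch of the drop-based idea, assuming the goal is to keep the first and last `row_num` rows of a frame. `truncate_with_drop` and the sample data are hypothetical and are not the code in format.py. Because `DataFrame.drop` removes rows by label, the sketch assumes a unique index.

```python
import pandas as pd


def truncate_with_drop(frame: pd.DataFrame, row_num: int) -> pd.DataFrame:
    """Hypothetical helper: keep the first and last ``row_num`` rows.

    Drops the middle rows by label instead of concatenating a head slice
    and a tail slice. Assumes ``frame`` has a unique index, since ``drop``
    removes every row carrying a dropped label.
    """
    _len = len(frame)
    middle_labels = frame.index[row_num : _len - row_num]
    return frame.drop(index=middle_labels)


df = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5, 6],
        # dtype=object stores pd.NA as-is instead of letting pandas infer a dtype
        "B": pd.Series(["x", pd.NA, "y", "z", pd.NA, "w"], dtype=object),
    }
)
out = truncate_with_drop(df, 2)
print(out)
```

With a duplicated index label, `drop` would remove every row sharing that label, so this approach is only safe for unique indexes.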

Review thread on pandas/io/formats/format.py (outdated, resolved)
rhshadrach (Member) left a comment


Looking good; can you add a test?

passed new test locally
dominiquegarmier (Contributor, Author)

I'm not sure: is it normal that so many CI steps are failing? (It's my first time contributing to pandas.)

rhshadrach (Member) commented Oct 31, 2023

> I'm not sure: is it normal that so many CI steps are failing? (It's my first time contributing to pandas.)

Two things to do when CI is failing: look at the failure messages of (some) builds, and check whether the same builds are passing on main. For the first, I'm seeing:

https://github.com/pandas-dev/pandas/actions/runs/6703704571/job/18214736331?pr=55754#step:4:226

INTERNALERROR> E ModuleNotFoundError: No module named 'numba'

This is likely not due to your PR 😄 Recent commits to main also have 4 failing builds.

Once you believe the failing builds are not due to your PR, and they are no longer failing on the most recent commit to main, try merging main into this branch; that will rerun all the builds.

rhshadrach (Member) left a comment


Looking good!

Comment on lines 701 to 703
self.tr_frame = self.tr_frame.iloc[
    chain(range(row_num), range(_len - row_num, _len))
]
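For reference, the `chain(...)` expression in the reviewed lines produces the positions of the first `row_num` rows followed by the last `row_num` rows. A small sketch with illustrative values; `head_tail_positions` is a hypothetical name, and the result is materialized with `list()` here only to make it easy to inspect.

```python
from itertools import chain


def head_tail_positions(_len: int, row_num: int) -> list[int]:
    """Positions of the first and last ``row_num`` rows of a ``_len``-row frame."""
    # chain() lazily concatenates the two ranges; list() materializes them
    return list(chain(range(row_num), range(_len - row_num, _len)))


print(head_tail_positions(10, 3))  # [0, 1, 2, 7, 8, 9]
```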
Member


Thanks - this is definitely a lot better. I think we can do a little better by using NumPy instead.

self.tr_frame = self.tr_frame.iloc[
    np.hstack([np.arange(row_num), np.arange(_len - row_num, _len)])
]
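A runnable sketch of the NumPy suggestion, wrapped in a hypothetical `truncate_rows` helper (the helper name and the sample frame are assumptions, not pandas code). A single positional `iloc` selection takes the rows directly, so an object column holding `pd.NA` keeps it.

```python
import numpy as np
import pandas as pd


def truncate_rows(frame: pd.DataFrame, row_num: int) -> pd.DataFrame:
    """Hypothetical helper: keep the first and last ``row_num`` rows by position."""
    _len = len(frame)
    # One integer array: [0, ..., row_num - 1, _len - row_num, ..., _len - 1]
    positions = np.hstack([np.arange(row_num), np.arange(_len - row_num, _len)])
    return frame.iloc[positions]


df = pd.DataFrame({"B": pd.Series([pd.NA, "a", "b", "c", "d", pd.NA], dtype=object)})
out = truncate_rows(df, 2)
print(out)
```

Because the selection is positional, it does not depend on index labels, so duplicate labels cannot pull in extra rows.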

@@ -171,6 +171,15 @@ def test_repr_truncation(self):
        with option_context("display.max_colwidth", max_len + 2):
            assert "..." not in repr(df)

    def test_repr_truncation_preserves_na(self):
        # https://github.com/pandas-dev/pandas/issues/55630
        with option_context("display.max_rows", 10):
Member


If you set this to 2 instead of 10, I think we could then test for the entire string.

dominiquegarmier (Contributor, Author)

It looks like main is having the same CI failures as this PR.

rhshadrach (Member)

Looks good - I think CI issues will be resolved when you merge main.

mroeschke added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) and Output-Formatting (__repr__ of pandas objects, to_string) labels Nov 3, 2023
mroeschke added this to the 2.2 milestone Nov 7, 2023
mroeschke (Member) left a comment


LGTM, merge when ready @rhshadrach

rhshadrach (Member) left a comment


lgtm

rhshadrach added the Bug label Nov 7, 2023
rhshadrach merged commit bf37560 into pandas-dev:main Nov 7, 2023
37 of 38 checks passed
rhshadrach (Member)

Thanks @dominiquegarmier!

Labels: Bug, Missing-data, Output-Formatting
Development

Successfully merging this pull request may close these issues.

BUG: visual bug: pd.NA not printed properly in REPL when DataFrame gets collapsed
4 participants