BUG: is_string_dtype(pd.Index([], dtype='O')) returns False #54997

Merged
merged 11 commits into from
Oct 4, 2023


natmokval
Contributor

@natmokval commented Sep 4, 2023

Corrected the definition of is_all_strings. Now, for object arrays that have no items, is_string_dtype returns True instead of False.
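To make the behavior change concrete, here is a minimal NumPy-only sketch of the check described above. The function `is_all_strings_sketch` is hypothetical: it is not pandas' internal `is_all_strings` (which relies on pandas' own inference helpers), only an approximation of the intended semantics, assuming an object array qualifies when it is empty or every element is a Python `str`. In pandas itself, the user-facing reproducer is `is_string_dtype(pd.Index([], dtype='O'))` from the PR title.

```python
import numpy as np


def is_all_strings_sketch(values: np.ndarray) -> bool:
    """Hypothetical approximation of the fixed check for NumPy arrays.

    Only object-dtype arrays can qualify. After the fix, an empty object
    array counts as "all strings" instead of failing the check.
    """
    if values.dtype != np.dtype("object"):
        return False
    if len(values) == 0:
        # Nothing in the array contradicts "all strings", so report True.
        return True
    # None and NaN are not str instances, so they make the result False.
    return all(isinstance(v, str) for v in values)


print(is_all_strings_sketch(np.array([], dtype=object)))          # True
print(is_all_strings_sketch(np.array(["a", "b"], dtype=object)))  # True
print(is_all_strings_sketch(np.array(["a", None], dtype=object)))  # False
```

Before the fix, the empty case was the odd one out: the array had object dtype but no string elements to infer from, so the check reported False.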

@natmokval marked this pull request as ready for review September 7, 2023 12:21
@natmokval added the Bug, Dtype Conversions, and Strings labels Sep 7, 2023
@natmokval
Contributor Author

@jbrockmendel, could you please take a look at this PR?



def test_empty_object_array_is_string_dtype():
    # GH #54661
Member

nitpick: can you remove the space after GH? I like the pattern GH#12345 because it's something I grep for.

Contributor Author

@natmokval Sep 12, 2023

Sure, done. I agree, it's better to have all GH references look alike.
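As a small illustration of why the compact form matters for grepping, the sketch below uses a regular expression that matches `GH#54661` but not `GH #54661`. The pattern and the helper `find_gh_refs` are hypothetical, written for this example; they are not a pandas tool.

```python
import re

# Matches only the compact "GH#<number>" form; "GH #<number>" is skipped.
GH_REF = re.compile(r"\bGH#(\d+)\b")


def find_gh_refs(source: str) -> list[str]:
    """Return the issue numbers written as GH#<number> in the given text."""
    return GH_REF.findall(source)


sample = """
def test_empty_object_array_is_string_dtype():
    # GH#54661
    ...

def test_old_style():
    # GH #54661
    ...
"""

print(find_gh_refs(sample))  # ['54661']
```

Only the first comment is found, which is exactly the inconsistency the reviewer wanted to avoid.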

@@ -1212,3 +1212,8 @@ def test_multi_column_dtype_assignment():

    df["b"] = 0
    tm.assert_frame_equal(df, expected)


def test_empty_object_array_is_string_dtype():
Member

Hmm, looks like we have tests for is_string_dtype in a couple of places. One of these days it might be worth checking whether there is a reason for that and, if not, refactoring.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comment. I think I found a better place for my test: pandas/tests/dtypes/test_common.py. I combined my test with another one and added parametrization.
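The combined test lives in pandas/tests/dtypes/test_common.py; its exact cases are not shown in this thread. As a rough sketch of what parametrization with `pytest.mark.parametrize` looks like, the example below checks a hypothetical helper `is_all_strings_sketch` (a NumPy-only approximation, not pandas' implementation) against a few inputs, including the empty object array from GH#54661. It assumes NumPy and pytest are installed.

```python
import numpy as np
import pytest


def is_all_strings_sketch(values: np.ndarray) -> bool:
    """Hypothetical helper: True for object arrays that are empty or all str.

    all() over an empty sequence is True, which covers the empty case.
    """
    return values.dtype == np.dtype("object") and all(
        isinstance(v, str) for v in values
    )


# Each (values, expected) pair becomes its own test case under pytest.
CASES = [
    (np.array([], dtype=object), True),  # GH#54661: empty object array
    (np.array(["a", "b"], dtype=object), True),
    (np.array(["a", 1], dtype=object), False),
    (np.array([], dtype="int64"), False),  # non-object dtype never qualifies
]


@pytest.mark.parametrize("values, expected", CASES)
def test_is_all_strings_sketch(values, expected):
    assert is_all_strings_sketch(values) == expected
```

Because each tuple is reported as a separate case, a regression on the empty object array shows up on its own instead of hiding behind the other inputs.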

@@ -169,6 +169,7 @@ Bug fixes
~~~~~~~~~
- Bug in :class:`AbstractHolidayCalendar` where timezone data was not propagated when computing holiday observances (:issue:`54580`)
- Bug in :class:`pandas.core.window.Rolling` where duplicate datetimelike indexes are treated as consecutive rather than equal with ``closed='left'`` and ``closed='neither'`` (:issue:`20712`)
- Bug in :func:`is_all_strings` while checking object array with no elements is of the string dtype (:issue:`54661`)
Member

is the bug in is_all_strings or is_string_dtype?

Contributor Author

I think the bug is in is_all_strings, but that function is not part of the public API, and users hit the bug when calling is_string_dtype. That is why I replaced "Bug in :func:`is_all_strings`" with "Bug in :func:`is_string_dtype`".

It looks like we don't have a test for is_all_strings. Do we need one?
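The split between the internal helper and the public function can be sketched as follows. Both functions here are hypothetical and heavily simplified (the real `pandas.api.types.is_string_dtype` also handles extension dtypes, categoricals, and more); the point is only that object-dtype input is delegated to the internal check, so a bug there is what users see through the public call.

```python
import numpy as np


def _is_all_strings(values: np.ndarray) -> bool:
    """Hypothetical internal helper standing in for is_all_strings."""
    return values.dtype == np.dtype("object") and all(
        isinstance(v, str) for v in values
    )


def is_string_dtype_sketch(arr_or_dtype) -> bool:
    """Hypothetical public wrapper around the internal helper."""
    if hasattr(arr_or_dtype, "dtype"):
        values = np.asarray(arr_or_dtype)
        if values.dtype.kind == "O":
            # Object arrays: the answer comes entirely from the helper,
            # so a bug there (like GH#54661) surfaces here for users.
            return _is_all_strings(values)
        return values.dtype.kind in ("U", "S")
    # Bare dtype-like input (simplified): object, str and bytes count.
    return np.dtype(arr_or_dtype).kind in ("O", "U", "S")


print(is_string_dtype_sketch(np.array([], dtype=object)))  # True
print(is_string_dtype_sketch(np.array([1, 2], dtype=object)))  # False
```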

Member

I meant the user-facing bug. In the docs we generally only reference public functions/methods.

Contributor Author

@natmokval Sep 13, 2023

Okay, thank you. I removed my line from doc/source/whatsnew/v2.2.0.rst.

Contributor Author

@jbrockmendel, I updated the PR. Could you please take a look at it?

Member

I think we still want a note; it just needs to refer to is_string_dtype, which I think is public.

Contributor Author

Yes, is_string_dtype is a public function. I added the note to v2.1.1.rst; I think my change is minor, so we can put it in version 2.1.1.

@@ -34,6 +34,7 @@ Fixed regressions
Bug fixes
~~~~~~~~~
- Fixed bug for :class:`ArrowDtype` raising ``NotImplementedError`` for fixed-size list (:issue:`55000`)
- Fixed bug in :func:`is_string_dtype` while checking object array with no elements is of the string dtype (:issue:`54661`)
Member

I think you'll need this to be

:func:`pandas.api.types.is_string_dtype`

and to move this to 2.2

Contributor Author

Thanks, I corrected the note in the whatsnew and moved it to 2.2.0.

@natmokval
Contributor Author

@jbrockmendel, I updated my PR. Could you please take a look at the update?

@jbrockmendel
Member

LGTM. pls merge main and ping on green

@natmokval
Contributor Author

Thanks, I merged main; CI is green.

@jbrockmendel jbrockmendel merged commit 06cdcc0 into pandas-dev:main Oct 4, 2023
33 checks passed
@jbrockmendel
Member

thanks @natmokval

Labels: Bug, Dtype Conversions, Strings
Successfully merging this pull request may close these issues.

BUG: is_string_dtype(pd.Index([], dtype='O')) returns False
3 participants