TST: Improved test coverage for Styler.bar error conditions #56341

ccccjone · 2023-12-05T09:50:29Z

closes BUG: .style.bar doesn't work with missing pyarrow values #56283
Added two new unit tests test_bar_color_and_cmap_error_raises() and test_bar_invalid_color_type_error_raises() to improve the coverage of Styler.bar where the issue mentioned, ensuring the method raises the appropriate ValueError.
Implemented a solution using df.min/max before np.nanmin/nanmax to address issues with NumPy's inability to properly handle pd.NA under certain conditions, and added test_styler_bar_with_NA_values() for testing.

mroeschke

Do these tests relate to a different issue? #56283 is to test that this doesn't raise for pyarrow types

ccccjone · 2023-12-06T12:00:00Z

Do these tests relate to a different issue? #56283 is to test that this doesn't raise for pyarrow types

Hi @mroeschke, this is a simple PR that covers the test for the raise in bar() method.

Since you mentioned, I did a deep dive into this issue, and found that this issue does exist on the main(2.1.3). The pyarrow will read the null value as an NAType in array, which may not be resolved properly by the np.nanmin() or np.nanmax() in certain circumstances. You will see this issue as long as you render the result, or call the to_html() method, they all give the same errors.

To address this, I've implemented a method replacing pd.NA with np.nan, and is applied only before executing np.nanmin() or np.nanmax(), to resolve the issue.

Thanks for your review, and I look forward to your response!

attack68 · 2023-12-06T12:25:36Z

pandas/io/formats/style.py

+            return [replace_pd_NA_with_np_nan(element) for element in data_structure]
+        elif isinstance(data_structure, np.ndarray):
+            # Convert numpy array elements recursively
+            return np.array([replace_pd_NA_with_np_nan(element) for element in data_structure])


This looks quite non-performant for big sparse arrays.
Are there other approaches that might be better for this whole process.

Maybe there is a substitute for np.nanmin instead of wrangling the data element-wise to make it fit into the numpy function?

Hi @attack68 , thanks for your advice! I have implemented an alternative - using df.min/max followed by np.nanmin/nanmax. This will resolve the issues caused by numpy cannot properly handle pd.NA in certain circumstances. Here is the explanation for this issue: when import missing value with pyarrow, pd.NA will be generated in DataFrame, then, doing np.nanmin(df.to_numpy()) in _bar(), numpy will throw a TypeError. However, after adding df.min/max, pd.NA will be properly handled and the returned structure will have no pd.NA, so it will be safe to proceed with np.nanmin/nanmax afterwords. This solution will resolve the issue without causing performance issue, although a little bit tricky.

The alternative proposed will likely be much more performant, as well as being much more condensed and easy to read code

attack68

pls see inline comment

mroeschke · 2023-12-06T18:31:37Z

main is not a released version. As of pandas 5e4f2b2 I get

In [1]: import pandas as pd

In [2]: import io
   ...: data = '''name,age,test1,test2,teacher
   ...: Adam,15,95.0,80,Ashby
   ...: Bob,16,81.0,82,Ashby
   ...: Dave,16,89.0,84,Jones
   ...: Fred,15,,88,Jones'''
   ...: scores = pd.read_csv(io.StringIO(data), dtype_backend='pyarrow',
   ...:                     engine='pyarrow'
   ...:                     )
   ...: 
   ...: (scores
   ...: .style.bar(subset='test1')
   ...: )
Out[2]: <pandas.io.formats.style.Styler at 0x1078cdf90>

In [3]: pd.__version__
Out[3]: '2.2.0.dev0+821.g5e4f2b27db'

ccccjone · 2023-12-06T22:16:37Z

main is not a released version. As of pandas 5e4f2b2 I get

In [1]: import pandas as pd

In [2]: import io
   ...: data = '''name,age,test1,test2,teacher
   ...: Adam,15,95.0,80,Ashby
   ...: Bob,16,81.0,82,Ashby
   ...: Dave,16,89.0,84,Jones
   ...: Fred,15,,88,Jones'''
   ...: scores = pd.read_csv(io.StringIO(data), dtype_backend='pyarrow',
   ...:                     engine='pyarrow'
   ...:                     )
   ...: 
   ...: (scores
   ...: .style.bar(subset='test1')
   ...: )
Out[2]: <pandas.io.formats.style.Styler at 0x1078cdf90>

In [3]: pd.__version__
Out[3]: '2.2.0.dev0+821.g5e4f2b27db'

Hi @mroeschke , you may see the error with scores.style.bar(subset='test1').to_html() both in 2.1.3 and in main not release version. Please have a check.

ccccjone · 2023-12-07T02:57:35Z

pre-commit.ci autofix

mroeschke · 2023-12-07T17:36:02Z

pandas/tests/io/formats/style/test_bar.py

+
+
+def test_styler_bar_with_NA_values():
+    df1 = DataFrame({"A": [1, 2, NA, 4]})


Could you test arrow types here per the original issue?

Hi @mroeschke , thank you for your response. I tried to add pyarrow into the test like the origin issue, but encountered a DeprecationWarning when using engine="pyarrow".

I noticed two PRs related to pyarrow issue, #55637 and #55576 , where several tests were marked as xfail due to parsing or type problems with pyarrow.

However, the test can be passed without specifying the engine, and I updated this test in my recent commit, although pyarrow maybe not suitable for this unit test.

def test_style_with_pyarrow_NA_values(): data = """name,age,test1,test2,teacher Adam,15,95.0,80,Ashby Bob,16,81.0,82,Ashby Dave,16,89.0,84,Jones Fred,15,,88,Jones""" df = read_csv(io.StringIO(data), dtype_backend="pyarrow") expected_substring = "style type=" html_output = df.style.bar(subset="test1").to_html() assert expected_substring in html_output

Look forward to your advice. Thank you!

for more information, see https://pre-commit.ci

ccccjone · 2023-12-09T03:04:03Z

Hello @mroeschke @attack68 , I've updated my PR and would appreciate it if you could take a review. Please let me know if there are any further changes needed. If everything looks good, could you assist with merging? Thanks for your time and help!

…test

attack68

You will observe that values is data converted to numpy array.
Since you have replaced this by data.min and data.max to solve the problem of left and right I question whether there is the need to still wrap this within np.nanmin and np.nanmax. Probably not if the pandas function returns a scalar in all cases.

Then the next question is what are the other uses of values. In the code below you will observe the function np.nanmean(values). Does this mean your fix will not work when align="mean"? Or if align is callable?

values is also used when there is a cmap.

Your solution may well work for this case and I believe is still a good improvement, but I do not believe that it fully solves the underlying issue and errors may still result when different arguments are used on this function.

Do you want to attempt these or just propose this to address just the issue in this particular case?

attack68 · 2023-12-09T09:07:30Z

It is possible to create a separate issue for these other identified cases and push two different PRs.

ccccjone · 2023-12-09T10:07:08Z

Hi @attack68 , thanks for your valuable feedback. I have tested with align="mean" and got a similar error.

This PR was initially meant to provide a test for issue #56283 , but during the code analysis, I discovered that np.nanmax and np.nanmin can not work on pd.NA. If we don't iterate through data to convert pd.NA to np.nan, using pd.max and pd.min - which can handle pd.NA - seems to be the most clear and concise approach. However, they return Series or DataFrame, which differ in type from the numpy functions' return values (like numpy.int64). Hence, I used nesting to ensure the return types are consistent after modification.

Regarding your concerns about align and cmap, I am more than willing to address them in separate PRs to avoid overcomplicating this one. We can create a new issue to track it.

Thanks for your help and suggestions!

attack68 · 2023-12-09T10:12:46Z

LGTM.
Please add the new issues before merging this PR. Thx,

ccccjone · 2023-12-09T11:10:17Z

@attack68 I have created the issue #56425 . Please help me merge this one. Thank you very much!

…test

ccccjone · 2023-12-11T19:03:26Z

Hi @mroeschke I'm writing to gently follow up on my recent PR #56341. When you have a moment, could you please take a look or help me merge it? Appreciate your time and assistance.

mroeschke · 2023-12-11T19:06:24Z

Thanks @ccccjone

ccccjone changed the title ~~Improved test coverage for Styler.bar error conditions~~ TST: Improved test coverage for Styler.bar error conditions Dec 5, 2023

mroeschke requested changes Dec 5, 2023

View reviewed changes

ccccjone requested a review from attack68 as a code owner December 6, 2023 11:43

attack68 reviewed Dec 6, 2023

View reviewed changes

attack68 requested changes Dec 6, 2023

View reviewed changes

ccccjone requested review from mroeschke and attack68 December 7, 2023 06:21

mroeschke added the Styler conditional formatting using DataFrame.style label Dec 7, 2023

mroeschke reviewed Dec 7, 2023

View reviewed changes

ccccjone requested a review from mroeschke December 8, 2023 00:26

ccccjone force-pushed the enhence_Styler_bar_test branch from 3c5e782 to e032e9b Compare December 8, 2023 08:12

ccccjone and others added 6 commits December 8, 2023 12:57

Improved test coverage for Styler.bar error conditions

0962c4a

Fixed the code style issue causing test failure

14e7b59

Fixed a Styler.bar bug of missing value

e918b1c

Adopted another tricky way to fix the bug

fdce172

[pre-commit.ci] auto fixes from pre-commit.com hooks

33e6035

for more information, see https://pre-commit.ci

Added a test for Styler.bar with pyarrow

13e976a

ccccjone force-pushed the enhence_Styler_bar_test branch from e032e9b to 13e976a Compare December 8, 2023 21:01

Updated with code style

dc77342

Merge remote-tracking branch 'upstream/main' into enhence_Styler_bar_…

32e1cca

…test

attack68 requested changes Dec 9, 2023

View reviewed changes

attack68 approved these changes Dec 9, 2023

View reviewed changes

ccccjone mentioned this pull request Dec 9, 2023

BUG: Compatibility Issues of Numpy Methods with pd.NA in Styler.bar #56425

Open

ccccjone added 2 commits December 9, 2023 12:32

Merge remote-tracking branch 'upstream/main' into enhence_Styler_bar_…

a70a80a

…test

Merge remote-tracking branch 'upstream/main' into enhence_Styler_bar_…

039fa2d

…test

mroeschke added this to the 2.2 milestone Dec 11, 2023

mroeschke approved these changes Dec 11, 2023

View reviewed changes

mroeschke merged commit d352d5a into pandas-dev:main Dec 11, 2023
44 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TST: Improved test coverage for Styler.bar error conditions #56341

TST: Improved test coverage for Styler.bar error conditions #56341

ccccjone commented Dec 5, 2023 •

edited

Loading

mroeschke left a comment

ccccjone commented Dec 6, 2023

attack68 Dec 6, 2023 •

edited

Loading

ccccjone Dec 7, 2023

attack68 Dec 7, 2023

attack68 left a comment

mroeschke commented Dec 6, 2023

ccccjone commented Dec 6, 2023

ccccjone commented Dec 7, 2023

mroeschke Dec 7, 2023

ccccjone Dec 8, 2023 •

edited

Loading

ccccjone commented Dec 9, 2023

attack68 left a comment

attack68 commented Dec 9, 2023

ccccjone commented Dec 9, 2023

attack68 commented Dec 9, 2023

ccccjone commented Dec 9, 2023

ccccjone commented Dec 11, 2023

mroeschke commented Dec 11, 2023



		def test_styler_bar_with_NA_values():
		df1 = DataFrame({"A": [1, 2, NA, 4]})

TST: Improved test coverage for Styler.bar error conditions #56341

TST: Improved test coverage for Styler.bar error conditions #56341

Conversation

ccccjone commented Dec 5, 2023 • edited Loading

mroeschke left a comment

Choose a reason for hiding this comment

ccccjone commented Dec 6, 2023

attack68 Dec 6, 2023 • edited Loading

Choose a reason for hiding this comment

ccccjone Dec 7, 2023

Choose a reason for hiding this comment

attack68 Dec 7, 2023

Choose a reason for hiding this comment

attack68 left a comment

Choose a reason for hiding this comment

mroeschke commented Dec 6, 2023

ccccjone commented Dec 6, 2023

ccccjone commented Dec 7, 2023

mroeschke Dec 7, 2023

Choose a reason for hiding this comment

ccccjone Dec 8, 2023 • edited Loading

Choose a reason for hiding this comment

ccccjone commented Dec 9, 2023

attack68 left a comment

Choose a reason for hiding this comment

attack68 commented Dec 9, 2023

ccccjone commented Dec 9, 2023

attack68 commented Dec 9, 2023

ccccjone commented Dec 9, 2023

ccccjone commented Dec 11, 2023

mroeschke commented Dec 11, 2023

ccccjone commented Dec 5, 2023 •

edited

Loading

attack68 Dec 6, 2023 •

edited

Loading

ccccjone Dec 8, 2023 •

edited

Loading