BUG: Fix pd.read_html handling of rowspan in table header #60464

Merged (4 commits) on Dec 3, 2024

@snitish snitish commented Dec 2, 2024

From the original thread:

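import io

import pandas as pd

# The header cell "A" has rowspan="2", so it spans into the first body row.
s = '<table><tr><th rowspan="2">A</th><th>B</th></tr><tr><td>1</td></tr><tr><td>C</td><td>2</td></tr></table>'
buf = io.StringIO(s)
print(pd.read_html(buf)[0])
# Output before this fix:
#    A                  B
#    A Unnamed: 1_level_1
# 0  1                NaN

# Expected:
#    A  B
# 0  A  1
# 1  C  2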

The bug is caused by a header cell with rowspan > 1, whose span overflows into the body rows. The current logic does not handle this case. The fix carries the partially filled rows over from the header into the body (and similarly from the body into the footer, if any).
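For readers who want to see the carry-over idea in isolation, here is a minimal sketch. It is not the code in pandas/io/html.py: the function name `expand_section`, the `(text, rowspan)` cell representation, and the `carry` dict are assumptions made for illustration, and colspan is ignored entirely.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, int]               # (text, rowspan); hypothetical representation
Carry = Dict[int, Tuple[str, int]]   # column index -> (text, rows still to fill)


def expand_section(
    rows: List[List[Cell]], carry: Carry
) -> Tuple[List[List[str]], Carry]:
    """Expand rowspans in one table section (header, body, or footer).

    `carry` holds cells from the previous section whose rowspan has not
    been used up yet. The leftover carry is returned so the caller can
    pass it on to the next section.
    """
    carry = dict(carry)  # work on a copy so the caller's dict is untouched
    expanded: List[List[str]] = []
    for row in rows:
        cells = list(row)
        texts: List[str] = []
        next_carry: Carry = {}
        col = 0
        while cells or carry:
            if col in carry or not cells:
                # Fill this column from a cell that spans down from above.
                # If the row has run out of its own cells, append the
                # remaining carried cells in column order.
                key = col if col in carry else min(carry)
                text, remaining = carry.pop(key)
            else:
                # Take the next cell that belongs to this row.
                text, remaining = cells.pop(0)
            if remaining > 1:
                next_carry[col] = (text, remaining - 1)
            texts.append(text)
            col += 1
        expanded.append(texts)
        carry = next_carry
    return expanded, carry


# The table from the example above: "A" has rowspan=2 in the header row.
header_rows = [[("A", 2), ("B", 1)]]
body_rows = [[("1", 1)], [("C", 1), ("2", 1)]]

header, carry = expand_section(header_rows, {})
body, carry = expand_section(body_rows, carry)  # header overflow enters the body
print(header)  # [['A', 'B']]
print(body)    # [['A', '1'], ['C', '2']]
```

Passing the leftover `carry` from one section to the next is what lets the spanning "A" cell land in the first body row, which matches the expected frame shown above. Calling `expand_section` on footer rows with the body's leftover carry would handle the body-to-footer case the same way.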

@mroeschke mroeschke left a comment


Looks good. Could you add a whatsnew note in v3.0.0.rst under the I/O section?

@mroeschke mroeschke added the IO HTML read_html, to_html, Styler.apply, Styler.applymap label Dec 2, 2024
@snitish snitish requested a review from mroeschke December 2, 2024 18:47
@rhshadrach rhshadrach added the Bug label Dec 2, 2024
@mroeschke mroeschke added this to the 3.0 milestone Dec 3, 2024
@mroeschke mroeschke merged commit d9dfaa9 into pandas-dev:main Dec 3, 2024
51 of 55 checks passed
@mroeschke mroeschke commented

Thanks @snitish

Successfully merging this pull request may close these issues.

BUG: rowspan in read_html failed