-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug] fix fillna function on a single column fail #32594
Conversation
This reverts commit 2a5736c.
Assigning reviewers. If you would like to opt out of this review, comment R: @shunping for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
Thanks for fixing this bug, @DKER2 . Could you also add a unit test (something the following) in to your PR? import tempfile
import apache_beam as beam
from apache_beam.dataframe import convert
from apache_beam.testing.util import assert_that
from apache_beam.testing.util import equal_to
with tempfile.NamedTemporaryFile(mode="w") as fp:
fp.writelines(("col1,col2\n", "1,1\n", "NaN,1\n", "-1,1\n"))
fp.flush()
pipeline = beam.Pipeline(None)
beam_df = pipeline | 'Read CSV' >> beam.dataframe.io.read_csv(fp.name)
beam_df['col1'] = beam_df['col1'].fillna(0)
assert_that(
convert.to_pcollection(beam_df),
equal_to([(1,1), (0, 1),(-1, 1)])) |
Could you please check the name of the test and confirm if the file where I've placed it is correct? |
@@ -354,6 +354,16 @@ def test_rename(self): | |||
0: 2, 2: 0 | |||
}, errors='raise')) | |||
|
|||
def test_dataframe_column_fillna_constant_as_value(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would place it together with other fillna tests:
def test_fillna_columns(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried to put in that file like this commit , but seem that this function
def expr_to_stages(expr): |
is not triggered, the test triggered different flow. And the test pass even with the old version of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please help to advise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply.
The bug is only triggered in the transform of DeferredDataFrame, so you have to put that in a pipeline so that when the pipeline runs the corresponding transform code path will run.
Could you put a test like the following into frames_test.py?
import numpy as np
import apache_beam as beam
from apache_beam.dataframe import convert
from apache_beam.testing.util import assert_that
from apache_beam.testing.util import equal_to
with beam.Pipeline(None) as p:
pcoll = (p | beam.Create([1.0, np.nan, -1.0]) | beam.Select(x=lambda x: x))
df = convert.to_dataframe(pcoll)
df_new = df['x'].fillna(0)
assert_that(
convert.to_pcollection(df_new),
equal_to([1.0, 0.0, -1.0]))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @shunping for your suggestion. That works as well
Reminder, please take a look at this pr: @shunping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add the test? Thanks!
Added Tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
R: @damccorm for final approval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
* fix bug all arg add as inputs * fix bug for fillna * Revert "fix bug for fillna" This reverts commit 2a5736c. * fix bug for fillna * add test for fillna a column * add test for fillna a column * add test for fillna a column * revert add test to frames_test * Move test from transforms to frames
fixes #31855
In fillna to expression fuction in here, it add
ConstantExpression
as args to fillna() if value is constant. This would cause problem in expression to compute stage function as in this line all args will be added as inputs to compute stage. This constant expression will latter be computed by this function, but as it is a constant expression, it do not have any input that will make the program fail. We only should addComputedExpression
andPlaceHolderExpression
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.