-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEPR: error_bad_lines and warn_bad_lines for read_csv #40413
Conversation
Hello @lithomas1! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-05-28 00:23:35 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, am +1 on this change, some comments
pandas/io/parsers/readers.py
Outdated
@@ -382,6 +402,7 @@ | |||
"memory_map": False, | |||
"error_bad_lines": True, | |||
"warn_bad_lines": True, | |||
"on_bad_lines": None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah should remove error/warn_bad_lines from here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_get_options_with_defaults is really spaghetti-fied right now, so removing this would not the args not passed to the parser. I will try to clean up _get_options_with_defaults in a future PR if I have time.
cc @gfyoung if you'd have a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to set the original options to None to avoid way more complexity.
pandas/io/parsers/base_parser.py
Outdated
else: | ||
raise ValueError(f"Argument {on_bad_lines} is invalid for on_bad_lines") | ||
else: | ||
if kwds.get("error_bad_lines"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these need a deprecation warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_deprecated_defaults handles this for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe i am not seeing it. but it seems that the old options (error_bad_lines / warn_bad_lines) are still passed around. these should be handled in exactly 1 place, then immediately discarded (with a possible warning / error if multiple things are specified), and then only on_bad_lines should exist.
pandas/io/parsers/base_parser.py
Outdated
self.on_bad_lines = self.BadLineHandleMethod.SKIP | ||
else: | ||
raise ValueError(f"Argument {on_bad_lines} is invalid for on_bad_lines") | ||
# Override on_bad_lines w/ deprecated args for backward compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't all of these cases show a deprecation warning? L227-238
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like, I said before this is handled by _deprecated_defaults. Ideally _deprecated_defaults wouldn't exist, but as we all know, read_csv is very bloated in terms of API, so its nice to show the deprecation warnings from one place. If you are still unsure, this is tested in this PR here https://github.com/pandas-dev/pandas/pull/40413/files#diff-2f09043040b80b8e52cb8525b43b1653de0ce7d04e0999ca8e0dbe69b05b88faR758-R770
thanks @lithomas1 this was a bear! thanks for sticking with it |
updated |
… from `read_csv` & enabling more tests ### What changes were proposed in this pull request? This PR proposes to remove `squeeze` parameter from `read_csv` to follow the behavior of latest pandas. See pandas-dev/pandas#40413 and pandas-dev/pandas#43427 for detail. This PR also enables more tests for pandas 2.0.0 and above. ### Why are the changes needed? To follow the behavior of latest pandas, and increase the test coverage. ### Does this PR introduce _any_ user-facing change? `squeeze` will be no longer available from `read_csv`. Otherwise, it's test-only. ### How was this patch tested? Enabling & updating the existing tests. Closes #42551 from itholic/pandas_remaining_tests. Authored-by: itholic <[email protected]> Signed-off-by: Ruifeng Zheng <[email protected]>
… from `read_csv` & enabling more tests ### What changes were proposed in this pull request? This PR proposes to remove `squeeze` parameter from `read_csv` to follow the behavior of latest pandas. See pandas-dev/pandas#40413 and pandas-dev/pandas#43427 for detail. This PR also enables more tests for pandas 2.0.0 and above. ### Why are the changes needed? To follow the behavior of latest pandas, and increase the test coverage. ### Does this PR introduce _any_ user-facing change? `squeeze` will be no longer available from `read_csv`. Otherwise, it's test-only. ### How was this patch tested? Enabling & updating the existing tests. Closes apache#42551 from itholic/pandas_remaining_tests. Authored-by: itholic <[email protected]> Signed-off-by: Ruifeng Zheng <[email protected]>
… from `read_csv` & enabling more tests ### What changes were proposed in this pull request? This PR proposes to remove `squeeze` parameter from `read_csv` to follow the behavior of latest pandas. See pandas-dev/pandas#40413 and pandas-dev/pandas#43427 for detail. This PR also enables more tests for pandas 2.0.0 and above. ### Why are the changes needed? To follow the behavior of latest pandas, and increase the test coverage. ### Does this PR introduce _any_ user-facing change? `squeeze` will be no longer available from `read_csv`. Otherwise, it's test-only. ### How was this patch tested? Enabling & updating the existing tests. Closes apache#42551 from itholic/pandas_remaining_tests. Authored-by: itholic <[email protected]> Signed-off-by: Ruifeng Zheng <[email protected]>
Summary of contents
- Adds new on_bad_lines parameter (I found on_bad_lines a more explicit name than bad_lines)
- This defaults to None at first in order to preserve compatibility, however it should be changed to error in 2.0 after
error_bad_lines and warn_bad_lines are removed.
- Cleanup of some C/Python Parser code ( add enum instead of using 2 variables for C and only use on_bad_lines in Python)