GH-40633: [C++][Python] Fix ORC crash when file contains unknown timezone #45051
Our ORC adapter would let a C++ exception slip through instead of converting it into a Status error.
@@ -140,6 +143,33 @@ def test_example_using_json(filename, datadir):
    check_example_file(path, table, need_fix=True)


def test_unknown_timezone(datadir):
    # Example file relies on the timezone "US/Pacific". It should gracefully
Note that this test checks an unrelated issue.
@github-actions crossbow submit wheelcp312*
@wgtmac I don't think the ORC adapter code is completely safe yet. But this is a step forward (and it adds tests for it).
@github-actions crossbow submit wheelcp312*
Revision: 1251958 Submitted crossbow builds: ursacomputing/crossbow @ actions-cb6caa5686
+1
Thanks for fixing this! Sorry that I forgot to do this.
Rationale for this change
If the timezone database is present on the system, but does not contain a timezone referenced in an ORC file, the ORC reader will crash with an uncaught C++ exception.
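Whether a given timezone name is missing on a system can be probed from Python. The following is only a minimal sketch, not part of this PR: it assumes Python 3.9+ and uses the standard `zoneinfo` module, and `timezone_available` is a hypothetical helper name. Note that `zoneinfo` may also fall back to the `tzdata` package from PyPI if it is installed, so its answer does not necessarily match what the ORC C++ library finds in the system timezone directory.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def timezone_available(name: str) -> bool:
    """Return True if Python's zoneinfo can resolve the timezone `name`.

    Hypothetical helper for illustration; it only approximates the lookup
    done by the ORC reader, which searches the system timezone database.
    """
    try:
        ZoneInfo(name)
    except ZoneInfoNotFoundError:
        # Either there is no timezone database at all, or the database
        # exists but lacks this particular name.
        return False
    return True


if __name__ == "__main__":
    # "US/Pacific" is the alias mentioned in this PR; "America/Los_Angeles"
    # is the canonical IANA name for the same zone.
    for tz in ("US/Pacific", "America/Los_Angeles"):
        print(tz, timezone_available(tz))
```

If the canonical name resolves but the alias does not, the system is likely in the state described here: the database is present but incomplete.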
This can happen, for example, on Ubuntu 24.04, where some timezone aliases have been moved from the main tzdata package to a tzdata-legacy package. If tzdata-legacy is not installed, trying to read an ORC file that references e.g. the "US/Pacific" timezone would crash.
Here is a backtrace excerpt:
What changes are included in this PR?
Catch C++ exceptions when iterating ORC batches instead of letting them slip through.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.