forked from apache/arrow
Apache arrow 14.0.2 hotfix #1
Closed
…mensions, implemented using ExtensionType (apache#37166) ### Rationale for this change For use cases where the underlying data type and the number of dimensions of the tensors are equal but the actual shapes are not, we want to add a `VariableShapeTensorType`. See apache#24868 and huggingface/datasets#5272 ### What changes are included in this PR? This introduces the definition of the `arrow.variable_shape_tensor` extension, its C++ implementation, and a Python wrapper. ### Are these changes tested? Yes. ### Are there any user-facing changes? This introduces a new extension type to the user. * Closes: apache#24868 Lead-authored-by: Rok Mihevc <[email protected]> Co-authored-by: Joris Van den Bossche <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>
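To illustrate the idea behind a variable-shape tensor column, here is a minimal pure-Python sketch. It assumes a storage layout in which every row carries a flat `data` list and a `shape` list of fixed length `ndim`. The function name `pack_tensors` and the dictionary representation are hypothetical and are not the API added by this PR.

```python
from math import prod


def pack_tensors(tensors, ndim):
    """Pack (flat_values, shape) pairs into rows of {"data", "shape"}.

    Hypothetical helper: every tensor must have the same number of
    dimensions (ndim), but the individual shapes may differ.
    """
    rows = []
    for values, shape in tensors:
        if len(shape) != ndim:
            raise ValueError(f"expected {ndim} dimensions, got {len(shape)}")
        if len(values) != prod(shape):
            raise ValueError("number of values does not match the shape")
        rows.append({"data": list(values), "shape": list(shape)})
    return rows


# Two 2-D tensors with different shapes stored in the same column.
rows = pack_tensors([([1, 2, 3, 4, 5, 6], (2, 3)), ([7, 8], (1, 2))], ndim=2)
```

The point of the layout is that only the element type and `ndim` are fixed per column, while each row records its own shape.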
…pache#37901) ### Rationale for this change Python 3.12 will be released in the next couple of weeks. We should add the wheels for pyarrow on our 14.0.0 release. ### What changes are included in this PR? This PR adds jobs to build pyarrow wheels for Python 3.12. ### Are these changes tested? They will be tested via archery tasks ### Are there any user-facing changes? No but users will be able to use pyarrow with Python 3.12 * Closes: apache#37880 Authored-by: Raúl Cumplido <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
…conan (apache#38202) ### Rationale for this change There is a conflict between the required zlib versions when using both Thrift and gRPC. ### What changes are included in this PR? Pinning zlib when using Thrift. ### Are these changes tested? Via archery ### Are there any user-facing changes? No * Closes: apache#38201 Authored-by: Raúl Cumplido <[email protected]> Signed-off-by: Jacob Wujciak-Jens <[email protected]>
### Rationale for this change The NEWS file needs updating for 14.0.0. ### What changes are included in this PR? The NEWS file is updated with commits since 13.0.0. ### Are these changes tested? N/A ### Are there any user-facing changes? No * Closes: apache#38142 Lead-authored-by: Dewey Dunnington <[email protected]> Co-authored-by: Dewey Dunnington <[email protected]> Co-authored-by: Nic Crane <[email protected]> Signed-off-by: Dewey Dunnington <[email protected]>
…eight default (small) on smaller screens (apache#38148) ### Rationale for this change The Sphinx theme we have been using (PyData Sphinx Theme) has been pinned to an older version for a while now and with the apache#36591 we have updated the code and are now using version 0.14.0 for the dev docs. This PR fixes bugs we have encountered after the PR updating the theme has been merged. ### What changes are included in this PR? - Have default header size for smaller screens and keep it increased for bigger screens. * Closes: apache#38209 Authored-by: AlenkaF <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>
…#38176) ### Rationale for this change It's an internal bundled library. We should not install it as a part of Arrow. ### What changes are included in this PR? Exclude all Azure SDK for C++ jobs including install jobs aren't executed by default. Building jobs are executed because they are required to build Arrow. ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. * Closes: apache#37510 Authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…pache#38222) ### Rationale for this change Module caches don't have write permission for the owner, so we can't remove them with `rm -rf`. ### What changes are included in this PR? Run `go clean -modcache` after all builds. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#38200 Authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
### Rationale for this change The test-r-versions job is failing because not all of our dependencies support R 3.5. We follow the tidyverse support policy where possible, which means we only support R 3.6 and above. Thus, we can drop the test for R 3.5. ### What changes are included in this PR? R 3.5 was removed from the test matrix for test-r-versions ### Are these changes tested? Yes ### Are there any user-facing changes? No * Closes: apache#38226 Authored-by: Dewey Dunnington <[email protected]> Signed-off-by: Jacob Wujciak-Jens <[email protected]>
…ncryption tests (apache#38244) * Closes: apache#38243 Authored-by: Joris Van den Bossche <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>
apache#38229) ### Rationale for this change The minimal nightly builds are failing with examples that won't run without the dataset feature ### What changes are included in this PR? - Added `examplesIf` where needed - Redocumented ### Are these changes tested? Yes, by all R CMD check jobs ### Are there any user-facing changes? No * Closes: apache#38228 Authored-by: Dewey Dunnington <[email protected]> Signed-off-by: Dewey Dunnington <[email protected]>
…r@v2 (apache#38218) ### Rationale for this change CI jobs that used `setup-r@v1` no longer run without error. ### What changes are included in this PR? - Updated the rchk job to use `setup-r@v2` - Updated the devdocs job to use `setup-r@v2`. To make this work, we needed to remove the Windows build because it was installing an old version of R. It seems that the job has been running an outdated setup, unusable for most users, for a very long time. ### Are these changes tested? Will be covered by crossbow jobs submitted below. ### Are there any user-facing changes? No. * Closes: apache#38197 Lead-authored-by: Dewey Dunnington <[email protected]> Co-authored-by: Dewey Dunnington <[email protected]> Signed-off-by: Dewey Dunnington <[email protected]>
apache#38232) ### Rationale for this change We have several nightly builds failing with errors building the manual as a result of unicode characters. The unicode characters aren't new, so I'm not sure why this happened now. ### What changes are included in this PR? Install a distribution of latex that supports unicode characters (maybe)? ### Are these changes tested? Yes ### Are there any user-facing changes? No * Closes: apache#38227 Lead-authored-by: Dewey Dunnington <[email protected]> Co-authored-by: Dewey Dunnington <[email protected]> Signed-off-by: Jacob Wujciak-Jens <[email protected]>
### Rationale for this change The latest version of `r/R/install-arrow.R` was not working properly, since it was relying on the `on_rosetta()` function, which is not defined elsewhere. I just fixed the identification of rosetta in the script. With the current code, the following gives an error ````r > source("https://raw.githubusercontent.com/apache/arrow/master/r/R/install-arrow.R") > install_arrow() Error in on_rosetta() : could not find function "on_rosetta" ```` ### What changes are included in this PR? It only removed the `on_rosetta()` function, which was not defined elsewhere, and reverted back to the `rosetta` object to identify if rosetta is present or not on a user's system. ### Are these changes tested? Yes. It was tested with the current code and the proposed PR. The proposed PR works as expected. ### Are there any user-facing changes? No. * Closes: apache#37907 Lead-authored-by: Fernando Mayer <[email protected]> Co-authored-by: Jonathan Keane <[email protected]> Signed-off-by: Nic Crane <[email protected]>
### Rationale for this change Several PRs over the last few months have updated the build system to be more friendly for developers. During this process it has also come to light that we haven't supported the Windows development setup documented here since R 4.1 (released in spring 2021). I had to remove Windows from the test-r-devdocs job because the approach used there was not compatible with the `setup-r@v2` action, and the job was failing with the `@v1` action. ### What changes are included in this PR? - Updated the sections on using pre-built static libraries and bundled builds - Removed the Windows section regarding the bundled build. This section would need rewriting to support the last two minor releases of R, but in the meantime I think it is mostly confusing. ### Are these changes tested? They are documentation changes. They are also slightly optimistic: we can fix problems with the developer setup incrementally between releases, but it's more difficult to update our documentation. This PR documents the intended behaviour after apache#38236. ### Are there any user-facing changes? No. * Closes: apache#37945 Lead-authored-by: Dewey Dunnington <[email protected]> Co-authored-by: Dewey Dunnington <[email protected]> Co-authored-by: Jacob Wujciak-Jens <[email protected]> Signed-off-by: Dewey Dunnington <[email protected]>
…8195) ### Rationale for this change Previously GCS/S3 support would need to be explicitly enabled in source builds (when they are built without `NOT_CRAN`). As we want the macOS binaries to be fully featured, we should turn the features on when the dependencies exist. ### What changes are included in this PR? This PR enables this behavior for macOS only; on Linux, setting `NOT_CRAN` or `LIBARROW_MINIMAL=false` is still required. ### Are these changes tested? Crossbow and locally (thanks @paleolimbot) * Closes: apache#38043 Lead-authored-by: Jacob Wujciak-Jens <[email protected]> Co-authored-by: Dewey Dunnington <[email protected]> Co-authored-by: Jonathan Keane <[email protected]> Signed-off-by: Jacob Wujciak-Jens <[email protected]>
…rarily (apache#38238) * Closes: apache#38239 Lead-authored-by: Joris Van den Bossche <[email protected]> Co-authored-by: Raúl Cumplido <[email protected]> Co-authored-by: Jacob Wujciak-Jens <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
…tring and Binary Types in Hash Join (apache#38147) ### Rationale for this change We found that the wrong results in inner joins during hash join operations were caused by a problem with how large string and large binary types were handled. The `Slice` function was not calculating their sizes correctly. To fix this, I changed the `Slice` function to calculate the sizes correctly, based on the type of data for large string and binary. * Issue raised: apache#37729 ### What changes are included in this PR? * The `Slice` function has been updated to correctly calculate the offset for Large String and Large Binary types, and assertion statements have been added to improve maintainability. * Unit tests (`TEST(KeyColumnArray, SliceBinaryTest)`) for the Slice function have been added. * During random tests for Hash Join (`TEST(HashJoin, Random)`), modifications were made to allow the creation of Large String as key column values. ### Are these changes tested? Yes ### Are there any user-facing changes? Acero might not have a large user base as it is an experimental feature, but I deemed the issue of incorrect join results as critical and have addressed the bug. * Closes: apache#38074 Authored-by: Hyunseok Seo <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
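The size issue described above can be sketched in a few lines of Python. This is an illustration of the general technique, not Acero's `Slice` code: it assumes string types store 32-bit offsets and large string types store 64-bit offsets, so the byte range taken from an offsets buffer has to be scaled by the offset width. The names `OFFSET_FORMATS` and `slice_offsets` are hypothetical.

```python
import struct

# Assumed layout: "string" uses little-endian int32 offsets,
# "large_string" uses little-endian int64 offsets.
OFFSET_FORMATS = {"string": "<i", "large_string": "<q"}


def slice_offsets(offsets_buffer, type_name, start, length):
    """Return the offsets covering rows [start, start + length)."""
    fmt = OFFSET_FORMATS[type_name]
    width = struct.calcsize(fmt)  # 4 bytes for int32, 8 bytes for int64
    begin = start * width
    end = (start + length + 1) * width  # n rows need n + 1 offsets
    return [v[0] for v in struct.iter_unpack(fmt, offsets_buffer[begin:end])]


offsets = [0, 3, 5, 9, 12]
large_buffer = struct.pack("<5q", *offsets)
sliced = slice_offsets(large_buffer, "large_string", 1, 2)
```

Reading the 64-bit buffer with the 32-bit width selects the wrong bytes, which is the kind of mistake that would produce incorrect join results.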
…egin() where a char pointer is expected (apache#38265) ### Rationale for this change The MSVC compiler doesn't seem to allow user code to assume `std::string_view::const_iterator` is `const char*`, so using only `re2::StringPiece` and preferring to call `.data()` instead of `.begin()` should make things more uniform across different compilers and STL implementations. ### What changes are included in this PR? - Using `re2::StringPiece` instead of `std::string_view` to interact with `re2` - Use `data()` instead of `begin()` where a `char*` is expected ### Are these changes tested? Yes, by existing tests. * Closes: apache#38263 Authored-by: Felipe Oliveira Carvalho <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
### Rationale for this change We need more disk space... ### What changes are included in this PR? Remove more pre-installed files. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#38206 Authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
…e#38225) Bumps [golang.org/x/net](https://github.com/golang/net) from 0.15.0 to 0.17.0. <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/golang/net/commit/b225e7ca6dde1ef5a5ae5ce922861bda011cfabd"><code>b225e7c</code></a> http2: limit maximum handler goroutines to MaxConcurrentStreams</li> <li><a href="https://github.com/golang/net/commit/88194ad8ab44a02ea952c169883c3f57db6cf9f4"><code>88194ad</code></a> go.mod: update golang.org/x dependencies</li> <li><a href="https://github.com/golang/net/commit/2b60a61f1e4cf3a5ecded0bd7e77ea168289e6de"><code>2b60a61</code></a> quic: fix several bugs in flow control accounting</li> <li><a href="https://github.com/golang/net/commit/73d82efb96cacc0c378bc150b56675fc191894b9"><code>73d82ef</code></a> quic: handle DATA_BLOCKED frames</li> <li><a href="https://github.com/golang/net/commit/5d5a036a503f8accd748f7453c0162115187be13"><code>5d5a036</code></a> quic: handle streams moving from the data queue to the meta queue</li> <li><a href="https://github.com/golang/net/commit/350aad2603e57013fafb1a9e2089a382fe67dc80"><code>350aad2</code></a> quic: correctly extend peer's flow control window after MAX_DATA</li> <li><a href="https://github.com/golang/net/commit/21814e71db756f39b69fb1a3e06350fa555a79b1"><code>21814e7</code></a> quic: validate connection id transport parameters</li> <li><a href="https://github.com/golang/net/commit/a600b3518eed7a9a4e24380b4b249cb986d9b64d"><code>a600b35</code></a> quic: avoid redundant MAX_DATA updates</li> <li><a href="https://github.com/golang/net/commit/ea633599b58dc6a50d33c7f5438edfaa8bc313df"><code>ea63359</code></a> http2: check stream body is present on read timeout</li> <li><a href="https://github.com/golang/net/commit/ddd8598e5694aa5e966e44573a53e895f6fa5eb2"><code>ddd8598</code></a> quic: version negotiation</li> <li>Additional commits viewable in <a href="https://github.com/golang/net/compare/v0.15.0...v0.17.0">compare view</a></li> </ul> </details> <br /> [![Dependabot 
compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=golang.org/x/net&package-manager=go_modules&previous-version=0.15.0&new-version=0.17.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@ dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@ dependabot rebase` will rebase this PR - `@ dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@ dependabot merge` will merge this PR after your CI passes on it - `@ dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@ dependabot cancel merge` will cancel a previously requested merge and block automerging - `@ dependabot reopen` will reopen this PR if it is closed - `@ dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually - `@ dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@ dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@ dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@ dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/apache/arrow/network/alerts). </details> Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Matt Topol <[email protected]>
### Rationale for this change Making sure the documentation that shows up on pkg.go.dev will show that the package is compatible with go1.19+ ### What changes are included in this PR? slight patch/minor version updates of some dependencies along with a documentation update in `doc.go`. * Closes: apache#38285 Authored-by: Matt Topol <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
…nature (apache#38283) ### Rationale for this change The type signature of `ReplaceString` should be identical when arrow is compiled with or without `ARROW_WITH_RE2`. ### What changes are included in this PR? The right signature + delegating to the implementation that takes `re2::StringPiece`. The conversion should be a no-op when compiled and optimized. ### Are these changes tested? By existing tests and CI checks. * Closes: apache#38282 Authored-by: Felipe Oliveira Carvalho <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
### What changes are included in this PR? Bump versions of Go for our nightly tests to match supported Go versions ### Are these changes tested? Via archery ### Are there any user-facing changes? No Authored-by: Raúl Cumplido <[email protected]> Signed-off-by: Jacob Wujciak-Jens <[email protected]>
…images (apache#38287) ### Rationale for this change Fix CI failures for a job that is running out of disk space. ### What changes are included in this PR? Using our free disk space script to add space for the ubuntu-r-only-r images. ### Are these changes tested? On CI ### Are there any user-facing changes? No * Closes: apache#38286 Authored-by: Raúl Cumplido <[email protected]> Signed-off-by: Jacob Wujciak-Jens <[email protected]>
### Rationale for this change The test fails with the latest version of duckdb (0.9.1). ### What changes are included in this PR? The test was changed so that it no longer depends on non-deterministic behaviour. We sort all of the other expectations involving a group_by to avoid this problem; we hadn't changed this one yet because it didn't fail in any previous version of duckdb. ### Are these changes tested? Yes ### Are there any user-facing changes? No * Closes: apache#38293 Authored-by: Dewey Dunnington <[email protected]> Signed-off-by: Dewey Dunnington <[email protected]>
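The general idea of making such an expectation order-independent can be sketched as follows. This is a Python illustration, not the R test that was changed; `same_rows` is a hypothetical helper, and rows are assumed to be comparable tuples.

```python
def same_rows(actual, expected):
    """Compare two grouped results while ignoring row order.

    A group-by gives no ordering guarantee unless one is requested,
    so both sides are sorted before they are compared.
    """
    return sorted(actual) == sorted(expected)


# The same groups returned in a different order still compare equal.
result_a = [("b", 2), ("a", 1)]
result_b = [("a", 1), ("b", 2)]
assert same_rows(result_a, result_b)
```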
…rsions.json (apache#38241) This PR corrects the version for the `version_match` to be equal to the version defined in versions.json. This way the text is correctly displayed in the version switcher button. * Closes: apache#38240 Authored-by: AlenkaF <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>
…pache#38302) ### Rationale for this change test-r-rhub-ubuntu-gcc-release-latest doesn't have enough disk space. ### What changes are included in this PR? Remove pre-installed files on Azure Pipelines too. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#38295 Authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
### Rationale for this change Verify JDK 21 in CI in time for the Arrow v14 release. ### What changes are included in this PR? * Bump latest Java version from 20 -> 21 in CI ### Are these changes tested? Yes, via CI. ### Are there any user-facing changes? No. * Closes: apache#36994 Authored-by: Dane Pitkin <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
…he#38303) ### Rationale for this change `expr` was printing the number of matching chars which showed up as noise in the log (which we want to avoid as much as possible to avoid any false positive checks) See apache#38236 (comment) for @ jonkeane's investigation. ### What changes are included in this PR? Replace use of expr with test. ### Are these changes tested? Crossbow Lead-authored-by: Jacob Wujciak-Jens <[email protected]> Co-authored-by: Jonathan Keane <[email protected]> Signed-off-by: Jonathan Keane <[email protected]>
…the sidebar TOC (apache#38313) * Closes: apache#38312 Authored-by: Joris Van den Bossche <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>
### Rationale for this change Update news.md ### Are these changes tested? no * Closes: apache#38904 Authored-by: Jacob Wujciak-Jens <[email protected]> Signed-off-by: Jacob Wujciak-Jens <[email protected]>
…sible (apache#38362) ### Rationale for this change We have external test data repositories, apache/arrow-testing and apache/parquet-testing. We use them as submodules. apache/arrow may not use the latest test data repositories, but our verification script always uses the latest ones. This may cause test failures. ### What changes are included in this PR? Use local test data if they exist. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#38345 Authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
### Rationale for this change It would be better to always test against the latest Homebrew, which is what most users run, but that is difficult to maintain. ### What changes are included in this PR? We don't update Homebrew manually. GitHub-hosted GitHub Actions runners update Homebrew periodically. We depend on that instead of a manual `brew update`. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#39003 Authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…itly created sub-directories (apache#38845) ### Rationale for this change See apache#38618 (comment) and below for the analysis. When deleting the dir contents, we use a GetFileInfo with recursive FileSelector to list all objects to delete, but when doing that the file paths for directories don't end in a trailing `/`, so for deleting explicitly created directories we need to add the `kSep` here as well to properly delete the object. ### Are these changes tested? I tested them manually with an actual S3 bucket. The problem is that MinIO doesn't have the same problem, and so it's not actually tested with the test I added using our MinIO testing setup. ### Are there any user-facing changes? Fixes the regression * Closes: apache#38618 Lead-authored-by: Joris Van den Bossche <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>
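The fix described above can be sketched in Python under a few assumptions: a recursive listing returns directory paths without a trailing separator, and an explicitly created directory is stored as an object whose key ends in the separator. `Entry`, `SEP`, and `keys_to_delete` are hypothetical stand-ins, not the Arrow filesystem API.

```python
from dataclasses import dataclass

SEP = "/"  # plays the role of the separator constant mentioned above


@dataclass
class Entry:
    """Hypothetical listing result: a path plus a directory flag."""
    path: str
    is_dir: bool


def keys_to_delete(entries):
    """Build the object keys to delete from a recursive listing."""
    keys = []
    for entry in entries:
        key = entry.path
        # Directory markers are stored with a trailing separator, so it
        # has to be added back before the delete request is issued.
        if entry.is_dir and not key.endswith(SEP):
            key += SEP
        keys.append(key)
    return keys


listing = [Entry("bucket/dir/sub", True), Entry("bucket/dir/file.txt", False)]
keys = keys_to_delete(listing)
```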
### Rationale for this change The script was too quiet. ### What changes are included in this PR? Fix regex and add some output: ``` Rscript tools/update-checksums.R 14.0.0 1 ✘ [1] "Extracting libarrow binary paths from tasks.yml" [1] "Downloading windows/arrow-14.0.0.zip.sha512" [1] "Converting windows/arrow-14.0.0.zip to windows style line endings" [1] "Downloading linux-openssl-1.0/arrow-14.0.0.zip.sha512" [1] "Downloading linux-openssl-1.1/arrow-14.0.0.zip.sha512" [1] "Downloading linux-openssl-3.0/arrow-14.0.0.zip.sha512" [1] "Downloading darwin-arm64-openssl-1.1/arrow-14.0.0.zip.sha512" [1] "Downloading darwin-arm64-openssl-3.0/arrow-14.0.0.zip.sha512" [1] "Downloading darwin-x86_64-openssl-1.1/arrow-14.0.0.zip.sha512" [1] "Downloading darwin-x86_64-openssl-3.0/arrow-14.0.0.zip.sha512" [1] "Checksums updated successfully!" ``` ### Are these changes tested? locally ### Are there any user-facing changes? no * Closes: apache#39041 Authored-by: Jacob Wujciak-Jens <[email protected]> Signed-off-by: Jacob Wujciak-Jens <[email protected]>
…pache#39077) ### Rationale for this change Running our test suite results in many spurious warnings being printed that make it difficult to spot actual warnings. ### What changes are included in this PR? The data used for specific tests involving `summarise()` was updated to not trigger the warnings. ### Are these changes tested? Yes ### Are there any user-facing changes? No * Closes: apache#39076 Authored-by: Dewey Dunnington <[email protected]> Signed-off-by: Dewey Dunnington <[email protected]>
…rification job on AlmaLinux 8 (apache#39073) ### Rationale for this change The verification task for Almalinux 8 was failing. ### What changes are included in this PR? Add required python3.11-devel to the Docker image. ### Are these changes tested? Yes via archery task. ### Are there any user-facing changes? No * Closes: apache#39072 Authored-by: Raúl Cumplido <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…pache#39082) ### Rationale for this change `KEYS` may have UTF-8 (non-ASCII) characters. Ruby chooses the default encoding based on `LANG`. If `LANG=C`, Ruby uses the `US-ASCII` encoding as the default encoding. If Ruby uses the `US-ASCII` encoding, we can't process `KEYS` because it has non-ASCII characters. ### What changes are included in this PR? Use the `UTF-8` encoding explicitly for `KEYS`. If we specify the `UTF-8` encoding explicitly, our `KEYS` processing doesn't depend on `LANG`. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#39074 Authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
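The same principle applies outside Ruby. The sketch below is a Python analogue rather than the Ruby change made in this PR: it names the encoding explicitly instead of relying on a locale-derived default. `decode_keys` and `read_keys` are hypothetical helpers.

```python
def decode_keys(raw):
    """Decode KEYS content as UTF-8, independent of the locale."""
    return raw.decode("utf-8")


def read_keys(path):
    """Read a KEYS-like file with an explicit encoding."""
    # Without encoding=..., the default may be derived from the locale,
    # and an ASCII-only default cannot represent non-ASCII names.
    with open(path, encoding="utf-8") as f:
        return f.read()


raw = "Raúl".encode("utf-8")
name = decode_keys(raw)  # decodes correctly regardless of LANG
```

Decoding the same bytes as ASCII raises `UnicodeDecodeError`, which mirrors the failure described for `LANG=C`.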
…pache#38450) ### Rationale for this change On macOS, "cp -a source/ destination/" copies "source/*" to "destination/" (such as "source/a" is copied to "destination/a") not "source/" to "destination/" (such as "source/a" is copied to "destination/source/a"). ### What changes are included in this PR? We need to remove the trailing "/" from "source/" to copy "source/" itself to "destination/source/". ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#38449 Authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
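The trailing-slash normalization can be sketched in Python. This illustrates the path handling only and is not the shell change from this PR; `strip_trailing_slash` and `copy_destination` are hypothetical helpers.

```python
import posixpath


def strip_trailing_slash(path):
    """Remove trailing slashes so the path names the directory itself."""
    return path.rstrip("/") or "/"  # keep the root directory as "/"


def copy_destination(source, destination):
    """Where the source directory itself should end up under destination."""
    name = posixpath.basename(strip_trailing_slash(source))
    return posixpath.join(destination, name)


# With the trailing slash the base name is empty, which is why the
# directory contents, not the directory, would be the thing copied.
empty_name = posixpath.basename("source/")
target = copy_destination("source/", "destination/")
```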
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
or
In the case of PARQUET issues on JIRA the title also supports:
See also: |
k-anshul pushed a commit that referenced this pull request Dec 21, 2024
…n timezone (apache#45051)
### Rationale for this change
If the timezone database is present on the system, but does not contain a timezone referenced in an ORC file, the ORC reader will crash with an uncaught C++ exception. This can happen for example on Ubuntu 24.04, where some timezone aliases have been moved from the main `tzdata` package to a `tzdata-legacy` package. If `tzdata-legacy` is not installed, trying to read an ORC file that references e.g. the "US/Pacific" timezone would crash. Here is a backtrace excerpt:
```
#12 0x00007f1a3ce23a55 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007f1a3ce39391 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007f1a3f4accc4 in orc::loadTZDB(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#15 0x00007f1a3f4ad392 in std::call_once<orc::LazyTimezone::getImpl() const::{lambda()#1}>(std::once_flag&, orc::LazyTimezone::getImpl() const::{lambda()#1}&&)::{lambda()#2}::_FUN() () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#16 0x00007f1a4298bec3 in __pthread_once_slow (once_control=0xa5ca7c8, init_routine=0x7f1a3ce69420 <__once_proxy>) at ./nptl/pthread_once.c:116
#17 0x00007f1a3f4a9ad0 in orc::LazyTimezone::getEpoch() const () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#18 0x00007f1a3f4e76b1 in orc::TimestampColumnReader::TimestampColumnReader(orc::Type const&, orc::StripeStreams&, bool) () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#19 0x00007f1a3f4e84ad in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#20 0x00007f1a3f4e8dd7 in orc::StructColumnReader::StructColumnReader(orc::Type const&, orc::StripeStreams&, bool, bool) () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#21 0x00007f1a3f4e8532 in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#22 0x00007f1a3f4925e9 in orc::RowReaderImpl::startNextStripe() () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#23 0x00007f1a3f492c9d in orc::RowReaderImpl::next(orc::ColumnVectorBatch&) () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#24 0x00007f1a3e6b251f in arrow::adapters::orc::ORCFileReader::Impl::ReadBatch(orc::RowReaderOptions const&, std::shared_ptr<arrow::Schema> const&, long) () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
```
### What changes are included in this PR?
Catch C++ exceptions when iterating ORC batches instead of letting them slip through.
### Are these changes tested?
Yes.
### Are there any user-facing changes?
No.
* GitHub Issue: apache#40633
Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
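The pattern of catching exceptions at the batch-iteration boundary can be sketched in Python. This is a loose analogue of the C++ change, not its implementation: errors raised by an underlying reader are converted into one well-defined error type instead of escaping unhandled. `ReadError` and `safe_batches` are hypothetical names.

```python
class ReadError(Exception):
    """Hypothetical error type reported to callers of the reader."""


def safe_batches(batches):
    """Yield batches, converting any reader failure into ReadError."""
    iterator = iter(batches)
    while True:
        try:
            batch = next(iterator)
        except StopIteration:
            return  # normal end of the stream
        except Exception as exc:
            # For example, a timezone lookup failing deep inside the reader.
            raise ReadError(f"failed to read batch: {exc}") from exc
        yield batch


def flaky_reader():
    """Stand-in for a reader that fails after the first batch."""
    yield "batch-0"
    raise LookupError("US/Pacific")
```

A caller iterating over `safe_batches(flaky_reader())` receives the first batch and then a `ReadError`, so the failure can be handled rather than terminating the process.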
Changes from these PRs
apache#42003
apache#41638