Support newer versions of Pyarrow in Beam. #31305
…e compat suites for pyarrow to reduce test suite runtime.
Assigning reviewers: R: @jrmccluskey for label python.
The postcommit dependency suite might time out on this PR, but I will watch the signal on https://github.com/apache/beam/actions/runs/9101961896, which runs against a branch on the main repo and should reflect the yml change. I inspected the logs manually on https://github.com/apache/beam/actions/runs/9100383893/job/25015301762?pr=31305: the pyarrow portion of the tests succeeded, and the suite timed out after 120 min.
toxTask "testPy38pyarrow-4", "py38-pyarrow-4", "${posargs}"
test.dependsOn "testPy38pyarrow-4"
postCommitPyDep.dependsOn "testPy38pyarrow-4"
toxTask "testPy38pyarrow-9", "py38-pyarrow-9", "${posargs}"
Does it make sense to test against every minor version? Could we get away with something like testing against the oldest and newest versions we support? I'm unfamiliar with how much pyarrow changes between minor releases.
If we look at compatibility between Beam and pyarrow, I think the value of testing each individual pyarrow version diminishes over time. It might be useful to do a one-time test when doing a large upgrade like this one, then test newer versions as they are released. Note that the "newest supported" version is also tested in the regular precommit suites.
A more thorough test combination might be warranted if we are worried about interoperability of some dependencies, like pandas and pyarrow. This might be why Beam has added special compat testing for these two dependencies.
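To make the trade-off concrete, here is a rough Python sketch of the "oldest and newest only" option. It is not code from this PR: the helper name `compat_envs` and the version list in the usage note are hypothetical, and only the `py38-pyarrow-N` naming pattern is taken from the tox environments in the diff above.

```python
def compat_envs(py_tag, versions, endpoints_only=False):
    """Build tox env names such as 'py38-pyarrow-4'.

    Hypothetical helper for illustration only. `versions` is an assumed
    list of supported pyarrow major versions, not Beam's actual list.
    """
    # Deduplicate and sort so the first and last entries are the
    # oldest and newest versions.
    ordered = sorted(set(versions))
    if endpoints_only and len(ordered) > 2:
        # Keep only the oldest and newest supported versions.
        ordered = [ordered[0], ordered[-1]]
    return [f"{py_tag}-pyarrow-{v}" for v in ordered]
```

Calling `compat_envs("py38", [4, 6, 9], endpoints_only=True)` would keep only the two endpoint environments, while the default keeps one environment per listed version.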
LGTM mod the clarifying question
We need to upgrade pyarrow for some unit tests to pass on Python 3.12 (#29149).