[Bug]: Python 3.12 incompatibility of Apache Beam #32617
Comments
From the history (e.g. #21898), it looks like upgrading dill is not trivial.
Can we help with that somehow?
Feel free to take the issue. We are also working on improving cloudpickle (e.g., #26209), and we hope to make cloudpickle the default in the future.
Unfortunately, I know very little about Beam's dill usage. I might attempt to upgrade it, but I am not sure whether a PR led by me would be more of a help than a burden :). I might try if you think it is a good idea.
It is a non-trivial change and I don't recommend that route; we won't be able to merge such a change. We can try to monkey-patch dill 0.3.1.1 on Python 3.12. Beam has this change:
Upgrading cloudpickle to the next major version should be doable. We might get to it soon, but not before the next release. We are finally making some progress on switching to cloudpickle.
We are using apache-beam 2.59.0 for those tests (the latest release), and from what I see, "@dill.register(CodeType)" is part of it. I think this explains exactly what happens. The problem is that apache-beam has a required dependency that limits the dill version to 0.3.1.1, and even if Beam itself monkey-patches dill in its own code, that does not mean other users of dill in the same virtualenv benefit from the patch. In our case, we have a single venv where our users potentially install multiple providers, Beam being one of them. This means that any other provider (or Airflow core) will get dill 0.3.1.1, as forced by Beam. But if the task you run does not use Beam, it will never import the apache.beam provider code or the apache-beam package, so dill will not be monkey-patched. In the failing build here: https://github.com/apache/airflow/actions/runs/11121136124/job/30899938977?pr=41990 you can see that it's not the "beam" tests that fail but the "PythonVirtualenv" tests. This happens only on Python 3.12, and only when apache-beam is installed, which forces a downgrade of dill from 0.3.9 to 0.3.1.1. Previously those tests passed with dill 0.3.9 installed (and no apache-beam installed). So monkey-patching possibly solves Beam's own usage of dill, but dragging dill down to 0.3.1.1 makes other packages in the same environment that do not apply a similar monkey-patch fail.
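Why a monkey-patch only helps code that has actually run it can be illustrated with a minimal, standard-library-only sketch. It uses `copyreg.pickle` as a stand-in for `dill.register` (an analogy, not Beam's actual patch), and `LockedCounter`, `_reduce_counter`, `apply_patch` and `can_pickle` are hypothetical names. The registered reducer lives in process-wide state, so a process that never executes the patching code keeps the default, failing behaviour.

```python
import copyreg
import pickle
import threading


class LockedCounter:
    """Hypothetical object that default pickling rejects because it holds a lock."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self._lock = threading.Lock()  # lock objects cannot be pickled


def _reduce_counter(obj: "LockedCounter"):
    # Serialize only the plain value; __init__ recreates the lock on load.
    return LockedCounter, (obj.value,)


def apply_patch() -> None:
    """Register the reducer process-wide, like an import-time monkey-patch."""
    copyreg.pickle(LockedCounter, _reduce_counter)


def can_pickle(obj) -> bool:
    """Return True if the standard pickler can serialize obj in this process."""
    try:
        pickle.dumps(obj)
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    c = LockedCounter(3)
    print(can_pickle(c))  # False: the patch has not run in this process
    apply_patch()
    print(can_pickle(c))  # True: the registered reducer is now used
    print(pickle.loads(pickle.dumps(c)).value)  # 3
```

The same applies across processes: a separate interpreter, such as one started for a virtualenv task, only sees the reducer if it imports and runs the patching code itself.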
As mentioned in uqfoundation/dill#589, we could not upgrade to newer versions of dill because something fundamental has changed in their serialization algorithms. Given the signal we see in our internal mono-repo codebase, it is very likely that some Beam users in external codebases would be broken by the upgrade. We could upgrade dill to a Python 3.12-compatible dill==0.3.1.2 (pun not intended) that includes the patch for Python 3.12 and no other changes, if the dill maintainers (cc: @mmckerns) could create one; otherwise, our current plan is to switch Beam to use cloudpickle. Unfortunately, a "dill<=0.3.1.2" requirement will still cause version conflicts for packages that explicitly depend on a newer version of dill, but perhaps at least the tests will pass.
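Why a `dill<=0.3.1.2` cap still conflicts with packages that need a newer dill can be shown with a toy check. This is a simplified sketch, not how pip resolves dependencies: it handles purely numeric versions only (no pre-releases, wildcards or epochs), and the `>=0.3.9` requirement stands in for a hypothetical package that needs a newer dill. A candidate version is acceptable only if it satisfies every constraint from every package in the environment at once.

```python
import operator
from typing import Iterable, List, Tuple

# Comparison operators supported by this simplified sketch.
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a numeric version like '0.3.1.1'; trailing zeros are dropped."""
    parts = [int(p) for p in text.strip().split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()  # so that '0.3.1.0' compares equal to '0.3.1'
    return tuple(parts)


def parse_constraint(spec: str) -> Tuple[str, Tuple[int, ...]]:
    """Split a single specifier like '<=0.3.1.2' into (operator, version)."""
    spec = spec.strip()
    for op in ("<=", ">=", "==", "<", ">"):  # two-char operators first
        if spec.startswith(op):
            return op, parse_version(spec[len(op):])
    raise ValueError(f"unsupported specifier: {spec!r}")


def compatible_versions(candidates: Iterable[str], specs: List[str]) -> List[str]:
    """Return the candidates that satisfy *all* specifiers simultaneously."""
    parsed = [parse_constraint(s) for s in specs]
    return [
        c for c in candidates
        if all(_OPS[op](parse_version(c), v) for op, v in parsed)
    ]


if __name__ == "__main__":
    # Illustrative candidates; 0.3.1.2 is the hypothetical patch release.
    releases = ["0.3.1.1", "0.3.1.2", "0.3.6", "0.3.9"]
    print(compatible_versions(releases, ["<=0.3.1.2"]))            # ['0.3.1.1', '0.3.1.2']
    print(compatible_versions(releases, ["<=0.3.1.2", ">=0.3.9"]))  # []
```

An empty result is exactly the "non-conflicting dependencies are impossible" situation: no single dill version can be installed into a shared environment that satisfies both requirements.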
OK. Then, for the time being, excluding the Apache Beam provider for Python 3.12 is the only option for us, unless Beam makes dill an optional dependency, with cloudpickle being the default. In that case, we could bring Beam back to the supported providers for Python 3.12.
BTW, I think having dill as a required dependency is really the main source of the problem. Just being able to work without it, with another option (i.e. cloudpickle), would be enough to unblock this. We could even add release notes to the Beam provider saying that it only works with cloudpickle on Python 3.12.
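Making dill optional, as suggested above, usually comes down to importing the serializer lazily and falling back when it is missing. A minimal sketch, assuming a preference order of cloudpickle, then dill, then the standard `pickle` module; `load_pickler` is a hypothetical helper, not an existing Beam or Airflow API. It relies only on the fact that these modules all expose `dumps` and `loads`.

```python
import importlib
import pickle
from types import ModuleType
from typing import Sequence


def load_pickler(preferred: Sequence[str] = ("cloudpickle", "dill")) -> ModuleType:
    """Return the first importable serializer module, falling back to pickle.

    Hypothetical helper: the default preference order is an assumption.
    """
    for name in preferred:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # optional dependency not installed; try the next one
    return pickle


if __name__ == "__main__":
    pickler = load_pickler()
    payload = pickler.dumps({"answer": 42})
    print(pickler.__name__, pickler.loads(payload))
```

Catching only `ImportError` keeps genuine failures inside an installed serializer visible, instead of silently switching to a different one.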
What happened?
I would like to report that Python 3.12 support for Apache Beam is somewhat broken because the Python SDK depends on an old version of dill (and of cloudpickle as well, but that is not likely to be a blocker).
Currently, in Apache Airflow, the Beam provider is disabled for Python 3.12, because adding Apache Beam with its dependencies made it impossible to have non-conflicting dependencies. After the latest release of Apache Beam (2.59.0), I was hoping all the problems with Python 3.12 had been solved and attempted to rebase the PR that brings the Beam provider back to Python 3.12, but unfortunately our tests showed that there is one more conflict left.
You can see a failing build here: https://github.com/apache/airflow/actions/runs/11121136124/job/30899938977?pr=41990
and the PR to bring Beam back is apache/airflow#42505.
The failing tests are not Beam tests; they are tests of "dill" serialization for the Airflow Python Virtualenv Operator, and the error is this:
Analysis of the issue showed that the dill version Apache Beam expects is not compatible with Python 3.12 and produces this error. Before re-enabling Beam for Python 3.12, the tests were passing on Python 3.12 with dill 0.3.9, but Apache Beam has a very strict requirement on the dill version.
This is what happens when we add Apache Beam to a Python 3.12 environment:
And it's caused by this limitation:
Also, cloudpickle is downgraded to 2.2.1 due to this limitation:
But cloudpickle is not as problematic as dill in this case, simply because the old version of dill does not properly support Python 3.12.
It would be great if the next release of Apache Beam bumped at least dill to the latest version (and possibly cloudpickle), as this would finally allow the Apache Beam provider in Airflow to support Python 3.12.
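A test suite in the situation described above could detect the problematic combination up front, for example as a `pytest.mark.skipif` condition. A minimal sketch, assuming the only combination to flag is the one reported here (dill 0.3.1.1 or older on Python 3.12 or newer); `dill_is_known_broken` is a hypothetical helper, and it makes no claim about which dill release first supported Python 3.12. It reads the installed version via `importlib.metadata`, so dill never has to be imported.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple

# Assumption: only the combination reported in this issue is flagged.
_LAST_KNOWN_BAD_DILL = (0, 3, 1, 1)


def _as_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version, e.g. '0.3.9' -> (0, 3, 9)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev0' or '0rc1'
        parts.append(int(piece))
    return tuple(parts)


def dill_is_known_broken(
    python_version: Tuple[int, int] = sys.version_info[:2],
    dill_version: Optional[str] = None,
) -> bool:
    """Return True for dill <= 0.3.1.1 running on Python >= 3.12."""
    if dill_version is None:
        try:
            dill_version = metadata.version("dill")
        except metadata.PackageNotFoundError:
            return False  # dill is not installed, nothing to flag
    return (
        tuple(python_version) >= (3, 12)
        and _as_tuple(dill_version) <= _LAST_KNOWN_BAD_DILL
    )


if __name__ == "__main__":
    print("known-broken dill/Python combination:", dill_is_known_broken())
```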
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components