[Bug]: apache-beam is unusable in recent python due to pinning an old dill library from 2019 #32842

morotti opened this issue Oct 17, 2024 · 2 comments
morotti commented Oct 17, 2024

What happened?

Hello,

apache-beam cannot be installed on any recent python environment because it is pinning an old version of dill from 2019.

pip install "apache-beam>=2.57.0"
...
The conflict is caused by:
    apache-beam 2.59.0 depends on dill<0.3.2 and >=0.3.1.1

I have noticed apache-beam 2.57.0+ is required to allow pyarrow 15+, which is required by other recent tools/libraries.

It is impossible to install apache-beam in a recent Python environment because all releases of apache-beam pin dill to 0.3.1.1, which conflicts with other packages.
dill 0.3.1.1 was released in September 2019, which is extremely old; the latest Python version at the time was Python 3.7.
For reference, the dill package did not provide official Python wheels before v0.3.4 in June 2021, so older versions have to be built from source to be used.
https://pypi.org/project/dill/#history


Could you please remove the pinning of dill?
Correct to dill>=0.3.1.1 in this file
https://github.com/apache/beam/blame/master/sdks/python/setup.py#L348

By the way, the old comment is no longer accurate. dill was still an early release when that comment was written 6 years ago, and the serialization has stabilized since then.

          # Dill doesn't have forwards-compatibility guarantees within minor
          # version. Pickles created with a new version of dill may not unpickle
          # using older version of dill. It is best to use the same version of
          # dill on client and server, therefore list of allowed versions is
          # very narrow. See: https://github.com/uqfoundation/dill/issues/341.
          'dill>=0.3.1.1,<0.3.2',
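
To make the effect of the pin concrete, here is a minimal sketch in plain Python that checks a few dill releases against the current range and against the proposed one. The `parse` and `allowed` helpers and the short `RELEASES` list are illustrative assumptions, not Beam or pip code, and the tuple comparison only approximates PEP 440 ordering for simple dotted versions like these.

```python
def parse(version, width=4):
    """Turn '0.3.1.1' into (0, 3, 1, 1), zero-padded to `width` parts."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def allowed(version, lower, upper=None):
    """True if lower <= version, and version < upper when upper is given."""
    v = parse(version)
    if v < parse(lower):
        return False
    return upper is None or v < parse(upper)


# A few dill releases, chosen for illustration (not the full history).
RELEASES = ["0.3.1.1", "0.3.2", "0.3.4", "0.3.8"]

# Current constraint in setup.py: dill>=0.3.1.1,<0.3.2
current_pin = [v for v in RELEASES if allowed(v, "0.3.1.1", "0.3.2")]

# Proposed constraint: dill>=0.3.1.1
proposed = [v for v in RELEASES if allowed(v, "0.3.1.1")]

print(current_pin)  # ['0.3.1.1']
print(proposed)     # ['0.3.1.1', '0.3.2', '0.3.4', '0.3.8']
```

For anything beyond a quick check, the `packaging` library's `SpecifierSet` implements the full version specifier rules and would be the better tool.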

Regards.

Issue Priority

Priority: 1 (data loss / total loss of function)

I am picking priority 1 total loss of function rating for the ticket, as being unable to install and use apache-beam is a total loss of function.

Issue Components

  • Component: Python SDK

liferoad commented Oct 18, 2024

You should be able to install the newer dill later by ignoring this conflict. If this causes any issue, you can also try cloudpickle.
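
If the conflict is overridden this way, it can help to confirm which dill version actually ended up in the environment. The following is a rough sketch using the standard library's `importlib.metadata`; the `requirements_on` and `report` helpers are hypothetical names made up for illustration, and the output depends entirely on what is installed locally.

```python
import re
from importlib import metadata


def requirements_on(package, requirement_strings):
    """Return the requirement strings whose project name equals `package`."""
    matches = []
    for req in requirement_strings or []:
        # The project name is the leading run of name characters.
        name = re.match(r"[A-Za-z0-9._-]+", req.strip())
        if name and name.group(0).lower() == package.lower():
            matches.append(req)
    return matches


def report(dist="apache-beam", dep="dill"):
    """Describe what `dist` declares for `dep` and which `dep` is installed."""
    try:
        declared = requirements_on(dep, metadata.requires(dist))
        installed = metadata.version(dep)
    except metadata.PackageNotFoundError as exc:
        return f"not installed: {exc}"
    return f"{dist} declares {declared}; installed {dep} is {installed}"


if __name__ == "__main__":
    print(report())
```

If the reported dill version falls outside the declared range, the environment is running with the conflict ignored, so pickling problems are worth watching for.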

chebbyChefNEQ commented Dec 6, 2024

You should be able to install the newer dill later by ignoring this conflict. If this causes any issue, you can also try cloudpickle.

Hi, with newer tools like uv and poetry it is often not possible to ignore these conflicts. This causes quite a few undesirable behaviors, e.g.:

When depending on apache-beam and multiprocess at the same time, the dill pin implies multiprocess<=0.70.9, which is ancient and sdist-only. Building the sdist is not possible, as the dependency resolution in setuptools craps out.
