Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request]: Upgrade or provide ETA on dill #22893

Open
eddie-scio opened this issue Aug 25, 2022 · 17 comments
Open

[Feature Request]: Upgrade or provide ETA on dill #22893

eddie-scio opened this issue Aug 25, 2022 · 17 comments

Comments

@eddie-scio
Copy link

What would you like to happen?

2.41.0 has dill<0.3.2, which is now more than 2 years old. We're trying to upgrade one of our core ML libraries that we run on Beam, but we are blocked on a dill>=0.3.4 requirement there. I do see the note about dill incompat in setup.py, and I do see a rollback PR for the upgrade -- I'm wondering if there is any ETA or communication with dill that can be shared.

Issue Priority

Priority: 2

Issue Component

Component: sdk-py-core

@tvalentyn
Copy link
Contributor

@ryanthompson591 could you provide your input here please?

@eddie-scio
Copy link
Author

Hey @ryanthompson591 , just a friendly ping!

@ryanthompson591
Copy link
Contributor

There are two efforts underway. The first was to upgrade our dill version, which got blocked by some failed tests.

See:
uqfoundation/dill#517
python/cpython#94245

In dill's defense it's a python error - that gets resolved in 3.10.

The second is to vendor dill, which there is some agreement to do.

Either would solve this problem.

As to the question of when, I am not yet sure. Can you give some context as to why allen NLP requires this version of dill? Maybe it can help us prioritize.

@tvalentyn
Copy link
Contributor

In dill's defense it's a python error - that gets resolved in 3.10.

Is this fixed on dill master?

@ryanthompson591
Copy link
Contributor

I just downloaded dill master, and tried this code out and it is fixed.

Dill added a workaround for the python error here:
https://github.com/uqfoundation/dill/blob/25a7e450ed76c7a0820834a3a91134476b1b8253/dill/_dill.py#L746d.

@eddie-scio
Copy link
Author

That's awesome. So we just need dill to release 0.3.6 (https://github.com/uqfoundation/dill/milestones), and then release a new version of Beam that changes the dependency constraint (e.g.dill>=0.3.6)?

@eddie-scio
Copy link
Author

Good news, dill just released 0.3.6 (https://pypi.org/project/dill/0.3.6/) two days ago, so in 2.42.1 we should be able to take care of this?

@ryanthompson591
Copy link
Contributor

Thanks for this eddie. I will be looking into doing this import and testing it throughout our code bases. I was able to validate that the new version of dill did fix this problem.

@eddie-scio
Copy link
Author

Thanks so much, Ryan! Really excited to get this working.

@eddie-scio eddie-scio mentioned this issue Nov 4, 2022
4 tasks
@jmelot
Copy link

jmelot commented Apr 10, 2023

Just chiming in to say this is blocking us deploying a pipeline that runs an mlflow model (we have to use mlflow==1.24.0 and it has a dill==0.3.5.1 requirement). Please let us know if there's anything we can do to help!

@tvalentyn
Copy link
Contributor

tvalentyn commented Apr 10, 2023

Thanks all, here is an update.

  • Dill has made several breaking changes between dill version 0.3.1.1 and dill version 0.3.6, which affect what gets pickled and how: https://github.com/uqfoundation/dill/issues?q=is%3Aissue+regression
  • Switching Apache Beam to newer version of dill will very likely negatively affect some group of users, while it may be a non-issue for some other group of users.
  • Dill has also made changes to internal code, which Beam had some assumptions about. Beam 2.47.0 will include code changes to be able to work with dill==0.3.6: Fix compatibility with dill 0.3.6. #26086, however the default and required version of dill still remains dill==0.3.1.1 for now.
  • dill==0.3.1.1. doesn't support Python 3.11. For Beam, one of the primary motivation to upgrade dill is to support Python 3.11, in addition to concerns in this bug. To unblock Python 3.11 support we went with monkey-patching dill 0.3.1.1 at runtime. The patch is applied only if dill version is 0.3.1.1 and Python version 3.11 or higher: 3d0ee7b (#26121) . The alternative we have considered is to vendor dill. I decided against vendoring at this time at last minute because: dill makes changes to the global state, for example modifies global dispatch table used by standard pickler. Given the demand for this issue it is clear that newer versions of dill will be installed in addition to vendored version. The vendored version and a stock version installed at runtime may potentially modify the global state differently. I didn't have enough time before 2.47.0 release cut to properly evaluate and address a risk of such concurrent modification.
  • I have also evaluated setting cloudpickle as default pickler, and have encountered one issue that warranted additional investigation: [Bug]: cloudpickle appears to incorrectly unpickle cloned combiners  #26209 .
  • I plan to have a conversation with dill maintainers to see if we can mitigate the impact of the breaking changes they have introduced to be able to update smoothly to a newer version of dill. Update: [Discuss] Recent breaking changes in new dill releases  uqfoundation/dill#589

In the meantime, with Beam 2.47.0, users can try to update to newer version of dill if they must, even though beam requires dill 0.3.1.1. Users can force-install newer version of dill in their submission environment as long as they install the same version of dill in the runtime environment. As I mentioned above, some users may not be affected by dill's breaking changes while some other users may be. Dill's breaking changes are not something Beam controls, but as mentioned above, Beam did make code changes to work with newer versions of dill. I will also continue to work on a better solution for this issue.

@jacek-jablonski
Copy link

Hello,
any updates on this? The old dil is still present in beam version 2.51.0.

@tvalentyn
Copy link
Contributor

No update yet but we have prioritzed resolving blockers for cloudpickle adoption. In the meantime, the recommendations above still apply. Specifically

You can force-install a different version of dill than required by Beam under the following conditions:

  • You install the same version in submission and in the runtime environment.
  • The chosen version works for your pipeline

See also: https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#make-the-launch-environment-compatible-with-the-runtime-environment

helperbot-recidiviz pushed a commit to Recidiviz/pulse-data that referenced this issue Jan 3, 2024
…data#26243)

## Description of the change
This upgrades us to Ubuntu Mantic Minotaur, the latest release that does
not reproduce psf/requests#6560

Additionally, it moves us on to Python 3.11 as the deadsnakes PPA is not
available for non-LTS releases.

`pylint` has been removed from our virtualenv as `apache-beam` has a
strict version band requirement on `dill` . See
apache/beam#22893

This PR includes lots of whitespace changes caused by Black due to the
upgrade of `astroid`.
## Type of change

> All pull requests must have at least one of the following labels
applied (otherwise the PR will fail):

| Label | Description |
|-----------------------------
|-----------------------------------------------------------------------------------------------------------
|
| Type: Bug | non-breaking change that fixes an issue |
| Type: Feature | non-breaking change that adds functionality |
| Type: Breaking Change | fix or feature that would cause existing
functionality to not work as expected |
| Type: Non-breaking refactor | change addresses some tech debt item or
prepares for a later change, but does not change functionality |
| Type: Configuration Change | adjusts configuration to achieve some end
related to functionality, development, performance, or security |
| Type: Dependency Upgrade | upgrades a project dependency - these
changes are not included in release notes |

## Related issues
Closes Recidiviz/recidiviz-data#20615
Closes Recidiviz/recidiviz-data#24244
Closes Recidiviz/recidiviz-data#4287
Related to Recidiviz/recidiviz-data#12469

## Checklists

### Development

**This box MUST be checked by the submitter prior to merging**:
- [x] **Double- and triple-checked that there is no Personally
Identifiable Information (PII) being mistakenly added in this pull
request**

These boxes should be checked by the submitter prior to merging:
- [ ] Tests have been written to cover the code changed/added as part of
this pull request

### Code review

These boxes should be checked by reviewers prior to merging:

- [ ] This pull request has a descriptive title and information useful
to a reviewer
- [ ] Potential security implications or infrastructural changes have
been considered, if relevant

---------

Co-authored-by: Helper Bot <[email protected]>
GitOrigin-RevId: 2a71759c1bd091be8e383ae45e7314ee0ad42a27
@siddharthab
Copy link

The suggestion to force a version of dill may not work in many situations, and so a proper remediation will be very much appreciated.

Many people are using pip-tools as their dependency manager these days, which does not allow a user to force install a specific version. Build systems like Bazel rules for Python also rely on pip-tools to build the dependency tree with resolved versions. I was unable to override the version constraints from my dependencies in my devops setup.

Environment compatibility checks between submission and runtime environments are actually supposed to be part of distributed compute systems. Dask and newer versions of Beam SDK have them. Should we not rely on these checks and remove the narrow version constraints in the requirements? If a user wants to use a provided (instead of custom) Beam docker image, then maybe each image can come with suggested job submission environment setup scripts.

@tvalentyn
Copy link
Contributor

If a package manager prevents you from force-installing a different version of dill in your environment, you could create a custom distribution of beam or other package that uses dill, and change the version constraints for dill in setup.py to your liking and at your own risk. Then, install the package from a local copy instead of from pypi.

The suggested version of dill is still 0.3.1.1, because we are not confident that newer versions of dill won't break someone if we upgrade it across the board. Switching to cloudpickle is still the plan.

@siddharthab
Copy link

@tvalentyn Thank you for your suggestion. I think a forked source archive of beam could be a reasonable fallback. I also filed an issue for updating numpy - #30098.

@poffdeluxe
Copy link

Now that #26086 has been merged in, can we expect an upgrade to BEAM's dill dep soon?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants