-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert backjumping #142
Revert backjumping #142
Conversation
58bf803
to
9da4077
Compare
Why not fallback to no jumping only when a valid state isn't found? |
In the examples I've tested backjumping doesn't always directly cause a valid state not to be found, but rather can skip over a valid state that was critical to resolution earlier in the backtracking process. Therefore a fallback would have to start over from the begining, causing significant performance issues for genuine resolution impossible errors. |
To be clearer, here are two sets of requirements, tested on Linux Python 3.10 across three versions of Pip, 23.3.1, 23.3.1 with speculative fix, and 23.3.1 with backjumping removed:
I'm fairly confident also I can come up with at least one more example of each behavior above (I could in the past but as PyPi is not a static target I need to play around adding upper bounds). This is the diff of my without backjump branch: notatallshaw/pip@23.3.1...23.2.1-without-backjump And this is the diff of my speculative fix backjump, it is refactored to be a simpler version of #134 (comment): notatallshaw/pip@23.3.1...23.3.1-with-specultive-fix |
I am not sure if we can come out with a better solution on backjumping or we are going to revert this. |
I woke up this morning with an idea of how to implement a backjump fallback that falls back as minimally as possible: #144 It could still make valid resolution impossibles slower to find, but I think it's a good compromise on keeping the significant real world benefits of backjumping but still falling back to a correct solution. |
Please keep in mind that the code without backjumping not only fails to resolve dependencies in certain cases but also takes hours (if not days on larger projects) to fail, as it scales quadratically with the number of dependencies and their available versions. How about making backjumping optional via configuration, and let users decide if they prefer performance over correctness? |
Can you provide the concrete examples for @notatallshaw to evaluate in the new patch? |
Hi @frostming , I misunderstood a comment above. I guess the old code does in principle resolve correctly every time, but in practice it doesn't manage to complete as we have timeouts for its execution. |
I agree that there are requirements that are currently impractical without backjumping. However, these could potentially resolved by different optimizations. But it's tricky to assess their effectiveness when resolvelib can currently produce an incorrect result. I'm not sure if you saw already but I produced an alternative PR that falls back to backtracking under certain circumstances. If there isn't an outright fix this a possible way forward if resolvelib maintainers agree. |
@bennati I would still appreciate if you have real world examples of requirements that don't resolve in a reasonable amount of time or reach I have created a new optimization for resolvelib / Pip called "prefer conflicting causes", and it would be good to know if there are cases where backjumping is needed even this optimization. I have created two branches, one with backjump and one without, to test these scenarios: |
Hi @notatallshaw , it's easy to get such requirement by starting with these dependencies
These took on my machine around 7 mins to resolve (using python-inspector 0.9.5). Then add as many dependencies as you like after those, for example these dependencies took 15 minutes to resolve;
For comparison, installing these dependencies with pip took 1 min. |
My initial results are showing that adding prefering causing conflicts makes backjumping optimizations small but measurable: Requirements: boto3==1.10.16 s3fs seaborn Requirements: boto3==1.10.16 s3fs seaborn pandas scipy Requirements: sphinx sphinx-rtd-theme sphinx-toolbox myst-parser sphinxcontrib-bibtex nbsphinx But also that the backjumping fallback can, at least in some cases, add a small overhead compared to removing backjumping altogether, I will take a look if there's a way to improve this: Requirements: kedro[test]==0.18.13 My methodology is running I plan to test more requirements, especially ones that use https://github.com/pradyunsg/pip-resolver-benchmarks to confirm these |
In the last few days I have been working more seriously on making sure my "Prefer conflicting causes" optimization is working as expected: pypa/pip#12459 With the branch in that PR the incorrect behavior of I have an intuitive explanation of why I think this PR would help avoid this backjumping failure, basically the preference of causes with conflicts aligns well with the backjumping assumption that you can backjump to a state where the causes are no longer conflicting, as if you don't have that preference than there's no guarantee on the order in which conflicting causes were added. But it is hardly a formal expanation. I think this continues to make the case for removing or adding a fallback (#144) to backjumping, as the optimization demonstrably makes an implicit assumption about how the provider prefers causes. |
I am unhappy with this solution, I am working further on understanding the root cause of the problem: #134 (comment) I may recreate this PR once I have a working test. |
FYI, I beleive I found a proper logical fix to backjumping: #155 |
Fixes #134
I would prefer if backjumping were fixed, but correctness is more important than performance.