-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the full diff by having more consistent indentation in the PrettyPrinter #11571
Improve the full diff by having more consistent indentation in the PrettyPrinter #11571
Conversation
Thanks @BenjaminSchubert, sorry for the delay in reviewing. First, I think we're in agreement that the new formatting is nicer then the existing one, so we can consider this aspect as accepted. Procedurally I'd prefer it if the vendoring change (your first commit) is done in its own PR which we can merge first, then the formatting change in a separate commit. It'd be easier to handle this way. Regarding the first commit message
I think we should replace all usages of pprint with our vendored copy from the start, for these reasons:
Regarding the technical vendoring aspect, I wonder if we should turn off black and linting for this file so as to make future syncs from upstream easier, or if we should just leave upstream behind and not look back... I'm not sure myself - WDYT? |
@bluetech thanks a lot for the update, no worries at all :)
Sure, I'll do this as soon as we clarified the other points :)
I will add a warning here that on my side I very rarely use the
Agreed that consistency is important. Technically, the
I do agree it gives much much better diffs (I would not be proposing this change otherwise :P). However, the outputs are also longer, which reduces the amount of information that fits on one screen, so in cases it doesn't need to be compared, it might become less readable As such, for the To showcase my concerns, here's a (probably rare, exaggerated failure): Keeping the old pprint for the `-v` mode
Using the new pprint for the `-v` mode
Ultimately, I am happy to go either way if you think using it for everything is nicer, I just wanted to show an example of the change before doing it. Let me know which direction you prefer.
I never know what's best. Advantages of linting as the rest is that it feels part of the codebase and is easier for contributors, but syncing is harder. Ultimately, I don't think the pprint codebase on upstream python will change often, but we don't really know. I tend to slightly prefer linting/formatting the same. Ultimately, unless there are big refactors upstream, it should still not be too hard to sync. |
OK, let's keep using the stdlib pprint for other stuff, and we can maybe migrate them later but as a separate concern.
I agree. Let's assimilate fully with pytest, and not worry about upstream syncs too much. This will allow us to freely improve and adjust the code to pytest needs, and have proper linting and formatting. I'd also like to type annotate the code later (unless you feel like doing it already ). So here's how I think it would be best to do the vendoring part:
That should bring us to the current state in main, but with a cleaned up vendored pprint and clear provenance. After that you can do the new improvements. BTW I know you just wanted to make some improvements and I side-tracked you with this vendoring stuff, I hope you're not too annoyed with that :) |
2ac6392
to
a6b35d8
Compare
Ok, #11626 contains the vendoring, and this one has now been rebased on top of it. Once the other is in, I'll rebase again. In the meantime I'll put it to draft.
No worries at all, those are all reasonable asks, I am glad you are open to get so much new code in to rewrite those diffs :D |
a6b35d8
to
86545ed
Compare
This is now ready for review :) |
The normal default pretty printer is not great when objects are nested and it can get hard to read the diff. Instead, provide a pretty printer that behaves more like when json get indented, which allows for smaller, more meaningful differences, at the expense of a slightly longer diff. This does not touch the other places where the pretty printer is used, and only updated the full diff one.
86545ed
to
445687c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! I did not review the code itself, only the changelog and the test outcomes. 👍
@@ -677,8 +695,13 @@ def test_dict_different_items(self) -> None: | |||
"Right contains 2 more items:", | |||
"{'b': 1, 'c': 2}", | |||
"Full diff:", | |||
"- {'b': 1, 'c': 2}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed for very small diffs, the change makes it harder to read, but those are not problematic anyways, the problematic diffs are the long ones, which are greatly improved, so I think the trade-off is valid. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also first do a diff with the normal pprint, and if both entries are single line, use it, otherwise uses the long one if that's preferred
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that complicate the code quite a bit? If not I would go for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I would have in mind is something like 0850f34
I haven't pushed it yet to this branch as I am not convinced about it for the following reasons:
- We lose the ability of having a stable diff across python versions (since the pprint code in the standard library changes)
- Older versions of python thus don't get some of the improvements (e.g. python3.8 and dataclasses)
- The diff looks nicer for simple cases (a same length value changed), but looks worse for cases where something is missing, see here
As such I think I personally lean towards keeping the diff consistent even if it gets over multiple lines all the time. I'm happy to push this commit here if you prefer. In the end, the really hard to read diffs are the longer ones :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I'm a fan of using the multi-line format even for short cases. This is what I do in my own code for git diffs as well. IMO, let's keep it, and if it ends up being weird or people complain, we can think of using a more compat format when it would fit in a single line.
Co-authored-by: Bruno Oliveira <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comments, please take a look.
Some things for possible follow up:
-
There are some small mistakes in the initial typing I didn't notice before, but can be fixed separately.
-
I wonder if
sort_dicts=True
is still the right choice these days, when dicts are ordered. This is also a separate discussion. -
The
context
can be simplified to aset
if I'm not mistaken, seems like it's currently aint -> 1
dict for legacy reasons (probably from beforeset
existed...). -
The
readable
stuff is now unused, we can perhaps drop it to simplify the code. -
Since
compact
is now ignored, I think we ought to drop this parameter.
Some ruminations on pprint after reading its code:
-
The pprint code has each
format
function care about the global indentation level. Intuitively if I were to design a pretty-printer I would have each formatter not care about the existing indentation level, i.e. format as if it's the top-level and only care about max width, and the machinery will insert the nesting indentation itself.I wonder if there's a reason why it wasn't done, maybe performance?
-
I wonder why Python went with the
_dispatch
dict instead of an extensible__pprint__
protocol, which would allow each type to handle its own pretty-printing instead of having it all in the pprint module. Maybe the protocol would be too complex.
context, | ||
level, | ||
) | ||
self._pprint_dict(object, stream, indent, allowance, context, level) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not change the traditional OrderedDict
formatting, it's what people know/expect, I think.
With pprint_dict
it's:
OrderedDict({
'hello': 100,
})
but should be
OrderedDict([
('hello', 100),
])
src/_pytest/_io/pprint.py
Outdated
last = False | ||
while not last: | ||
ent = next_ent | ||
while True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be a for
loop now
testing/io/test_pprint.py
Outdated
""", | ||
id="defaultdict-two-items", | ||
), | ||
pytest.param( | ||
Counter(), | ||
"Counter()", | ||
"Counter({})", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would add a special case to keep the previous formatting here.
@@ -677,8 +695,13 @@ def test_dict_different_items(self) -> None: | |||
"Right contains 2 more items:", | |||
"{'b': 1, 'c': 2}", | |||
"Full diff:", | |||
"- {'b': 1, 'c': 2}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I'm a fan of using the multi-line format even for short cases. This is what I do in my own code for git diffs as well. IMO, let's keep it, and if it ends up being weird or people complain, we can think of using a more compat format when it would fit in a single line.
"? ^", | ||
"+ 2,", | ||
"+ 3,", | ||
" )", | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to see another test here where the lists have some commonality, e.g. [1, 2, 3]
vs. [1, 20, 3]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bluetech thanks for the review! I addressed the comments with the last commit.
Some things for possible follow up:
You read my mind. I've got https://github.com/BenjaminSchubert/pytest/tree/bschubert/pprint-cleanup lined up with a lot of the requested clean ups already :)
The pprint code has each format function care about the global indentation level. Intuitively if I were to design a pretty-printer I would have each formatter not care about the existing indentation level, i.e. format as if it's the top-level and only care about max width, and the machinery will insert the nesting indentation itself.
I don't think that's easy to do, as the max width
depends on the indentation (the further you go, the least space you have).
However, I think we can simplify this as we don't need to have both levels and indentations anymore, with consistent indentation per level. This is a follow up I intend to clean up
"? ^", | ||
"+ 2,", | ||
"+ 3,", | ||
" )", | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, added.
Regarding the The new pformat wasn't looking good to me, each key-value being spread over 4 lines instead of 1. So I was going to suggest adding complexity to fix it, but then I thought, since dicts are now ordered, the I think it would also be nice to update upstream pprint to use the new BTW, another interesting issue I found: python/cpython#51683
I was thinking that instead of increasing the indentation, you would decrease the max size. |
Oh fun, this means they updated the dict but not the pprint module. Change undone
Interesting idea, I'll see if I can find a nice way of doing this I also missed one part previously:
I think it's still better to sort them per key, as in most cases, I would expect the content to be what's important not the order (in which case OrderedDict could be used) |
Thanks @BenjaminSchubert! |
Overview
Note: This is an alternative implementation to #11537, and vendors pprint in
The normal default pretty printer is not great when objects are nested it can get hard to read the diffs produced.
Instead, provide a pretty printer that behaves more like when json get which allows for smaller, more meaningful differences, at the expense of a slightly longer diff.
This also has the nice side effect of making diffs stable across python versions, which it was not previously, as, for example, dataclass support was added in python3.9
Fix #1531
Alternatives/Potential improvements
This has the disadvantage that diffs are now longer, even for small changes (like
[1, 2] == [1, 3]
). We could potentially still keep the previous implementation for the case where the diff is the same length AND has a single line. This would take care of trivial cases. It would however make some diffs harder to read again, like[1, 2, 3] == [2, 3, 4]
, which would now show 3 differences.This is however not generalisable to deeply nested payloads.
Notes for maintainers
This is the requested alternative to #11537, which vendors the pprint module in, and then modifies the class in place.
The first commit vendors the module in, and makes it pass linting. Note that only the required part of the module are imported.
The second commit makes the modification and adds the same tests as #11537.
It is possible that we could still simplify the logic (e.g. always computing the indentation based on the level, instead of passing both). I believe this might be easier as a subsequent PR if requested, but happy to try and simplify this one if wanted.
Examples
Basic
Generated using the following script
We get the following differences on small entries:
Full example
Taking the example from https://github.com/lukaszb/pytest-dictsdiff, as in the issue:
Previously:
Now: