Improve the full diff by having more consistent indentation in the PrettyPrinter #11571

BenjaminSchubert · 2023-10-30T22:51:04Z

Overview

Note: This is an alternative implementation to #11537 that vendors in the pprint module.

The normal default pretty printer is not great with nested objects: the diffs it produces can get hard to read.

Instead, provide a pretty printer that behaves more like indented JSON, which allows for smaller, more meaningful differences at the expense of a slightly longer diff.

This also has the nice side effect of making diffs stable across python versions, which they previously were not; for example, dataclass support was only added to pprint in python3.10.
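A minimal sketch of the "indented JSON" idea, to make the layout concrete. The function `pformat_indented` and its fixed 4-space indent are illustrative assumptions, not the vendored `PrettyPrinter` API: it only expands dicts, lists, tuples, sets and dataclasses, and falls back to `repr()` for everything else. Because it walks dataclass fields itself, its output does not depend on which python version added dataclass support to `pprint`.

```python
import dataclasses
import difflib
from typing import Any, List


def pformat_indented(obj: Any, level: int = 0, indent: int = 4) -> str:
    """Hypothetical formatter: one item per line, each followed by a comma."""
    item_pad = " " * (indent * (level + 1))
    close_pad = " " * (indent * level)

    def block(opening: str, items: List[str], closing: str) -> str:
        # Empty containers stay on one line, e.g. "[]" or "{}".
        if not items:
            return opening + closing
        body = "".join(f"{item_pad}{item},\n" for item in items)
        return f"{opening}\n{body}{close_pad}{closing}"

    def nested(value: Any) -> str:
        # Children are laid out one indentation level deeper.
        return pformat_indented(value, level + 1, indent)

    # Note: dict subclasses (Counter, OrderedDict, defaultdict) are rendered
    # as plain dicts here; a real implementation would keep their type name.
    if isinstance(obj, dict):
        return block("{", [f"{k!r}: {nested(v)}" for k, v in obj.items()], "}")
    if isinstance(obj, list):
        return block("[", [nested(v) for v in obj], "]")
    if isinstance(obj, tuple):
        # The trailing comma keeps one-element tuples valid: "(\n    1,\n)".
        return block("(", [nested(v) for v in obj], ")")
    if isinstance(obj, set):
        if not obj:
            return "set()"  # "{}" would read as an empty dict
        # Sort by repr so the output does not depend on set iteration order.
        return block("{", [nested(v) for v in sorted(obj, key=repr)], "}")
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        # Walk the fields directly instead of relying on pprint's
        # version-dependent dataclass support.
        fields = [f for f in dataclasses.fields(obj) if f.repr]
        items = [f"{f.name}={nested(getattr(obj, f.name))}" for f in fields]
        return block(f"{type(obj).__name__}(", items, ")")
    return repr(obj)


if __name__ == "__main__":
    left = {"one": 1, "two": [1, 2]}
    right = {"one": 1, "two": [1, 3]}
    # Only the line holding the changed element is marked as different.
    for line in difflib.ndiff(
        pformat_indented(left).splitlines(), pformat_indented(right).splitlines()
    ):
        print(line.rstrip())
```

Since every element sits on its own line with a trailing comma, changing or appending one element only touches that element's line, which is what keeps the `ndiff` output focused.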

Fix #1531

Alternatives/Potential improvements

This has the disadvantage that diffs are now longer, even for small changes (like [1, 2] == [1, 3]). We could potentially keep the previous implementation for the case where both sides format to a single line, which would take care of trivial cases. It would, however, make some diffs harder to read again, like [1, 2, 3] == [2, 3, 4], which would then show 3 differences.

This is however not generalisable to deeply nested payloads.
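The single-line fallback described above could look roughly like this. `choose_diff_lines` is a hypothetical helper, and `long_format` stands in for whatever one-item-per-line formatter is used (the new `PrettyPrinter` in this PR); only `pprint.pformat` and `difflib.ndiff` are real stdlib calls.

```python
import difflib
import pprint
from typing import Any, Callable, List


def choose_diff_lines(
    left: Any, right: Any, long_format: Callable[[Any], str]
) -> List[str]:
    """Hypothetical helper sketching the single-line fallback.

    Keeps the compact stdlib pprint output when both sides fit on one line,
    otherwise diffs the one-item-per-line output produced by long_format.
    """
    left_short = pprint.pformat(left)
    right_short = pprint.pformat(right)
    if "\n" not in left_short and "\n" not in right_short:
        # Trivial case such as [1, 2] == [1, 3]: one '-'/'+' pair, with '?'
        # hint lines pointing at the changed characters.
        left_lines, right_lines = [left_short], [right_short]
    else:
        left_lines = long_format(left).splitlines()
        right_lines = long_format(right).splitlines()
    # ndiff's '?' hint lines end with a newline; strip trailing whitespace.
    return [line.rstrip() for line in difflib.ndiff(left_lines, right_lines)]


if __name__ == "__main__":
    for line in choose_diff_lines([1, 2], [1, 3], long_format=repr):
        print(line)
```

Note that [1, 2, 3] == [2, 3, 4] also fits on one line, so it would take the compact branch; that is the harder-to-read case mentioned above.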

Notes for maintainers

This is the requested alternative to #11537: it vendors the pprint module in and then modifies the class in place.

The first commit vendors the module in and makes it pass linting. Note that only the required parts of the module are imported.

The second commit makes the modification and adds the same tests as #11537.

It is possible that we could still simplify the logic (e.g. always computing the indentation based on the level, instead of passing both). I believe this might be easier as a subsequent PR if requested, but happy to try and simplify this one if wanted.
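For background on why the module is vendored at all: as far as the `pprint` documentation goes, the only method meant to be overridden on `pprint.PrettyPrinter` is `format()`, which decides how a single object is turned into a string. Line breaking and indentation live in private methods such as `_format` and the `_dispatch` table, which is why the existing `AlwaysDispatchingPrettyPrinter` needs `type: ignore` comments to override them. A minimal sketch of the public hook, using a purely illustrative `UpperStrPrinter` class:

```python
import pprint
from typing import Any, Dict, Optional, Tuple


class UpperStrPrinter(pprint.PrettyPrinter):
    """Illustrative subclass: upper-cases strings through the public hook."""

    def format(
        self,
        object: Any,
        context: Dict[int, Any],
        maxlevels: Optional[int],
        level: int,
    ) -> Tuple[str, bool, bool]:
        # The documented contract: return (repr_string, readable, recursive).
        if isinstance(object, str):
            return repr(object.upper()), True, False
        return super().format(object, context, maxlevels, level)


if __name__ == "__main__":
    # Changes how a value is spelled, but not where lines break or how
    # nested containers are indented.
    print(UpperStrPrinter().pformat("hello"))
```

Changing the indentation scheme itself would still require touching the private layout methods, which is what vendoring avoids.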

Examples

Basic

Generated using the following script:

from collections import Counter, defaultdict, deque
from dataclasses import dataclass
from functools import partial
import pprint
from types import SimpleNamespace
from typing import Optional, Dict, Any, IO, List
import difflib

from _pytest._io.pprint import PrettyPrinter


###
# Original pytest diff, copied in
###
class AlwaysDispatchingPrettyPrinter(pprint.PrettyPrinter):
    """PrettyPrinter that always dispatches (regardless of width)."""

    def _format(
        self,
        object: object,
        stream: IO[str],
        indent: int,
        allowance: int,
        context: Dict[int, Any],
        level: int,
    ) -> None:
        # Type ignored because _dispatch is private.
        p = self._dispatch.get(type(object).__repr__, None)  # type: ignore[attr-defined]

        objid = id(object)
        if objid in context or p is None:
            # Type ignored because _format is private.
            super()._format(  # type: ignore[misc]
                object,
                stream,
                indent,
                allowance,
                context,
                level,
            )
            return

        context[objid] = 1
        p(self, object, stream, indent, allowance, context, level + 1)
        del context[objid]


def _pformat_dispatch_original(
    object: object,
    indent: int = 1,
    width: int = 80,
    depth: Optional[int] = None,
    *,
    compact: bool = False,
) -> str:
    return AlwaysDispatchingPrettyPrinter(
        indent=indent, width=width, depth=depth, compact=compact
    ).pformat(object)


def _surrounding_parens_on_own_lines(lines: List[str]) -> None:
    """Move opening/closing parenthesis/bracket to own lines."""
    opening = lines[0][:1]
    if opening in ["(", "[", "{"]:
        lines[0] = " " + lines[0][1:]
        lines[:] = [opening] + lines
    closing = lines[-1][-1:]
    if closing in [")", "]", "}"]:
        lines[-1] = lines[-1][:-1] + ","
        lines[:] = lines + [closing]


def original_diff(left, right):
    left_formatting = pprint.pformat(left).splitlines()
    right_formatting = pprint.pformat(right).splitlines()

    # Re-format for different output lengths.
    lines_left = len(left_formatting)
    lines_right = len(right_formatting)
    if lines_left != lines_right:
        left_formatting = _pformat_dispatch_original(left).splitlines()
        right_formatting = _pformat_dispatch_original(right).splitlines()

    if lines_left > 1 or lines_right > 1:
        _surrounding_parens_on_own_lines(left_formatting)
        _surrounding_parens_on_own_lines(right_formatting)

    return left_formatting, right_formatting


###
# Script to generate the diffs
###

TABLE = """
<table>
<tr>
<th>Test</th>
<th>Main</th>
<th>Proposal</th>
</tr>
{rows}
</table>
"""

ROW = """
<tr>
<td colspan=2>

```python
{python}
```
</td>
</tr>
<tr>
<td>

```diff
{diff_original}
```
</td>

<td>

```diff
{diff_new}
```
</td>
</tr>
"""


def get_row(left, right):
    original = "\n".join(
        line.rstrip() for line in difflib.ndiff(*original_diff(left, right))
    )
    new = "\n".join(
        line.rstrip()
        for line in difflib.ndiff(
            PrettyPrinter().pformat(left).splitlines(),
            PrettyPrinter().pformat(right).splitlines(),
        )
    )

    fmt = partial(pprint.pformat, indent=2, width=60)
    return f"{fmt(left)} \\ \n == {fmt(right)}", original, new


@dataclass
class DataclassWithTwoItems:
    foo: str
    bar: str


rows = [
    get_row(left, right)
    for left, right in [
        [{"one": 1, "two": 2}, {"three": 1, "two": 3}],
        [[1, 2], [1, 3]],
        [(1,), (2,)],
        [(1, 2), (1, 3)],
        [{1, 2}, {1, 3}],
        [SimpleNamespace(one=1, two=2), SimpleNamespace(one=2, three=2)],
        [
            defaultdict(str, {"one": "1", "two": "2"}),
            defaultdict(str, {"one": "1", "two": "3"}),
        ],
        [Counter("121"), Counter("122")],
        [deque([1, 2]), deque([1, 3])],
        [deque([1, 2], maxlen=3), deque([1, 3], maxlen=4)],
        [
            {
                "counter": Counter("122"),
                "dataclass": DataclassWithTwoItems(foo="foo", bar="bar"),
                "defaultdict": defaultdict(str, {"one": "1", "two": "2"}),
                "deque": deque([1, 2], maxlen=3),
                "dict": {"one": 1, "two": 2},
                "list": [1, 2],
                "set": {1, 2},
                "simplenamespace": SimpleNamespace(one=1, two=2),
                "tuple": (1, 2),
            },
            {
                "counter": Counter("121"),
                "dataclass": DataclassWithTwoItems(foo="foo", bar="baz"),
                "defaultdict": defaultdict(str, {"three": "1", "two": "3"}),
                "deque": deque([1, 3], maxlen=3),
                "dict": {"one": 1, "two": 3},
                "list": [1, 2, 3],
                "set": {1, 3},
                "simplenamespace": SimpleNamespace(one=1, two=2, three=3),
                "tuple": (1,),
            },
        ],
    ]
]

print(
    TABLE.format(
        rows="\n".join(
            [
                ROW.format(python=row[0], diff_original=row[1], diff_new=row[2])
                for row in rows
            ]
        )
    )
)

We get the following differences on small entries:

Test:
{'one': 1, 'two': 2} \
 == {'three': 1, 'two': 3}
Main:
- {'one': 1, 'two': 2}
?   ^^              ^
+ {'three': 1, 'two': 3}
?   ^^^^              ^
Proposal:
  {
-     'one': 1,
?      ^^
+     'three': 1,
?      ^^^^
-     'two': 2,
?            ^
+     'two': 3,
?            ^
  }

Test:
[1, 2] \
 == [1, 3]
Main:
- [1, 2]
?     ^
+ [1, 3]
?     ^
Proposal:
  [
      1,
-     2,
?     ^
+     3,
?     ^
  ]

Test:
(1,) \
 == (2,)
Main:
- (1,)
?  ^
+ (2,)
?  ^
Proposal:
  (
-     1,
?     ^
+     2,
?     ^
  )

Test:
(1, 2) \
 == (1, 3)
Main:
- (1, 2)
?     ^
+ (1, 3)
?     ^
Proposal:
  (
      1,
-     2,
?     ^
+     3,
?     ^
  )

Test:
{1, 2} \
 == {1, 3}
Main:
- {1, 2}
?     ^
+ {1, 3}
?     ^
Proposal:
  {
      1,
-     2,
?     ^
+     3,
?     ^
  }

Test:
namespace(one=1, two=2) \
 == namespace(one=2, three=2)
Main:
- namespace(one=1, two=2)
?               ^   ^^
+ namespace(one=2, three=2)
?               ^   ^^^^
Proposal:
  namespace(
-     one=1,
?         ^
+     one=2,
?         ^
-     two=2,
+     three=2,
  )

Test:
defaultdict(<class 'str'>, {'one': '1', 'two': '2'}) \
 == defaultdict(<class 'str'>, {'one': '1', 'two': '3'})
Main:
- defaultdict(<class 'str'>, {'one': '1', 'two': '2'})
?                                                 ^
+ defaultdict(<class 'str'>, {'one': '1', 'two': '3'})
?                                                 ^
Proposal:
  defaultdict(<class 'str'>, {
      'one': '1',
-     'two': '2',
?             ^
+     'two': '3',
?             ^
  })

Test:
Counter({'1': 2, '2': 1}) \
 == Counter({'2': 2, '1': 1})
Main:
- Counter({'1': 2, '2': 1})
+ Counter({'2': 2, '1': 1})
Proposal:
  Counter({
-     '1': 2,
?      ^
+     '2': 2,
?      ^
-     '2': 1,
?      ^
+     '1': 1,
?      ^
  })

Test:
deque([1, 2]) \
 == deque([1, 3])
Main:
- deque([1, 2])
?           ^
+ deque([1, 3])
?           ^
Proposal:
  deque([
      1,
-     2,
?     ^
+     3,
?     ^
  ])

Test:
deque([1, 2], maxlen=3) \
 == deque([1, 3], maxlen=4)
Main:
- deque([1, 2], maxlen=3)
?           ^          ^
+ deque([1, 3], maxlen=4)
?           ^          ^
Proposal:
- deque(maxlen=3, [
?              ^
+ deque(maxlen=4, [
?              ^
      1,
-     2,
?     ^
+     3,
?     ^
  ])

Test:
{ 'counter': Counter({'2': 2, '1': 1}), 'dataclass': DataclassWithTwoItems(foo='foo', bar='bar'), 'defaultdict': defaultdict(<class 'str'>, { 'one': '1', 'two': '2'}), 'deque': deque([1, 2], maxlen=3), 'dict': {'one': 1, 'two': 2}, 'list': [1, 2], 'set': {1, 2}, 'simplenamespace': namespace(one=1, two=2), 'tuple': (1, 2)} \
 == { 'counter': Counter({'1': 2, '2': 1}), 'dataclass': DataclassWithTwoItems(foo='foo', bar='baz'), 'defaultdict': defaultdict(<class 'str'>, { 'three': '1', 'two': '3'}), 'deque': deque([1, 3], maxlen=3), 'dict': {'one': 1, 'two': 3}, 'list': [1, 2, 3], 'set': {1, 3}, 'simplenamespace': namespace(one=1, two=2, three=3), 'tuple': (1,)}
Main:
{ - 'counter': Counter({'2': 2, '1': 1}), ? -------- + 'counter': Counter({'1': 2, '2': 1}), ? ++++++++ - 'dataclass': DataclassWithTwoItems(foo='foo', bar='bar'), ? ^ + 'dataclass': DataclassWithTwoItems(foo='foo', bar='baz'), ? ^ - 'defaultdict': defaultdict(<class 'str'>, {'one': '1', 'two': '2'}), ? ^^ ^ + 'defaultdict': defaultdict(<class 'str'>, {'three': '1', 'two': '3'}), ? ^^^^ ^ - 'deque': deque([1, 2], maxlen=3), ? ^ + 'deque': deque([1, 3], maxlen=3), ? ^ - 'dict': {'one': 1, 'two': 2}, ? ^ + 'dict': {'one': 1, 'two': 3}, ? ^ - 'list': [1, 2], + 'list': [1, 2, 3], ? +++ - 'set': {1, 2}, ? ^ + 'set': {1, 3}, ? ^ - 'simplenamespace': namespace(one=1, two=2), + 'simplenamespace': namespace(one=1, two=2, three=3), ? +++++++++ - 'tuple': (1, 2), ? -- + 'tuple': (1,), }
Proposal:
  {
      'counter': Counter({
-         '2': 2,
?          ^
+         '1': 2,
?          ^
-         '1': 1,
?          ^
+         '2': 1,
?          ^
      }),
      'dataclass': DataclassWithTwoItems(
          foo='foo',
-         bar='bar',
?                ^
+         bar='baz',
?                ^
      ),
      'defaultdict': defaultdict(<class 'str'>, {
-         'one': '1',
?          ^^
+         'three': '1',
?          ^^^^
-         'two': '2',
?                 ^
+         'two': '3',
?                 ^
      }),
      'deque': deque(maxlen=3, [
          1,
-         2,
?         ^
+         3,
?         ^
      ]),
      'dict': {
          'one': 1,
-         'two': 2,
?                ^
+         'two': 3,
?                ^
      },
      'list': [
          1,
          2,
+         3,
      ],
      'set': {
          1,
-         2,
?         ^
+         3,
?         ^
      },
      'simplenamespace': namespace(
          one=1,
          two=2,
+         three=3,
      ),
      'tuple': (
          1,
-         2,
      ),
  }

Full example

Taking the example from https://github.com/lukaszb/pytest-dictsdiff, as in the issue:

Previously:

- {
-  'cell': '(056)-022-8631',
-  'dob': {'age': 34,
+ OrderedDict([('cell',
+               '(056)-022-8631'),
+              ('dob',
+               {'age': 44,
-          'date': '1953-11-04T01:21:04Z'},
?                     ^              ^
+                'date': '1983-11-04T01:21:14Z'}),
? ++++++                    ^              ^    +
-  'email': '[email protected]',
-  'gender': 'female',
-  'id': {'name': 'BSN',
+              ('email',
+               '[email protected]'),
+              ('gender',
+               'female'),
+              ('id',
+               {'name': 'BSN',
-         'value': '36180866'},
+                'value': '36180866'}),
? +++++++                            +
-  'location': {'city': 'Tholen',
+              ('location',
+               {'city': 'tholen',
-               'coordinates': {'latitude': '46.8823',
+                'coordinates': {'latitude': '46.8823',
? +
-                               'longitude': '175.8856'},
+                                'longitude': '175.8856'},
? +
-               'postcode': 64509,
?                               ^
+                'postcode': 64504,
? +                              ^
-               'state': 'groningen',
+                'state': 'groningen',
? +
-               'street': '2074 adriaen van ostadelaan',
+                'street': '2074 adriaen van ostadelaan',
? +
-               'timezone': {'description': 'Adelaide, Darwin',
+                'timezone': {'description': 'Adelaide, Darwin',
? +
-                            'offset': '+9:30'}},
+                             'offset': '+9:30'}}),
? +                                              +
+              ('login',
-  'login': {'md5': 'bafe8cf9d37806a7b13edc218d5ff762',
?  ^^^^^^^^
+               {'md5': 'bafe8cf9d37806a7b13edc218d5ff762',
?  ^^^^^^^^^^^^
-            'password': 'ontario',
+                'password': 'ontario',
? ++++
-            'salt': 'QVBKgEjy',
+                'salt': 'QVBKgEjy',
? ++++
-            'sha1': 'cacef09ff61072d1c55732963766fa84e919aa7a',
+                'sha1': 'cacef09ff61072d1c55732963766fa84e919aa7a',
? ++++
-            'sha256': 'cc86af47aedbdbb1de73ff10484996fe9785c47c0fc191b7c67eaf71e0782300',
+                'sha256': 'cc86af47aedbdbb1de73ff10484996fe9785c47c0fc191b7c67eaf71e0782300',
? ++++
-            'username': 'smallgorilla897',
+                'username': 'smallgorilla897',
? ++++
-            'uuid': '37e30c59-bc79-4172-aac6-e2c640e165fa'},
+                'uuid': '37e30c59-bc79-4172-aac6-e2c640e165fa'}),
? ++++                                                          +
-  'name': {'first': 'Zeyneb',
+              ('name',
+               {'first': 'zeyneb',
-           'last': 'Elfring',
?                    ^
+                'last': 'elfring',
? +++++                   ^
-           'title': 'mrs'},
+                'title': 'mrs'}),
? +++++                         +
-  'nat': 'NL',
-  'phone': '(209)-143-9697',
+              ('nat',
+               'NL'),
+              ('phone',
+               '(209)-143-9697'),
+              ('picture',
-  'picture': {'large': 'https://randomuser.me/api/portraits/women/37.jpg',
?  ^^^^^^^^^^
+               {'large': 'https://randomuser.me/api/portraits/women/37.jpg',
?  ^^^^^^^^^^^^
-              'medium': 'https://randomuser.me/api/portraits/med/women/37.jpg',
+                'medium': 'https://randomuser.me/api/portraits/med/women/37.jpg',
? ++
-              'thumbnail': 'https://randomuser.me/api/portraits/thumb/women/37.jpg'},
+                'thumbnail': 'https://randomuser.me/api/portraits/thumb/women/37.jpg'}),
? ++                                                                                   +
-  'registered': {'age': 3,
+              ('registered',
+               {'age': 3,
-                 'date': '2014-12-07T06:54:14Z'},
? -
+                'date': '2014-12-07T06:54:14Z'})],
?                                               ++
- }

Now:

- {
+ OrderedDict({
      'cell': '(056)-022-8631',
      'dob': {
-         'age': 34,
?                ^
+         'age': 44,
?                ^
-         'date': '1953-11-04T01:21:04Z',
?                    ^              ^
+         'date': '1983-11-04T01:21:14Z',
?                    ^              ^
      },
      'email': '[email protected]',
      'gender': 'female',
      'id': {
          'name': 'BSN',
          'value': '36180866',
      },
      'location': {
-         'city': 'Tholen',
?                  ^
+         'city': 'tholen',
?                  ^
          'coordinates': {
              'latitude': '46.8823',
              'longitude': '175.8856',
          },
-         'postcode': 64509,
?                         ^
+         'postcode': 64504,
?                         ^
          'state': 'groningen',
          'street': '2074 adriaen van ostadelaan',
          'timezone': {
              'description': 'Adelaide, Darwin',
              'offset': '+9:30',
          },
      },
      'login': {
          'md5': 'bafe8cf9d37806a7b13edc218d5ff762',
          'password': 'ontario',
          'salt': 'QVBKgEjy',
          'sha1': 'cacef09ff61072d1c55732963766fa84e919aa7a',
          'sha256': 'cc86af47aedbdbb1de73ff10484996fe9785c47c0fc191b7c67eaf71e0782300',
          'username': 'smallgorilla897',
          'uuid': '37e30c59-bc79-4172-aac6-e2c640e165fa',
      },
      'name': {
-         'first': 'Zeyneb',
?                   ^
+         'first': 'zeyneb',
?                   ^
-         'last': 'Elfring',
?                  ^
+         'last': 'elfring',
?                  ^
          'title': 'mrs',
      },
      'nat': 'NL',
      'phone': '(209)-143-9697',
      'picture': {
          'large': 'https://randomuser.me/api/portraits/women/37.jpg',
          'medium': 'https://randomuser.me/api/portraits/med/women/37.jpg',
          'thumbnail': 'https://randomuser.me/api/portraits/thumb/women/37.jpg',
      },
      'registered': {
          'age': 3,
          'date': '2014-12-07T06:54:14Z',
      },
- }
+ })

bluetech · 2023-11-12T22:08:38Z

Thanks @BenjaminSchubert, sorry for the delay in reviewing.

First, I think we're in agreement that the new formatting is nicer than the existing one, so we can consider this aspect as accepted.

Procedurally I'd prefer it if the vendoring change (your first commit) is done in its own PR which we can merge first, then the formatting change in a separate commit. It'd be easier to handle this way.

Regarding the first commit message:

This takes in the version of the pprint module from python3.12, essentially backporting it to python3.8 and 3.9.

This will allow us to make changes in how it represents objects without relying on private methods.

Only the required API surface was copied, the rest is left out.

Note that we still continue to use the upstream version for every other part of the system. It might be worth using it everywhere, but is probably too many changes at once.

I think we should replace all usages of pprint with our vendored copy from the start, for these reasons:

Consistency
Presumably ours is better :)
If we're going to cut the parts we don't use, let's make sure the other pprint-using code doesn't need them either
I think it should just be a matter of replacing import pprint with our import? Probably some tests will fail due to the different formatting, but hopefully they won't be too much work to fix, and they will provide us with more evaluation data.

Regarding the technical vendoring aspect, I wonder if we should turn off black and linting for this file so as to make future syncs from upstream easier, or if we should just leave upstream behind and not look back... I'm not sure myself - WDYT?

BenjaminSchubert · 2023-11-13T22:01:50Z

@bluetech thanks a lot for the update, no worries at all :)

Procedurally I'd prefer it if the vendoring change (your first commit) is done in its own PR which we can merge first, then the formatting change in a separate commit. It'd be easier to handle this way.

Sure, I'll do this as soon as we have clarified the other points :)

I think we should replace all usages of pprint with our vendored copy from the start, for these reasons:

A caveat first: I very rarely use the -v mode; most of my pytest invocations are set up to use -vv at all times, which is the mode I personally prefer, so I might be biased :)

Consistency

Agreed that consistency is important. Technically, the verbose output (differing items, common items, etc.) is already inconsistent: the differing items use SafeRepr, whereas the others use pprint.pformat. I do think optimizing for the context can make sense, which I understand was the reasoning here before?

Presumably ours is better :)

I do agree it gives much, much better diffs (I would not be proposing this change otherwise :P). However, the outputs are also longer, which reduces the amount of information that fits on one screen, so in cases where nothing needs to be compared, it might become less readable.

As such, for the verbose mode I am not 100% sure this is the right call.

To showcase my concerns, here's a (probably rare, exaggerated) failure:

Keeping the old pprint for the `-v` mode

E         Common items:
E         {'email': '[email protected]',
E          'gender': 'female',
E          'id': {'name': 'BSN', 'value': '36180866'},
E          'login': {'md5': 'bafe8cf9d37806a7b13edc218d5ff762',
E                    'password': 'ontario',
E                    'salt': 'QVBKgEjy',
E                    'sha1': 'cacef09ff61072d1c55732963766fa84e919aa7a',
E                    'sha256': 'cc86af47aedbdbb1de73ff10484996fe9785c47c0fc191b7c67eaf71e0782300',
E                    'username': 'smallgorilla897',
E                    'uuid': '37e30c59-bc79-4172-aac6-e2c640e165fa'},
E          'nat': 'NL',
E          'phone': '(209)-143-9697',
E          'picture': {'large': 'https://randomuser.me/api/portraits/women/37.jpg',
E                      'medium': 'https://randomuser.me/api/portraits/med/women/37.jpg',
E                      'thumbnail': 'https://randomuser.me/api/portraits/thumb/women/37.jpg'},
E          'registered': {'age': 3, 'date': '2014-12-07T06:54:14Z'}}
E         Differing items:
E         {'location': {'city': 'tholen', 'coordinates': {'latitude': '46.8823', 'longitude': '175.8856'}, 'postcode': 64504, 'state': 'groningen', ...}} != {'location': {'city': 'Tholen', 'coordinates': {'latitude': '46.8823', 'longitude': '175.8856'}, 'postcode': 64509, 'state': 'groningen', ...}}
E         {'name': {'first': 'zeyneb', 'last': 'elfring', 'title': 'mrs'}} != {'name': {'first': 'Zeyneb', 'last': 'Elfring', 'title': 'mrs'}}
E         {'dob': {'age': 44, 'date': '1983-11-04T01:21:14Z'}} != {'dob': {'age': 34, 'date': '1953-11-04T01:21:04Z'}}
E         Left contains 1 more item:
E         {'cell': '(056)-022-8631'}
E         Right contains 1 more item:
E         {'cellphone': '(056)-022-8631'}
E         Full diff:
E         - {
E         + OrderedDict({
E         -     'cellphone': '(056)-022-8631',
E         ?          -----
E         +     'cell': '(056)-022-8631',
E               'dob': {
E         -         'age': 34,
E         ?                ^
E         +         'age': 44,
E         ?                ^
E         -         'date': '1953-11-04T01:21:04Z',
E         ?                    ^              ^
E         +         'date': '1983-11-04T01:21:14Z',
E         ?                    ^              ^
E               },
E               'email': '[email protected]',
E               'gender': 'female',
E               'id': {
E                   'name': 'BSN',
E                   'value': '36180866',
E               },
E               'location': {
E         -         'city': 'Tholen',
E         ?                  ^
E         +         'city': 'tholen',
E         ?                  ^
E                   'coordinates': {
E                       'latitude': '46.8823',
E                       'longitude': '175.8856',
E                   },
E         -         'postcode': 64509,
E         ?                         ^
E         +         'postcode': 64504,
E         ?                         ^
E                   'state': 'groningen',
E                   'street': '2074 adriaen van ostadelaan',
E                   'timezone': {
E                       'description': 'Adelaide, Darwin',
E                       'offset': '+9:30',
E                   },
E               },
E               'login': {
E                   'md5': 'bafe8cf9d37806a7b13edc218d5ff762',
E                   'password': 'ontario',
E                   'salt': 'QVBKgEjy',
E                   'sha1': 'cacef09ff61072d1c55732963766fa84e919aa7a',
E                   'sha256': 'cc86af47aedbdbb1de73ff10484996fe9785c47c0fc191b7c67eaf71e0782300',
E                   'username': 'smallgorilla897',
E                   'uuid': '37e30c59-bc79-4172-aac6-e2c640e165fa',
E               },
E               'name': {
E         -         'first': 'Zeyneb',
E         ?                   ^
E         +         'first': 'zeyneb',
E         ?                   ^
E         -         'last': 'Elfring',
E         ?                  ^
E         +         'last': 'elfring',
E         ?                  ^
E                   'title': 'mrs',
E               },
E               'nat': 'NL',
E               'phone': '(209)-143-9697',
E               'picture': {
E                   'large': 'https://randomuser.me/api/portraits/women/37.jpg',
E                   'medium': 'https://randomuser.me/api/portraits/med/women/37.jpg',
E                   'thumbnail': 'https://randomuser.me/api/portraits/thumb/women/37.jpg',
E               },
E               'registered': {
E                   'age': 3,
E                   'date': '2014-12-07T06:54:14Z',
E               },
E         - }
E         + })

Using the new pprint for the `-v` mode

E         Common items:
E         {
E             'email': '[email protected]',
E             'gender': 'female',
E             'id': {
E                 'name': 'BSN',
E                 'value': '36180866',
E             },
E             'login': {
E                 'md5': 'bafe8cf9d37806a7b13edc218d5ff762',
E                 'password': 'ontario',
E                 'salt': 'QVBKgEjy',
E                 'sha1': 'cacef09ff61072d1c55732963766fa84e919aa7a',
E                 'sha256': 'cc86af47aedbdbb1de73ff10484996fe9785c47c0fc191b7c67eaf71e0782300',
E                 'username': 'smallgorilla897',
E                 'uuid': '37e30c59-bc79-4172-aac6-e2c640e165fa',
E             },
E             'nat': 'NL',
E             'phone': '(209)-143-9697',
E             'picture': {
E                 'large': 'https://randomuser.me/api/portraits/women/37.jpg',
E                 'medium': 'https://randomuser.me/api/portraits/med/women/37.jpg',
E                 'thumbnail': 'https://randomuser.me/api/portraits/thumb/women/37.jpg',
E             },
E             'registered': {
E                 'age': 3,
E                 'date': '2014-12-07T06:54:14Z',
E             },
E         }
E         Differing items:
E         {'name': {'first': 'zeyneb', 'last': 'elfring', 'title': 'mrs'}} != {'name': {'first': 'Zeyneb', 'last': 'Elfring', 'title': 'mrs'}}
E         {'location': {'city': 'tholen', 'coordinates': {'latitude': '46.8823', 'longitude': '175.8856'}, 'postcode': 64504, 'state': 'groningen', ...}} != {'location': {'city': 'Tholen', 'coordinates': {'latitude': '46.8823', 'longitude': '175.8856'}, 'postcode': 64509, 'state': 'groningen', ...}}
E         {'dob': {'age': 44, 'date': '1983-11-04T01:21:14Z'}} != {'dob': {'age': 34, 'date': '1953-11-04T01:21:04Z'}}
E         Left contains 1 more item:
E         {
E             'cell': '(056)-022-8631',
E         }
E         Right contains 1 more item:
E         {
E             'cellphone': '(056)-022-8631',
E         }
E         Full diff:
E         - {
E         + OrderedDict({
E         -     'cellphone': '(056)-022-8631',
E         ?          -----
E         +     'cell': '(056)-022-8631',
E               'dob': {
E         -         'age': 34,
E         ?                ^
E         +         'age': 44,
E         ?                ^
E         -         'date': '1953-11-04T01:21:04Z',
E         ?                    ^              ^
E         +         'date': '1983-11-04T01:21:14Z',
E         ?                    ^              ^
E               },
E               'email': '[email protected]',
E               'gender': 'female',
E               'id': {
E                   'name': 'BSN',
E                   'value': '36180866',
E               },
E               'location': {
E         -         'city': 'Tholen',
E         ?                  ^
E         +         'city': 'tholen',
E         ?                  ^
E                   'coordinates': {
E                       'latitude': '46.8823',
E                       'longitude': '175.8856',
E                   },
E         -         'postcode': 64509,
E         ?                         ^
E         +         'postcode': 64504,
E         ?                         ^
E                   'state': 'groningen',
E                   'street': '2074 adriaen van ostadelaan',
E                   'timezone': {
E                       'description': 'Adelaide, Darwin',
E                       'offset': '+9:30',
E                   },
E               },
E               'login': {
E                   'md5': 'bafe8cf9d37806a7b13edc218d5ff762',
E                   'password': 'ontario',
E                   'salt': 'QVBKgEjy',
E                   'sha1': 'cacef09ff61072d1c55732963766fa84e919aa7a',
E                   'sha256': 'cc86af47aedbdbb1de73ff10484996fe9785c47c0fc191b7c67eaf71e0782300',
E                   'username': 'smallgorilla897',
E                   'uuid': '37e30c59-bc79-4172-aac6-e2c640e165fa',
E               },
E               'name': {
E         -         'first': 'Zeyneb',
E         ?                   ^
E         +         'first': 'zeyneb',
E         ?                   ^
E         -         'last': 'Elfring',
E         ?                  ^
E         +         'last': 'elfring',
E         ?                  ^
E                   'title': 'mrs',
E               },
E               'nat': 'NL',
E               'phone': '(209)-143-9697',
E               'picture': {
E                   'large': 'https://randomuser.me/api/portraits/women/37.jpg',
E                   'medium': 'https://randomuser.me/api/portraits/med/women/37.jpg',
E                   'thumbnail': 'https://randomuser.me/api/portraits/thumb/women/37.jpg',
E               },
E               'registered': {
E                   'age': 3,
E                   'date': '2014-12-07T06:54:14Z',
E               },
E         - }
E         + })

Ultimately, I am happy to go either way if you think using it for everything is nicer, I just wanted to show an example of the change before doing it. Let me know which direction you prefer.

Regarding the technical vendoring aspect, I wonder if we should turn off black and linting for this file so as to make future syncs from upstream easier, or if we should just leave upstream behind and not look back... I'm not sure myself - WDYT?

I never know what's best. The advantage of linting it like the rest of the code is that it feels part of the codebase and is easier for contributors, but syncing is harder. I don't think the pprint codebase in upstream python will change often, but we don't really know. I slightly prefer linting/formatting it the same way. Unless there are big refactors upstream, it should still not be too hard to sync.

bluetech · 2023-11-17T09:22:35Z

Ultimately, I am happy to go either way if you think using it for everything is nicer, I just wanted to show an example of the change before doing it. Let me know which direction you prefer.

OK, let's keep using the stdlib pprint for other stuff, and we can maybe migrate them later but as a separate concern.

I never know what's best. The advantage of linting it like the rest of the code is that it feels part of the codebase and is easier for contributors, but syncing is harder. I don't think the pprint codebase in upstream python will change often, but we don't really know. I slightly prefer linting/formatting it the same way. Unless there are big refactors upstream, it should still not be too hard to sync.

I agree. Let's assimilate fully with pytest, and not worry about upstream syncs too much. This will allow us to freely improve and adjust the code to pytest needs, and have proper linting and formatting. I'd also like to type annotate the code later (unless you feel like doing it already).

So here's how I think it would be best to do the vendoring part:

First commit: copy stdlib pprint.py verbatim, ignoring formatting and linting using # fmt: off and flake8: noqa etc. at the top of the file.
Second commit: Delete parts we don't need.
Third commit: Apply formatting and linting.
Fourth commit: Integrate AlwaysDispatchingPrettyPrinter into _pytest._io.pprint.PrettyPrinter and switch to it.

That should bring us to the current state in main, but with a cleaned up vendored pprint and clear provenance. After that you can do the new improvements.

BTW I know you just wanted to make some improvements and I side-tracked you with this vendoring stuff, I hope you're not too annoyed with that :)

BenjaminSchubert · 2023-11-18T10:00:04Z

Ok, #11626 contains the vendoring, and this one has now been rebased on top of it. Once the other is in, I'll rebase again. In the meantime, I'll mark this one as a draft.

BTW I know you just wanted to make some improvements and I side-tracked you with this vendoring stuff, I hope you're not too annoyed with that :)

No worries at all, those are all reasonable asks. I am glad you are open to getting so much new code in to rewrite those diffs :D

BenjaminSchubert · 2023-11-20T16:44:33Z

This is now ready for review :)

The normal default pretty printer is not great when objects are nested, and it can get hard to read the diff. Instead, provide a pretty printer that behaves more like indented JSON output, which allows for smaller, more meaningful differences at the expense of a slightly longer diff. This does not touch the other places where the pretty printer is used; only the full diff is updated.
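To illustrate why JSON-style indentation gives "smaller, more meaningful differences", here is a minimal sketch. It uses json.dumps(indent=4) purely as a stand-in for the new printer (it is not pytest's PrettyPrinter and only handles JSON types), stdlib pprint.pformat for the compact layout, and difflib.ndiff to find changed lines. The helper changed_lines is hypothetical.

```python
import difflib
import json
import pprint
from typing import List


def changed_lines(before: str, after: str) -> List[str]:
    """Keep only the lines difflib.ndiff marks as removed ('- ') or added ('+ ')."""
    return [
        line
        for line in difflib.ndiff(before.splitlines(), after.splitlines())
        if line.startswith(("- ", "+ "))
    ]


left = {"name": "pytest", "versions": [7, 8], "meta": {"license": "MIT"}}
right = {"name": "pytest", "versions": [7, 9], "meta": {"license": "MIT"}}

# Compact pprint output: the whole dict fits on one line, so that line changes.
compact = changed_lines(pprint.pformat(left), pprint.pformat(right))

# Indented, JSON-style output: only the line holding the changed element differs.
indented = changed_lines(
    json.dumps(left, indent=4, sort_keys=True),
    json.dumps(right, indent=4, sort_keys=True),
)

print("\n".join(compact))
print("\n".join(indented))
```

In the compact case the reader has to scan the whole dict to spot the change; in the indented case the diff points straight at it, at the cost of more lines overall.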

nicoddemus

Great work! I did not review the code itself, only the changelog and the test outcomes. 👍

changelog/1531.improvement.rst

nicoddemus · 2023-11-24T12:30:49Z

testing/test_assertion.py

@@ -677,8 +695,13 @@ def test_dict_different_items(self) -> None:
            "Right contains 2 more items:",
            "{'b': 1, 'c': 2}",
            "Full diff:",
-            "- {'b': 1, 'c': 2}",


Indeed, for very small diffs the change makes it harder to read, but those are not problematic anyway; the problematic diffs are the long ones, which are greatly improved, so I think the trade-off is valid. 👍

We could also first compute the diff with the normal pprint and, if both entries fit on a single line, use it; otherwise, use the long one, if that's preferred.
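A minimal sketch of that fallback, assuming a hypothetical full_diff helper that receives the multi-line printer as a callable. This is not the code from the commit referenced in the thread; json.dumps(indent=4) again stands in for the new printer.

```python
import difflib
import json
import pprint
from typing import Any, Callable, List


def full_diff(left: Any, right: Any, long_format: Callable[[Any], str]) -> List[str]:
    """Hypothetical helper: use the compact stdlib pprint output when both
    sides fit on one line, otherwise fall back to the multi-line long_format."""
    short_left = pprint.pformat(left)
    short_right = pprint.pformat(right)
    if "\n" not in short_left and "\n" not in short_right:
        left_lines, right_lines = [short_left], [short_right]
    else:
        left_lines = long_format(left).splitlines()
        right_lines = long_format(right).splitlines()
    return list(difflib.ndiff(left_lines, right_lines))


def indented(obj: Any) -> str:
    # Stand-in for the new printer: JSON-style indentation, one item per line.
    return json.dumps(obj, indent=4)


print("\n".join(full_diff([1, 2], [1, 3], indented)))
```

Note that the compact branch depends on stdlib pprint, so its output can vary between Python versions, which is one of the drawbacks raised below.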

Wouldn't that complicate the code quite a bit? If not I would go for it.

What I would have in mind is something like 0850f34

I haven't pushed it yet to this branch as I am not convinced about it for the following reasons:

We lose the ability to have a stable diff across Python versions (since the pprint code in the standard library changes)

Older versions of Python thus don't get some of the improvements (e.g. Python 3.8 and dataclasses)

The diff looks nicer for simple cases (a value of the same length changed), but looks worse for cases where something is missing, see here

As such I think I personally lean towards keeping the diff consistent even if it gets over multiple lines all the time. I'm happy to push this commit here if you prefer. In the end, the really hard to read diffs are the longer ones :)

Generally I'm a fan of using the multi-line format even for short cases. This is what I do in my own code for git diffs as well. IMO, let's keep it, and if it ends up being weird or people complain, we can think of using a more compact format when it would fit on a single line.

Co-authored-by: Bruno Oliveira <[email protected]>

bluetech

I left a few comments, please take a look.

Some things for possible follow up:

There are some small mistakes in the initial typing that I didn't notice before, but they can be fixed separately.
I wonder if sort_dicts=True is still the right choice these days, when dicts are ordered. This is also a separate discussion.
The context can be simplified to a set if I'm not mistaken; it seems to currently be an int -> 1 dict for legacy reasons (probably from before set existed...).
The readable stuff is now unused, we can perhaps drop it to simplify the code.
Since compact is now ignored, I think we ought to drop this parameter.
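On the context point: a minimal sketch of recursion tracking with a set of ids instead of an int -> 1 dict. format_with_context is a hypothetical mini-formatter that only handles lists; the marker text matches what stdlib pprint prints for recursive objects. The id is removed again after formatting, so the set tracks the current nesting path rather than every object seen.

```python
from typing import Any, Set


def format_with_context(obj: Any, context: Set[int]) -> str:
    """Hypothetical mini-formatter: `context` holds the ids of the lists
    currently being formatted (the role pprint's int -> 1 dict plays)."""
    if not isinstance(obj, list):
        return repr(obj)
    obj_id = id(obj)
    if obj_id in context:
        # Same marker text as the stdlib pprint recursion message.
        return f"<Recursion on {type(obj).__name__} with id={obj_id}>"
    context.add(obj_id)
    try:
        inner = ", ".join(format_with_context(item, context) for item in obj)
    finally:
        # Leaving this container: a sibling that shares it is not recursion.
        context.discard(obj_id)
    return f"[{inner}]"


recursive = [1]
recursive.append(recursive)
print(format_with_context(recursive, set()))
```

Only membership is ever checked, which is why a set looks sufficient.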

Some ruminations on pprint after reading its code:

The pprint code has each format function care about the global indentation level. Intuitively if I were to design a pretty-printer I would have each formatter not care about the existing indentation level, i.e. format as if it's the top-level and only care about max width, and the machinery will insert the nesting indentation itself.

I wonder if there's a reason why it wasn't done, maybe performance?
I wonder why Python went with the _dispatch dict instead of an extensible __pprint__ protocol, which would allow each type to handle its own pretty-printing instead of having it all in the pprint module. Maybe the protocol would be too complex.
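To make the contrast concrete, a minimal sketch of both designs. Stdlib pprint keys its _dispatch dict on the type's __repr__; the __pprint__ hook below is purely hypothetical (Python has no such protocol), as are all the function and class names.

```python
from typing import Any, Callable, Dict


def _format_list(obj: list) -> str:
    return "[\n" + "".join(f"    {item!r},\n" for item in obj) + "]"


# Central registry keyed by the type's __repr__, the way stdlib pprint does it.
_DISPATCH: Dict[Callable[..., str], Callable[[Any], str]] = {
    list.__repr__: _format_list,
}


def pformat_dispatch(obj: Any) -> str:
    """The printer module must know about every specially formatted type."""
    handler = _DISPATCH.get(type(obj).__repr__)
    return handler(obj) if handler is not None else repr(obj)


def pformat_protocol(obj: Any) -> str:
    """Hypothetical protocol: each type may provide its own __pprint__."""
    hook = getattr(type(obj), "__pprint__", None)
    return hook(obj) if hook is not None else repr(obj)


class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y

    def __repr__(self) -> str:
        return f"Point({self.x}, {self.y})"

    def __pprint__(self) -> str:
        return f"Point(\n    x={self.x!r},\n    y={self.y!r},\n)"
```

With the registry, Point falls back to its plain repr unless the printer module learns about it; with a protocol, the type opts in on its own.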

bluetech · 2023-11-27T09:39:58Z

src/_pytest/_io/pprint.py

-            context,
-            level,
-        )
+        self._pprint_dict(object, stream, indent, allowance, context, level)


We should not change the traditional OrderedDict formatting, it's what people know/expect, I think.

With pprint_dict it's:

OrderedDict({
    'hello': 100,
})

but should be

OrderedDict([
    ('hello', 100),
])

bluetech · 2023-11-27T09:46:45Z

src/_pytest/_io/pprint.py

-        last = False
-        while not last:
-            ent = next_ent
+        while True:


This can be a for loop now
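A simplified sketch of why the loop can become a plain for loop. My reading (an assumption, not confirmed in the thread) is that the lookahead only existed to detect the last item; once every item ends with a trailing comma, no item is special. Both functions are hypothetical and much smaller than the real item formatter.

```python
from typing import Iterable, List


def format_items_lookahead(items: Iterable[object]) -> List[str]:
    """Old shape: peek one item ahead so the last one can be treated
    differently (here it gets no trailing comma)."""
    lines: List[str] = []
    it = iter(items)
    try:
        next_ent = next(it)
    except StopIteration:
        return lines
    last = False
    while not last:
        ent = next_ent
        try:
            next_ent = next(it)
        except StopIteration:
            last = True
        lines.append(repr(ent) if last else repr(ent) + ",")
    return lines


def format_items_for(items: Iterable[object]) -> List[str]:
    """New shape: every item gets a trailing comma, so no lookahead is needed."""
    lines: List[str] = []
    for ent in items:
        lines.append(repr(ent) + ",")
    return lines
```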

bluetech · 2023-11-27T10:22:33Z

testing/io/test_pprint.py

            """,
            id="defaultdict-two-items",
        ),
        pytest.param(
            Counter(),
-            "Counter()",
+            "Counter({})",


Would add a special case to keep the previous formatting here.
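A minimal sketch of such a special case, assuming a hypothetical format_counter helper that prints non-empty counters in the new indented style, ordered by most_common(). The empty case falls back to the familiar form, which is also what repr(Counter()) returns.

```python
from collections import Counter


def format_counter(counter: Counter, indent: int = 4) -> str:
    """Hypothetical helper: one entry per line, most common first."""
    cls_name = type(counter).__name__
    if not counter:
        # Special case: keep "Counter()" rather than "Counter({})".
        return f"{cls_name}()"
    pad = " " * indent
    lines = [f"{cls_name}({{"]
    for key, count in counter.most_common():
        lines.append(f"{pad}{key!r}: {count!r},")
    lines.append("})")
    return "\n".join(lines)


print(format_counter(Counter()))
print(format_counter(Counter("aab")))
```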

bluetech · 2023-11-27T10:26:25Z

testing/test_assertion.py

@@ -677,8 +695,13 @@ def test_dict_different_items(self) -> None:
            "Right contains 2 more items:",
            "{'b': 1, 'c': 2}",
            "Full diff:",
-            "- {'b': 1, 'c': 2}",


Generally I'm a fan of using the multi-line format even for short cases. This is what I do in my own code for git diffs as well. IMO, let's keep it, and if it ends up being weird or people complain, we can think of using a more compact format when it would fit on a single line.

bluetech · 2023-11-27T10:29:21Z

testing/test_assertion.py

+            "?     ^",
+            "+     2,",
+            "+     3,",
+            "  )",
        ]


I think it would be nice to see another test here where the lists have some commonality, e.g. [1, 2, 3] vs. [1, 20, 3]
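A quick sketch of what such a case looks like when each element is on its own line and the lines are compared with difflib.ndiff. indented_lines is a hypothetical stand-in for the new printer, not the code under test: the shared elements stay as unchanged context and only the differing element is flagged.

```python
import difflib
from typing import List


def indented_lines(values: List[int]) -> List[str]:
    """Hypothetical stand-in for the new printer: one element per line."""
    return ["["] + [f"    {value!r}," for value in values] + ["]"]


diff = list(difflib.ndiff(indented_lines([1, 2, 3]), indented_lines([1, 20, 3])))
for line in diff:
    print(line)
```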

Good point, added.

BenjaminSchubert

@bluetech thanks for the review! I addressed the comments with the last commit.

Some things for possible follow up:

You read my mind. I've got https://github.com/BenjaminSchubert/pytest/tree/bschubert/pprint-cleanup lined up with a lot of the requested clean ups already :)

The pprint code has each format function care about the global indentation level. Intuitively if I were to design a pretty-printer I would have each formatter not care about the existing indentation level, i.e. format as if it's the top-level and only care about max width, and the machinery will insert the nesting indentation itself.

I don't think that's easy to do, as the max width depends on the indentation (the deeper you go, the less space you have).

However, I think we can simplify this: with consistent indentation per level, we no longer need to track both the level and the indentation. This is a follow-up clean-up I intend to do.

BenjaminSchubert · 2023-11-27T12:52:58Z

testing/test_assertion.py

+            "?     ^",
+            "+     2,",
+            "+     3,",
+            "  )",
        ]


Good point, added.

bluetech · 2023-11-27T13:50:30Z

Regarding the OrderedDict...

The new pformat wasn't looking good to me, with each key-value pair spread over 4 lines instead of 1. So I was going to suggest adding complexity to fix it, but then I thought: since dicts are now ordered, the OrderedDict([('key', 'value')]) format is really no better than OrderedDict({"key": "value"}) anymore, so maybe we can suggest that Python change it, and then we don't have to complicate our code :) But it turns out someone already did (python/cpython#101446), and it was actually accepted and implemented in Python 3.12 🎉. So let's undo this last change :)
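For reference, a small check of the upstream change mentioned above (python/cpython#101446): starting with Python 3.12, repr() of an OrderedDict uses the plain dict form instead of a list of pairs. This only exercises repr(); it says nothing about how the pprint module lays out long OrderedDicts.

```python
import sys
from collections import OrderedDict

od = OrderedDict(hello=100)

if sys.version_info >= (3, 12):
    # python/cpython#101446: OrderedDict now reprs like a plain dict.
    expected = "OrderedDict({'hello': 100})"
else:
    expected = "OrderedDict([('hello', 100)])"

print(repr(od))
```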

I think it would also be nice to update upstream pprint to use the new OrderedDict repr, in case you're also interested in contributing to cpython.

BTW, another interesting issue I found: python/cpython#51683

I don't think that's easy to do, as the max width depends on the indentation (the deeper you go, the less space you have).

I was thinking that instead of increasing the indentation, you would decrease the max size.
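A minimal sketch of that idea, under simplifying assumptions: only lists are split, INDENT is a fixed 4 spaces, and format_value is hypothetical. Each call formats its value as if it were top level within a given width; the parent passes a smaller width to children (indent plus one column for the trailing comma, similar in spirit to pprint's allowance) and shifts their lines itself.

```python
from typing import Any, List

INDENT = 4


def format_value(obj: Any, width: int) -> List[str]:
    """Hypothetical formatter: lay out obj within `width` columns as if it
    were at the top level. It never looks at the nesting depth."""
    flat = repr(obj)
    # Non-list values are never split, even if they are too wide.
    if not isinstance(obj, list) or len(flat) <= width:
        return [flat]
    lines = ["["]
    for item in obj:
        # The child gets a narrower budget: the indentation plus one column
        # for the trailing comma. The parent adds the indentation itself.
        child = format_value(item, width - INDENT - 1)
        child[-1] += ","
        lines.extend(" " * INDENT + line for line in child)
    lines.append("]")
    return lines


result = format_value([[1, 2], [3, 4, 5, 6, 7]], 20)
print("\n".join(result))
```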

BenjaminSchubert · 2023-11-27T13:59:15Z

But it turns out someone already did (python/cpython#101446), and it was actually accepted and implemented in Python 3.12 🎉. So let's undo this last change :)

Oh fun, this means they updated the OrderedDict repr but not the pprint module. Change undone.

I was thinking that instead of increasing the indentation, you would decrease the max size.

Interesting idea, I'll see if I can find a nice way of doing this

I also missed one part previously:

I wonder if sort_dicts=True is still the right choice these days, when dicts are ordered. This is also a separate discussion.

I think it's still better to sort them by key, as in most cases I would expect the content, not the order, to be what matters (and when the order does matter, OrderedDict could be used).
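A small illustration of that argument using the stdlib pprint sort_dicts parameter (Python 3.8+): two dicts that compare equal but were built in a different insertion order render identically when sorted, so a text diff between them shows no spurious changes.

```python
import pprint

first = {"b": 1, "a": 2}
second = {"a": 2, "b": 1}

assert first == second  # dict equality ignores insertion order

# Default (sort_dicts=True): equal dicts produce identical text.
print(pprint.pformat(first), pprint.pformat(second))

# sort_dicts=False keeps insertion order, so equal dicts render differently.
print(pprint.pformat(first, sort_dicts=False), pprint.pformat(second, sort_dicts=False))
```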

bluetech · 2023-11-27T14:48:45Z

Thanks @BenjaminSchubert!

BenjaminSchubert mentioned this pull request Oct 30, 2023

Always use, and improve the AlwaysDispatchPrettyPrinter for diffs #11537

Closed

BenjaminSchubert mentioned this pull request Nov 18, 2023

Vendor in and absorb the pprint module from upstream #11626

Merged

BenjaminSchubert force-pushed the bschubert/nicer-comparisons-vendor branch from 2ac6392 to a6b35d8 on November 18, 2023 09:58

BenjaminSchubert force-pushed the bschubert/nicer-comparisons-vendor branch from a6b35d8 to 86545ed on November 20, 2023 16:43

BenjaminSchubert force-pushed the bschubert/nicer-comparisons-vendor branch from 86545ed to 445687c on November 20, 2023 19:03

nicoddemus approved these changes Nov 24, 2023


Update changelog/1531.improvement.rst

85807ba

Co-authored-by: Bruno Oliveira <[email protected]>

bluetech requested changes Nov 27, 2023


Address PR comments

55a78b6

BenjaminSchubert commented Nov 27, 2023


Re-print the ordered dict as a normal dict

ccfb669

bluetech approved these changes Nov 27, 2023


bluetech merged commit 2d1710e into pytest-dev:main Nov 27, 2023
24 checks passed

BenjaminSchubert deleted the bschubert/nicer-comparisons-vendor branch November 27, 2023 18:31

stdedos mentioned this pull request May 15, 2024

Enhancement: Present AssertionError differently skyscreamer/JSONassert#183

Open
