Always use, and improve the AlwaysDispatchPrettyPrinter for diffs #11537

BenjaminSchubert · 2023-10-22T12:54:21Z

Overview

Note: An alternative implementation has been made at #11571, following feedback.

The normal default pretty printer is not great when objects are nested it can get hard to read the diffs produced.

Instead, provide a pretty printer that behaves more like when json get which allows for smaller, more meaningful differences, at the expense of a slightly longer diff.

Fix #1531

Alternatives/Potential improvements

This has the disadvantage that diffs are now longer, even for small changes (like [1, 2] == [1, 3]). We could potentially still keep the previous implementation for the case where the diff is the same length AND has a single line. This would take care of trivial cases. It would however make some diffs harder to read again, like [1, 2, 3] == [2, 3, 4], which would now show 3 differences.

This is however not generalisable to deeply nested payloads.

Notes for maintainers

This needs to work a lot with private details of the pretty printer. There was already some of that happening, but this makes it much more tied to it. It would technically be possible to reimplement most of it without touching private methods, but that would require a lot of copying. this module has been quite stable over time so it might not be worth it.

The first commit contains the initial 'crude' implementation, which mostly copy pastes and adapts the code from the standard librarie's prettyprinter. The second commit (the !fixup) cleans up the interface and removes duplication.

The support for dataclasses and simplenamespaces is degraded before python 3.10 and 3.9 respectively. I believe we could backport the implementation, but not sure how important that is.

Examples

Basic

Generated using the following script

from collections import Counter, defaultdict, deque
from dataclasses import dataclass
from functools import partial
import pprint
from types import SimpleNamespace
from typing import Optional, Dict, Any, IO, List
import difflib

from _pytest._io.saferepr import _pformat_dispatch

###
# Original pytest diff, copied in
###
class AlwaysDispatchingPrettyPrinter(pprint.PrettyPrinter):
  """PrettyPrinter that always dispatches (regardless of width)."""

  def _format(
      self,
      object: object,
      stream: IO[str],
      indent: int,
      allowance: int,
      context: Dict[int, Any],
      level: int,
  ) -> None:
      # Type ignored because _dispatch is private.
      p = self._dispatch.get(type(object).__repr__, None)  # type: ignore[attr-defined]

      objid = id(object)
      if objid in context or p is None:
          # Type ignored because _format is private.
          super()._format(  # type: ignore[misc]
              object,
              stream,
              indent,
              allowance,
              context,
              level,
          )
          return

      context[objid] = 1
      p(self, object, stream, indent, allowance, context, level + 1)
      del context[objid]


def _pformat_dispatch_original(
  object: object,
  indent: int = 1,
  width: int = 80,
  depth: Optional[int] = None,
  *,
  compact: bool = False,
) -> str:
  return AlwaysDispatchingPrettyPrinter(
      indent=indent, width=width, depth=depth, compact=compact
  ).pformat(object)

def _surrounding_parens_on_own_lines(lines: List[str]) -> None:
  """Move opening/closing parenthesis/bracket to own lines."""
  opening = lines[0][:1]
  if opening in ["(", "[", "{"]:
      lines[0] = " " + lines[0][1:]
      lines[:] = [opening] + lines
  closing = lines[-1][-1:]
  if closing in [")", "]", "}"]:
      lines[-1] = lines[-1][:-1] + ","
      lines[:] = lines + [closing]


def original_diff(left, right):
  left_formatting = pprint.pformat(left).splitlines()
  right_formatting = pprint.pformat(right).splitlines()

  # Re-format for different output lengths.
  lines_left = len(left_formatting)
  lines_right = len(right_formatting)
  if lines_left != lines_right:
      left_formatting = _pformat_dispatch_original(left).splitlines()
      right_formatting = _pformat_dispatch_original(right).splitlines()

  if lines_left > 1 or lines_right > 1:
      _surrounding_parens_on_own_lines(left_formatting)
      _surrounding_parens_on_own_lines(right_formatting)

  return left_formatting, right_formatting


###
# Script to generate the diffs
###

TABLE = """
<table>
<tr>
<th>Test</th>
<th>Main</th>
<th>Proposal</th>
</tr>
{rows}
</table>
"""

ROW = """
<tr>
<td colspan=2>

\`\`\`python
{python}
\`\`\`
</td>
</tr>
<tr>
<td>

\`\`\`diff
{diff_original}
\`\`\`
</td>

<td>

\`\`\`diff
{diff_new}
\`\`\`
</td>
</tr>
"""


def get_row(left, right):
  original = "\n".join(
      line.rstrip()
      for line in difflib.ndiff(*original_diff(left, right))
  )
  new = "\n".join(
      line.rstrip()
      for line in difflib.ndiff(_pformat_dispatch(left).splitlines(), _pformat_dispatch(right).splitlines())
  )

  fmt = partial(pprint.pformat, indent=2, width=60)
  return f"{fmt(left)} \\ \n == {fmt(right)}", original, new


@dataclass
class DataclassWithTwoItems:
  foo: str
  bar: str


rows = [
  get_row(left, right)
  for left, right in [
      [{"one": 1, "two": 2}, {"three": 1, "two": 3}],
      [[1, 2], [1, 3]],
      [(1,), (2,)],
      [(1, 2), (1, 3)],
      [{1, 2}, {1, 3}],
      [SimpleNamespace(one=1, two=2), SimpleNamespace(one=2, three=2)],
      [
          defaultdict(str, {"one": "1", "two": "2"}),
          defaultdict(str, {"one": "1", "two": "3"}),
      ],
      [Counter("121"), Counter("122")],
      [deque([1, 2]), deque([1, 3])],
      [deque([1, 2], maxlen=3), deque([1, 3], maxlen=4)],
      [
          {
              "counter": Counter("122"),
              "dataclass": DataclassWithTwoItems(foo="foo", bar="bar"),
              "defaultdict": defaultdict(str, {"one": "1", "two": "2"}),
              "deque": deque([1, 2], maxlen=3),
              "dict": {"one": 1, "two": 2},
              "list": [1, 2],
              "set": {1, 2},
              "simplenamespace": SimpleNamespace(one=1, two=2),
              "tuple": (1, 2),
          },
          {
              "counter": Counter("121"),
              "dataclass": DataclassWithTwoItems(foo="foo", bar="baz"),
              "defaultdict": defaultdict(str, {"three": "1", "two": "3"}),
              "deque": deque([1, 3], maxlen=3),
              "dict": {"one": 1, "two": 3},
              "list": [1, 2, 3],
              "set": {1, 3},
              "simplenamespace": SimpleNamespace(one=1, two=2, three=3),
              "tuple": (1,),
          },
      ],
  ]
]

print(
  TABLE.format(
      rows="\n".join(
          [
              ROW.format(python=row[0], diff_original=row[1], diff_new=row[2])
              for row in rows
          ]
      )
  )
)

We get the following differences on small entries:

Test	Main	Proposal
{'one': 1, 'two': 2} \ == {'three': 1, 'two': 3}
- {'one': 1, 'two': 2} ? ^^ ^ + {'three': 1, 'two': 3} ? ^^^^ ^	{ - 'one': 1, ? ^^ + 'three': 1, ? ^^^^ - 'two': 2, ? ^ + 'two': 3, ? ^ }
[1, 2] \ == [1, 3]
- [1, 2] ? ^ + [1, 3] ? ^	[ 1, - 2, ? ^ + 3, ? ^ ]
(1,) \ == (2,)
- (1,) ? ^ + (2,) ? ^	( - 1, ? ^ + 2, ? ^ )
(1, 2) \ == (1, 3)
- (1, 2) ? ^ + (1, 3) ? ^	( 1, - 2, ? ^ + 3, ? ^ )
{1, 2} \ == {1, 3}
- {1, 2} ? ^ + {1, 3} ? ^	{ 1, - 2, ? ^ + 3, ? ^ }
namespace(one=1, two=2) \ == namespace(one=2, three=2)
- namespace(one=1, two=2) ? ^ ^^ + namespace(one=2, three=2) ? ^ ^^^^	namespace( - one=1, ? ^ + one=2, ? ^ - two=2, + three=2, )
defaultdict(<class 'str'>, {'one': '1', 'two': '2'}) \ == defaultdict(<class 'str'>, {'one': '1', 'two': '3'})
- defaultdict(<class 'str'>, {'one': '1', 'two': '2'}) ? ^ + defaultdict(<class 'str'>, {'one': '1', 'two': '3'}) ? ^	defaultdict(<class 'str'>, { 'one': '1', - 'two': '2', ? ^ + 'two': '3', ? ^ })
Counter({'1': 2, '2': 1}) \ == Counter({'2': 2, '1': 1})
- Counter({'1': 2, '2': 1}) + Counter({'2': 2, '1': 1})	Counter({ - '1': 2, ? ^ + '2': 2, ? ^ - '2': 1, ? ^ + '1': 1, ? ^ })
deque([1, 2]) \ == deque([1, 3])
- deque([1, 2]) ? ^ + deque([1, 3]) ? ^	deque([ 1, - 2, ? ^ + 3, ? ^ ])
deque([1, 2], maxlen=3) \ == deque([1, 3], maxlen=4)
- deque([1, 2], maxlen=3) ? ^ ^ + deque([1, 3], maxlen=4) ? ^ ^	- deque(maxlen=3, [ ? ^ + deque(maxlen=4, [ ? ^ 1, - 2, ? ^ + 3, ? ^ ])
{ 'counter': Counter({'2': 2, '1': 1}), 'dataclass': DataclassWithTwoItems(foo='foo', bar='bar'), 'defaultdict': defaultdict(<class 'str'>, { 'one': '1', 'two': '2'}), 'deque': deque([1, 2], maxlen=3), 'dict': {'one': 1, 'two': 2}, 'list': [1, 2], 'set': {1, 2}, 'simplenamespace': namespace(one=1, two=2), 'tuple': (1, 2)} \ == { 'counter': Counter({'1': 2, '2': 1}), 'dataclass': DataclassWithTwoItems(foo='foo', bar='baz'), 'defaultdict': defaultdict(<class 'str'>, { 'three': '1', 'two': '3'}), 'deque': deque([1, 3], maxlen=3), 'dict': {'one': 1, 'two': 3}, 'list': [1, 2, 3], 'set': {1, 3}, 'simplenamespace': namespace(one=1, two=2, three=3), 'tuple': (1,)}
{ - 'counter': Counter({'2': 2, '1': 1}), ? -------- + 'counter': Counter({'1': 2, '2': 1}), ? ++++++++ - 'dataclass': DataclassWithTwoItems(foo='foo', bar='bar'), ? ^ + 'dataclass': DataclassWithTwoItems(foo='foo', bar='baz'), ? ^ - 'defaultdict': defaultdict(<class 'str'>, {'one': '1', 'two': '2'}), ? ^^ ^ + 'defaultdict': defaultdict(<class 'str'>, {'three': '1', 'two': '3'}), ? ^^^^ ^ - 'deque': deque([1, 2], maxlen=3), ? ^ + 'deque': deque([1, 3], maxlen=3), ? ^ - 'dict': {'one': 1, 'two': 2}, ? ^ + 'dict': {'one': 1, 'two': 3}, ? ^ - 'list': [1, 2], + 'list': [1, 2, 3], ? +++ - 'set': {1, 2}, ? ^ + 'set': {1, 3}, ? ^ - 'simplenamespace': namespace(one=1, two=2), + 'simplenamespace': namespace(one=1, two=2, three=3), ? +++++++++ - 'tuple': (1, 2), ? -- + 'tuple': (1,), }	{ 'counter': Counter({ - '2': 2, ? ^ + '1': 2, ? ^ - '1': 1, ? ^ + '2': 1, ? ^ }), 'dataclass': DataclassWithTwoItems( foo='foo', - bar='bar', ? ^ + bar='baz', ? ^ ), 'defaultdict': defaultdict(<class 'str'>, { - 'one': '1', ? ^^ + 'three': '1', ? ^^^^ - 'two': '2', ? ^ + 'two': '3', ? ^ }), 'deque': deque(maxlen=3, [ 1, - 2, ? ^ + 3, ? ^ ]), 'dict': { 'one': 1, - 'two': 2, ? ^ + 'two': 3, ? ^ }, 'list': [ 1, 2, + 3, ], 'set': { 1, - 2, ? ^ + 3, ? ^ }, 'simplenamespace': namespace( one=1, two=2, + three=3, ), 'tuple': ( 1, - 2, ), }

Full example

Taking the example from https://github.com/lukaszb/pytest-dictsdiff, as in the issue:

Previously:

- {
-  'cell': '(056)-022-8631',
-  'dob': {'age': 34,
+ OrderedDict([('cell',
+               '(056)-022-8631'),
+              ('dob',
+               {'age': 44,
-          'date': '1953-11-04T01:21:04Z'},
?                     ^              ^
+                'date': '1983-11-04T01:21:14Z'}),
? ++++++                    ^              ^    +
-  'email': '[email protected]',
-  'gender': 'female',
-  'id': {'name': 'BSN',
+              ('email',
+               '[email protected]'),
+              ('gender',
+               'female'),
+              ('id',
+               {'name': 'BSN',
-         'value': '36180866'},
+                'value': '36180866'}),
? +++++++                            +
-  'location': {'city': 'Tholen',
+              ('location',
+               {'city': 'tholen',
-               'coordinates': {'latitude': '46.8823',
+                'coordinates': {'latitude': '46.8823',
? +
-                               'longitude': '175.8856'},
+                                'longitude': '175.8856'},
? +
-               'postcode': 64509,
?                               ^
+                'postcode': 64504,
? +                              ^
-               'state': 'groningen',
+                'state': 'groningen',
? +
-               'street': '2074 adriaen van ostadelaan',
+                'street': '2074 adriaen van ostadelaan',
? +
-               'timezone': {'description': 'Adelaide, Darwin',
+                'timezone': {'description': 'Adelaide, Darwin',
? +
-                            'offset': '+9:30'}},
+                             'offset': '+9:30'}}),
? +                                              +
+              ('login',
-  'login': {'md5': 'bafe8cf9d37806a7b13edc218d5ff762',
?  ^^^^^^^^
+               {'md5': 'bafe8cf9d37806a7b13edc218d5ff762',
?  ^^^^^^^^^^^^
-            'password': 'ontario',
+                'password': 'ontario',
? ++++
-            'salt': 'QVBKgEjy',
+                'salt': 'QVBKgEjy',
? ++++
-            'sha1': 'cacef09ff61072d1c55732963766fa84e919aa7a',
+                'sha1': 'cacef09ff61072d1c55732963766fa84e919aa7a',
? ++++
-            'sha256': 'cc86af47aedbdbb1de73ff10484996fe9785c47c0fc191b7c67eaf71e0782300',
+                'sha256': 'cc86af47aedbdbb1de73ff10484996fe9785c47c0fc191b7c67eaf71e0782300',
? ++++
-            'username': 'smallgorilla897',
+                'username': 'smallgorilla897',
? ++++
-            'uuid': '37e30c59-bc79-4172-aac6-e2c640e165fa'},
+                'uuid': '37e30c59-bc79-4172-aac6-e2c640e165fa'}),
? ++++                                                          +
-  'name': {'first': 'Zeyneb',
+              ('name',
+               {'first': 'zeyneb',
-           'last': 'Elfring',
?                    ^
+                'last': 'elfring',
? +++++                   ^
-           'title': 'mrs'},
+                'title': 'mrs'}),
? +++++                         +
-  'nat': 'NL',
-  'phone': '(209)-143-9697',
+              ('nat',
+               'NL'),
+              ('phone',
+               '(209)-143-9697'),
+              ('picture',
-  'picture': {'large': 'https://randomuser.me/api/portraits/women/37.jpg',
?  ^^^^^^^^^^
+               {'large': 'https://randomuser.me/api/portraits/women/37.jpg',
?  ^^^^^^^^^^^^
-              'medium': 'https://randomuser.me/api/portraits/med/women/37.jpg',
+                'medium': 'https://randomuser.me/api/portraits/med/women/37.jpg',
? ++
-              'thumbnail': 'https://randomuser.me/api/portraits/thumb/women/37.jpg'},
+                'thumbnail': 'https://randomuser.me/api/portraits/thumb/women/37.jpg'}),
? ++                                                                                   +
-  'registered': {'age': 3,
+              ('registered',
+               {'age': 3,
-                 'date': '2014-12-07T06:54:14Z'},
? -
+                'date': '2014-12-07T06:54:14Z'})],
?                                               ++
- }

Now:

- {
+ OrderedDict({
      'cell': '(056)-022-8631',
      'dob': {
-         'age': 34,
?                ^
+         'age': 44,
?                ^
-         'date': '1953-11-04T01:21:04Z',
?                    ^              ^
+         'date': '1983-11-04T01:21:14Z',
?                    ^              ^
      },
      'email': '[email protected]',
      'gender': 'female',
      'id': {
          'name': 'BSN',
          'value': '36180866',
      },
      'location': {
-         'city': 'Tholen',
?                  ^
+         'city': 'tholen',
?                  ^
          'coordinates': {
              'latitude': '46.8823',
              'longitude': '175.8856',
          },
-         'postcode': 64509,
?                         ^
+         'postcode': 64504,
?                         ^
          'state': 'groningen',
          'street': '2074 adriaen van ostadelaan',
          'timezone': {
              'description': 'Adelaide, Darwin',
              'offset': '+9:30',
          },
      },
      'login': {
          'md5': 'bafe8cf9d37806a7b13edc218d5ff762',
          'password': 'ontario',
          'salt': 'QVBKgEjy',
          'sha1': 'cacef09ff61072d1c55732963766fa84e919aa7a',
          'sha256': 'cc86af47aedbdbb1de73ff10484996fe9785c47c0fc191b7c67eaf71e0782300',
          'username': 'smallgorilla897',
          'uuid': '37e30c59-bc79-4172-aac6-e2c640e165fa',
      },
      'name': {
-         'first': 'Zeyneb',
?                   ^
+         'first': 'zeyneb',
?                   ^
-         'last': 'Elfring',
?                  ^
+         'last': 'elfring',
?                  ^
          'title': 'mrs',
      },
      'nat': 'NL',
      'phone': '(209)-143-9697',
      'picture': {
          'large': 'https://randomuser.me/api/portraits/women/37.jpg',
          'medium': 'https://randomuser.me/api/portraits/med/women/37.jpg',
          'thumbnail': 'https://randomuser.me/api/portraits/thumb/women/37.jpg',
      },
      'registered': {
          'age': 3,
          'date': '2014-12-07T06:54:14Z',
      },
- }
+ })

The normal default pretty printer is not great when objects are nested and it can get hard to read the diff. Instead, provide a pretty printer that behaves more like when json get indented, which allows for smaller, more meaningful differences, at the expense of a slightly longer diff

bluetech

Thanks @BenjaminSchubert, it'd be great to improve the diffs.

The pprint code is a bit hard to get into, before I/we do it, I think it would be helpful to show before/after examples on some common cases that are changed, both short and long.

Regarding the private implementation details, I wonder if we will now be changing enough things such that not much is left from the upstream PrettyPrinter? In that case, we can just copy pprint.py wholesale to pytest. My copy has 671 lines so it's not huge.

Advantages:

No risk of breakage due to using private stuff
Consistency between all Python versions
Can backport pprint functionality from new Python versions
Maybe can simplify/customize the code more directly?

Disadvantages:

Maintenance overhead
Missing on upstream improvements (unless we continuously backport)

BenjaminSchubert · 2023-10-30T09:14:20Z

@bluetech Good questions!

The pprint code is a bit hard to get into, before I/we do it, I think it would be helpful to show before/after examples on some common cases that are changed, both short and long.

I have now updated the description with a (collapsed) section to show some small examples for better representation. It is quite extensive to the point of becoming hard to read, but let me know if there is a need for more.

Regarding the private implementation details, I wonder if we will now be changing enough things such that not much is left from the upstream PrettyPrinter? In that case, we can just copy pprint.py wholesale to pytest. My copy has 671 lines so it's not huge.

That works for me, do you prefer me to copy the code and continue extending it as I did now, or do the modifiications directly there to simplify? And is another PR/branch easier for comparison, or are you happy with me just updating this one?

bluetech · 2023-10-30T14:42:58Z

Thank you @BenjaminSchubert, you examples are exactly what I hoped for.

I think it would be very good to get wider feedback on this, maybe even asking some of our users on the internet :) @nicoddemus @RonnyPfannschmidt @The-Compiler WDYT? See @BenjaminSchubert many before/after examples in the PR description (make sure to expand).

That works for me, do you prefer me to copy the code and continue extending it as I did now, or do the modifiications directly there to simplify? And is another PR/branch easier for comparison, or are you happy with me just updating this one?

A separate PR against current main would be great; it will be useful regardless and hopefully (maybe?) make the changes here easier to review.

The-Compiler · 2023-10-30T14:51:52Z

I've only looked at the output changes, but I like those a lot! Seems to make the output much more readable IMHO, and thanks to the ^ it's still easy to see where the differences are.

nicoddemus · 2023-10-30T16:12:18Z

I agree the outcome is great!

Regarding the private implementation details, I wonder if we will now be changing enough things such that not much is left from the upstream PrettyPrinter? In that case, we can just copy pprint.py wholesale to pytest.

I agree copying it is probably the way to go. 👍

My suggestion is to add it to the repository without any changes in its own commit, and later on add the necessary changes in separate commits: having a single commit with the clean copy helps us to later see exactly what has changed.

BenjaminSchubert · 2023-10-30T22:51:31Z

@bluetech #11571 would be the implementation with vendoring

BenjaminSchubert · 2023-11-07T20:25:11Z

Hey, just want to make sure if I was meant to do something more still before reviews or not (I don't need a review right now, just want to make sure you are not waiting on me :) )

nicoddemus · 2023-11-07T21:39:40Z

@BenjaminSchubert don't worry, you do not need to do anything else, it is on our plate now, thanks!

BenjaminSchubert · 2023-11-18T09:36:06Z

Closing in favor of #11626 and #11571

BenjaminSchubert force-pushed the bschubert/nicer-comparisons branch from 9e71d6a to 7d4eec8 Compare October 22, 2023 12:59

BenjaminSchubert mentioned this pull request Oct 22, 2023

Better diff for asserts on dicts #1531

Closed

BenjaminSchubert force-pushed the bschubert/nicer-comparisons branch 2 times, most recently from 6eeef45 to 6ab988c Compare October 24, 2023 19:04

BenjaminSchubert added 3 commits October 24, 2023 21:13

fixup! Always use, and improve the AlwaysDispatchPrettyPrinter

66f18bf

Add changelog entry

ce16d20

BenjaminSchubert force-pushed the bschubert/nicer-comparisons branch from 6ab988c to ce16d20 Compare October 24, 2023 20:14

bluetech reviewed Oct 25, 2023

View reviewed changes

BenjaminSchubert mentioned this pull request Oct 30, 2023

Improve the full diff by having more consistent indentation in the PrettyPrinter #11571

Merged

BenjaminSchubert closed this Nov 18, 2023

BenjaminSchubert deleted the bschubert/nicer-comparisons branch November 27, 2023 18:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Always use, and improve the AlwaysDispatchPrettyPrinter for diffs #11537

Always use, and improve the AlwaysDispatchPrettyPrinter for diffs #11537

BenjaminSchubert commented Oct 22, 2023 •

edited

Loading

bluetech left a comment

BenjaminSchubert commented Oct 30, 2023

bluetech commented Oct 30, 2023 •

edited

Loading

The-Compiler commented Oct 30, 2023

nicoddemus commented Oct 30, 2023 •

edited

Loading

BenjaminSchubert commented Oct 30, 2023

BenjaminSchubert commented Nov 7, 2023

nicoddemus commented Nov 7, 2023

BenjaminSchubert commented Nov 18, 2023

Always use, and improve the AlwaysDispatchPrettyPrinter for diffs #11537

Always use, and improve the AlwaysDispatchPrettyPrinter for diffs #11537

Conversation

BenjaminSchubert commented Oct 22, 2023 • edited Loading

Overview

Alternatives/Potential improvements

Notes for maintainers

Examples

Basic

Full example

Previously:

Now:

bluetech left a comment

Choose a reason for hiding this comment

BenjaminSchubert commented Oct 30, 2023

bluetech commented Oct 30, 2023 • edited Loading

The-Compiler commented Oct 30, 2023

nicoddemus commented Oct 30, 2023 • edited Loading

BenjaminSchubert commented Oct 30, 2023

BenjaminSchubert commented Nov 7, 2023

nicoddemus commented Nov 7, 2023

BenjaminSchubert commented Nov 18, 2023

BenjaminSchubert commented Oct 22, 2023 •

edited

Loading

bluetech commented Oct 30, 2023 •

edited

Loading

nicoddemus commented Oct 30, 2023 •

edited

Loading