Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always use, and improve the AlwaysDispatchPrettyPrinter for diffs #11537

Conversation

BenjaminSchubert
Copy link
Contributor

@BenjaminSchubert BenjaminSchubert commented Oct 22, 2023

Overview

Note: An alternative implementation has been made at #11571, following feedback.

The normal default pretty printer is not great when objects are nested it can get hard to read the diffs produced.

Instead, provide a pretty printer that behaves more like when json get which allows for smaller, more meaningful differences, at the expense of a slightly longer diff.

Fix #1531

Alternatives/Potential improvements

This has the disadvantage that diffs are now longer, even for small changes (like [1, 2] == [1, 3]). We could potentially still keep the previous implementation for the case where the diff is the same length AND has a single line. This would take care of trivial cases. It would however make some diffs harder to read again, like [1, 2, 3] == [2, 3, 4], which would now show 3 differences.

This is however not generalisable to deeply nested payloads.

Notes for maintainers

This needs to work a lot with private details of the pretty printer. There was already some of that happening, but this makes it much more tied to it. It would technically be possible to reimplement most of it without touching private methods, but that would require a lot of copying. this module has been quite stable over time so it might not be worth it.

The first commit contains the initial 'crude' implementation, which mostly copy pastes and adapts the code from the standard librarie's prettyprinter. The second commit (the !fixup) cleans up the interface and removes duplication.

The support for dataclasses and simplenamespaces is degraded before python 3.10 and 3.9 respectively. I believe we could backport the implementation, but not sure how important that is.

Examples

Basic

Generated using the following script
from collections import Counter, defaultdict, deque
from dataclasses import dataclass
from functools import partial
import pprint
from types import SimpleNamespace
from typing import Optional, Dict, Any, IO, List
import difflib

from _pytest._io.saferepr import _pformat_dispatch

###
# Original pytest diff, copied in
###
class AlwaysDispatchingPrettyPrinter(pprint.PrettyPrinter):
  """PrettyPrinter that always dispatches (regardless of width)."""

  def _format(
      self,
      object: object,
      stream: IO[str],
      indent: int,
      allowance: int,
      context: Dict[int, Any],
      level: int,
  ) -> None:
      # Type ignored because _dispatch is private.
      p = self._dispatch.get(type(object).__repr__, None)  # type: ignore[attr-defined]

      objid = id(object)
      if objid in context or p is None:
          # Type ignored because _format is private.
          super()._format(  # type: ignore[misc]
              object,
              stream,
              indent,
              allowance,
              context,
              level,
          )
          return

      context[objid] = 1
      p(self, object, stream, indent, allowance, context, level + 1)
      del context[objid]


def _pformat_dispatch_original(
  object: object,
  indent: int = 1,
  width: int = 80,
  depth: Optional[int] = None,
  *,
  compact: bool = False,
) -> str:
  return AlwaysDispatchingPrettyPrinter(
      indent=indent, width=width, depth=depth, compact=compact
  ).pformat(object)

def _surrounding_parens_on_own_lines(lines: List[str]) -> None:
  """Move opening/closing parenthesis/bracket to own lines."""
  opening = lines[0][:1]
  if opening in ["(", "[", "{"]:
      lines[0] = " " + lines[0][1:]
      lines[:] = [opening] + lines
  closing = lines[-1][-1:]
  if closing in [")", "]", "}"]:
      lines[-1] = lines[-1][:-1] + ","
      lines[:] = lines + [closing]


def original_diff(left, right):
  left_formatting = pprint.pformat(left).splitlines()
  right_formatting = pprint.pformat(right).splitlines()

  # Re-format for different output lengths.
  lines_left = len(left_formatting)
  lines_right = len(right_formatting)
  if lines_left != lines_right:
      left_formatting = _pformat_dispatch_original(left).splitlines()
      right_formatting = _pformat_dispatch_original(right).splitlines()

  if lines_left > 1 or lines_right > 1:
      _surrounding_parens_on_own_lines(left_formatting)
      _surrounding_parens_on_own_lines(right_formatting)

  return left_formatting, right_formatting


###
# Script to generate the diffs
###

TABLE = """
<table>
<tr>
<th>Test</th>
<th>Main</th>
<th>Proposal</th>
</tr>
{rows}
</table>
"""

ROW = """
<tr>
<td colspan=2>

\`\`\`python
{python}
\`\`\`
</td>
</tr>
<tr>
<td>

\`\`\`diff
{diff_original}
\`\`\`
</td>

<td>

\`\`\`diff
{diff_new}
\`\`\`
</td>
</tr>
"""


def get_row(left, right):
  original = "\n".join(
      line.rstrip()
      for line in difflib.ndiff(*original_diff(left, right))
  )
  new = "\n".join(
      line.rstrip()
      for line in difflib.ndiff(_pformat_dispatch(left).splitlines(), _pformat_dispatch(right).splitlines())
  )

  fmt = partial(pprint.pformat, indent=2, width=60)
  return f"{fmt(left)} \\ \n == {fmt(right)}", original, new


@dataclass
class DataclassWithTwoItems:
  foo: str
  bar: str


rows = [
  get_row(left, right)
  for left, right in [
      [{"one": 1, "two": 2}, {"three": 1, "two": 3}],
      [[1, 2], [1, 3]],
      [(1,), (2,)],
      [(1, 2), (1, 3)],
      [{1, 2}, {1, 3}],
      [SimpleNamespace(one=1, two=2), SimpleNamespace(one=2, three=2)],
      [
          defaultdict(str, {"one": "1", "two": "2"}),
          defaultdict(str, {"one": "1", "two": "3"}),
      ],
      [Counter("121"), Counter("122")],
      [deque([1, 2]), deque([1, 3])],
      [deque([1, 2], maxlen=3), deque([1, 3], maxlen=4)],
      [
          {
              "counter": Counter("122"),
              "dataclass": DataclassWithTwoItems(foo="foo", bar="bar"),
              "defaultdict": defaultdict(str, {"one": "1", "two": "2"}),
              "deque": deque([1, 2], maxlen=3),
              "dict": {"one": 1, "two": 2},
              "list": [1, 2],
              "set": {1, 2},
              "simplenamespace": SimpleNamespace(one=1, two=2),
              "tuple": (1, 2),
          },
          {
              "counter": Counter("121"),
              "dataclass": DataclassWithTwoItems(foo="foo", bar="baz"),
              "defaultdict": defaultdict(str, {"three": "1", "two": "3"}),
              "deque": deque([1, 3], maxlen=3),
              "dict": {"one": 1, "two": 3},
              "list": [1, 2, 3],
              "set": {1, 3},
              "simplenamespace": SimpleNamespace(one=1, two=2, three=3),
              "tuple": (1,),
          },
      ],
  ]
]

print(
  TABLE.format(
      rows="\n".join(
          [
              ROW.format(python=row[0], diff_original=row[1], diff_new=row[2])
              for row in rows
          ]
      )
  )
)
We get the following differences on small entries:
Test Main Proposal
{'one': 1, 'two': 2} \ 
 == {'three': 1, 'two': 3}
- {'one': 1, 'two': 2}
?   ^^              ^
+ {'three': 1, 'two': 3}
?   ^^^^              ^
  {
-     'one': 1,
?      ^^
+     'three': 1,
?      ^^^^
-     'two': 2,
?            ^
+     'two': 3,
?            ^
  }
[1, 2] \ 
 == [1, 3]
- [1, 2]
?     ^
+ [1, 3]
?     ^
  [
      1,
-     2,
?     ^
+     3,
?     ^
  ]
(1,) \ 
 == (2,)
- (1,)
?  ^
+ (2,)
?  ^
  (
-     1,
?     ^
+     2,
?     ^
  )
(1, 2) \ 
 == (1, 3)
- (1, 2)
?     ^
+ (1, 3)
?     ^
  (
      1,
-     2,
?     ^
+     3,
?     ^
  )
{1, 2} \ 
 == {1, 3}
- {1, 2}
?     ^
+ {1, 3}
?     ^
  {
      1,
-     2,
?     ^
+     3,
?     ^
  }
namespace(one=1, two=2) \ 
 == namespace(one=2, three=2)
- namespace(one=1, two=2)
?               ^   ^^
+ namespace(one=2, three=2)
?               ^   ^^^^
  namespace(
-     one=1,
?         ^
+     one=2,
?         ^
-     two=2,
+     three=2,
  )
defaultdict(<class 'str'>, {'one': '1', 'two': '2'}) \ 
 == defaultdict(<class 'str'>, {'one': '1', 'two': '3'})
- defaultdict(<class 'str'>, {'one': '1', 'two': '2'})
?                                                 ^
+ defaultdict(<class 'str'>, {'one': '1', 'two': '3'})
?                                                 ^
  defaultdict(<class 'str'>, {
      'one': '1',
-     'two': '2',
?             ^
+     'two': '3',
?             ^
  })
Counter({'1': 2, '2': 1}) \ 
 == Counter({'2': 2, '1': 1})
- Counter({'1': 2, '2': 1})
+ Counter({'2': 2, '1': 1})
  Counter({
-     '1': 2,
?      ^
+     '2': 2,
?      ^
-     '2': 1,
?      ^
+     '1': 1,
?      ^
  })
deque([1, 2]) \ 
 == deque([1, 3])
- deque([1, 2])
?           ^
+ deque([1, 3])
?           ^
  deque([
      1,
-     2,
?     ^
+     3,
?     ^
  ])
deque([1, 2], maxlen=3) \ 
 == deque([1, 3], maxlen=4)
- deque([1, 2], maxlen=3)
?           ^          ^
+ deque([1, 3], maxlen=4)
?           ^          ^
- deque(maxlen=3, [
?              ^
+ deque(maxlen=4, [
?              ^
      1,
-     2,
?     ^
+     3,
?     ^
  ])
{ 'counter': Counter({'2': 2, '1': 1}),
  'dataclass': DataclassWithTwoItems(foo='foo', bar='bar'),
  'defaultdict': defaultdict(<class 'str'>,
                             { 'one': '1',
                               'two': '2'}),
  'deque': deque([1, 2], maxlen=3),
  'dict': {'one': 1, 'two': 2},
  'list': [1, 2],
  'set': {1, 2},
  'simplenamespace': namespace(one=1, two=2),
  'tuple': (1, 2)} \ 
 == { 'counter': Counter({'1': 2, '2': 1}),
  'dataclass': DataclassWithTwoItems(foo='foo', bar='baz'),
  'defaultdict': defaultdict(<class 'str'>,
                             { 'three': '1',
                               'two': '3'}),
  'deque': deque([1, 3], maxlen=3),
  'dict': {'one': 1, 'two': 3},
  'list': [1, 2, 3],
  'set': {1, 3},
  'simplenamespace': namespace(one=1, two=2, three=3),
  'tuple': (1,)}
  {
-  'counter': Counter({'2': 2, '1': 1}),
?                          --------
+  'counter': Counter({'1': 2, '2': 1}),
?                       ++++++++
-  'dataclass': DataclassWithTwoItems(foo='foo', bar='bar'),
?                                                       ^
+  'dataclass': DataclassWithTwoItems(foo='foo', bar='baz'),
?                                                       ^
-  'defaultdict': defaultdict(<class 'str'>, {'one': '1', 'two': '2'}),
?                                              ^^                 ^
+  'defaultdict': defaultdict(<class 'str'>, {'three': '1', 'two': '3'}),
?                                              ^^^^                 ^
-  'deque': deque([1, 2], maxlen=3),
?                     ^
+  'deque': deque([1, 3], maxlen=3),
?                     ^
-  'dict': {'one': 1, 'two': 2},
?                            ^
+  'dict': {'one': 1, 'two': 3},
?                            ^
-  'list': [1, 2],
+  'list': [1, 2, 3],
?               +++
-  'set': {1, 2},
?             ^
+  'set': {1, 3},
?             ^
-  'simplenamespace': namespace(one=1, two=2),
+  'simplenamespace': namespace(one=1, two=2, three=3),
?                                           +++++++++
-  'tuple': (1, 2),
?              --
+  'tuple': (1,),
  }
  {
      'counter': Counter({
-         '2': 2,
?          ^
+         '1': 2,
?          ^
-         '1': 1,
?          ^
+         '2': 1,
?          ^
      }),
      'dataclass': DataclassWithTwoItems(
          foo='foo',
-         bar='bar',
?                ^
+         bar='baz',
?                ^
      ),
      'defaultdict': defaultdict(<class 'str'>, {
-         'one': '1',
?          ^^
+         'three': '1',
?          ^^^^
-         'two': '2',
?                 ^
+         'two': '3',
?                 ^
      }),
      'deque': deque(maxlen=3, [
          1,
-         2,
?         ^
+         3,
?         ^
      ]),
      'dict': {
          'one': 1,
-         'two': 2,
?                ^
+         'two': 3,
?                ^
      },
      'list': [
          1,
          2,
+         3,
      ],
      'set': {
          1,
-         2,
?         ^
+         3,
?         ^
      },
      'simplenamespace': namespace(
          one=1,
          two=2,
+         three=3,
      ),
      'tuple': (
          1,
-         2,
      ),
  }

Full example

Taking the example from https://github.com/lukaszb/pytest-dictsdiff, as in the issue:

Previously:

- {
-  'cell': '(056)-022-8631',
-  'dob': {'age': 34,
+ OrderedDict([('cell',
+               '(056)-022-8631'),
+              ('dob',
+               {'age': 44,
-          'date': '1953-11-04T01:21:04Z'},
?                     ^              ^
+                'date': '1983-11-04T01:21:14Z'}),
? ++++++                    ^              ^    +
-  'email': '[email protected]',
-  'gender': 'female',
-  'id': {'name': 'BSN',
+              ('email',
+               '[email protected]'),
+              ('gender',
+               'female'),
+              ('id',
+               {'name': 'BSN',
-         'value': '36180866'},
+                'value': '36180866'}),
? +++++++                            +
-  'location': {'city': 'Tholen',
+              ('location',
+               {'city': 'tholen',
-               'coordinates': {'latitude': '46.8823',
+                'coordinates': {'latitude': '46.8823',
? +
-                               'longitude': '175.8856'},
+                                'longitude': '175.8856'},
? +
-               'postcode': 64509,
?                               ^
+                'postcode': 64504,
? +                              ^
-               'state': 'groningen',
+                'state': 'groningen',
? +
-               'street': '2074 adriaen van ostadelaan',
+                'street': '2074 adriaen van ostadelaan',
? +
-               'timezone': {'description': 'Adelaide, Darwin',
+                'timezone': {'description': 'Adelaide, Darwin',
? +
-                            'offset': '+9:30'}},
+                             'offset': '+9:30'}}),
? +                                              +
+              ('login',
-  'login': {'md5': 'bafe8cf9d37806a7b13edc218d5ff762',
?  ^^^^^^^^
+               {'md5': 'bafe8cf9d37806a7b13edc218d5ff762',
?  ^^^^^^^^^^^^
-            'password': 'ontario',
+                'password': 'ontario',
? ++++
-            'salt': 'QVBKgEjy',
+                'salt': 'QVBKgEjy',
? ++++
-            'sha1': 'cacef09ff61072d1c55732963766fa84e919aa7a',
+                'sha1': 'cacef09ff61072d1c55732963766fa84e919aa7a',
? ++++
-            'sha256': 'cc86af47aedbdbb1de73ff10484996fe9785c47c0fc191b7c67eaf71e0782300',
+                'sha256': 'cc86af47aedbdbb1de73ff10484996fe9785c47c0fc191b7c67eaf71e0782300',
? ++++
-            'username': 'smallgorilla897',
+                'username': 'smallgorilla897',
? ++++
-            'uuid': '37e30c59-bc79-4172-aac6-e2c640e165fa'},
+                'uuid': '37e30c59-bc79-4172-aac6-e2c640e165fa'}),
? ++++                                                          +
-  'name': {'first': 'Zeyneb',
+              ('name',
+               {'first': 'zeyneb',
-           'last': 'Elfring',
?                    ^
+                'last': 'elfring',
? +++++                   ^
-           'title': 'mrs'},
+                'title': 'mrs'}),
? +++++                         +
-  'nat': 'NL',
-  'phone': '(209)-143-9697',
+              ('nat',
+               'NL'),
+              ('phone',
+               '(209)-143-9697'),
+              ('picture',
-  'picture': {'large': 'https://randomuser.me/api/portraits/women/37.jpg',
?  ^^^^^^^^^^
+               {'large': 'https://randomuser.me/api/portraits/women/37.jpg',
?  ^^^^^^^^^^^^
-              'medium': 'https://randomuser.me/api/portraits/med/women/37.jpg',
+                'medium': 'https://randomuser.me/api/portraits/med/women/37.jpg',
? ++
-              'thumbnail': 'https://randomuser.me/api/portraits/thumb/women/37.jpg'},
+                'thumbnail': 'https://randomuser.me/api/portraits/thumb/women/37.jpg'}),
? ++                                                                                   +
-  'registered': {'age': 3,
+              ('registered',
+               {'age': 3,
-                 'date': '2014-12-07T06:54:14Z'},
? -
+                'date': '2014-12-07T06:54:14Z'})],
?                                               ++
- }

Now:

- {
+ OrderedDict({
      'cell': '(056)-022-8631',
      'dob': {
-         'age': 34,
?                ^
+         'age': 44,
?                ^
-         'date': '1953-11-04T01:21:04Z',
?                    ^              ^
+         'date': '1983-11-04T01:21:14Z',
?                    ^              ^
      },
      'email': '[email protected]',
      'gender': 'female',
      'id': {
          'name': 'BSN',
          'value': '36180866',
      },
      'location': {
-         'city': 'Tholen',
?                  ^
+         'city': 'tholen',
?                  ^
          'coordinates': {
              'latitude': '46.8823',
              'longitude': '175.8856',
          },
-         'postcode': 64509,
?                         ^
+         'postcode': 64504,
?                         ^
          'state': 'groningen',
          'street': '2074 adriaen van ostadelaan',
          'timezone': {
              'description': 'Adelaide, Darwin',
              'offset': '+9:30',
          },
      },
      'login': {
          'md5': 'bafe8cf9d37806a7b13edc218d5ff762',
          'password': 'ontario',
          'salt': 'QVBKgEjy',
          'sha1': 'cacef09ff61072d1c55732963766fa84e919aa7a',
          'sha256': 'cc86af47aedbdbb1de73ff10484996fe9785c47c0fc191b7c67eaf71e0782300',
          'username': 'smallgorilla897',
          'uuid': '37e30c59-bc79-4172-aac6-e2c640e165fa',
      },
      'name': {
-         'first': 'Zeyneb',
?                   ^
+         'first': 'zeyneb',
?                   ^
-         'last': 'Elfring',
?                  ^
+         'last': 'elfring',
?                  ^
          'title': 'mrs',
      },
      'nat': 'NL',
      'phone': '(209)-143-9697',
      'picture': {
          'large': 'https://randomuser.me/api/portraits/women/37.jpg',
          'medium': 'https://randomuser.me/api/portraits/med/women/37.jpg',
          'thumbnail': 'https://randomuser.me/api/portraits/thumb/women/37.jpg',
      },
      'registered': {
          'age': 3,
          'date': '2014-12-07T06:54:14Z',
      },
- }
+ })

@BenjaminSchubert BenjaminSchubert force-pushed the bschubert/nicer-comparisons branch from 9e71d6a to 7d4eec8 Compare October 22, 2023 12:59
@BenjaminSchubert BenjaminSchubert force-pushed the bschubert/nicer-comparisons branch 2 times, most recently from 6eeef45 to 6ab988c Compare October 24, 2023 19:04
The normal default pretty printer is not great when objects are nested
and it can get hard to read the diff.

Instead, provide a pretty printer that behaves more like when json get
indented, which allows for smaller, more meaningful differences, at
the expense of a slightly longer diff
@BenjaminSchubert BenjaminSchubert force-pushed the bschubert/nicer-comparisons branch from 6ab988c to ce16d20 Compare October 24, 2023 20:14
Copy link
Member

@bluetech bluetech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @BenjaminSchubert, it'd be great to improve the diffs.

The pprint code is a bit hard to get into, before I/we do it, I think it would be helpful to show before/after examples on some common cases that are changed, both short and long.


Regarding the private implementation details, I wonder if we will now be changing enough things such that not much is left from the upstream PrettyPrinter? In that case, we can just copy pprint.py wholesale to pytest. My copy has 671 lines so it's not huge.

Advantages:

  • No risk of breakage due to using private stuff
  • Consistency between all Python versions
  • Can backport pprint functionality from new Python versions
  • Maybe can simplify/customize the code more directly?

Disadvantages:

  • Maintenance overhead
  • Missing on upstream improvements (unless we continuously backport)

@BenjaminSchubert
Copy link
Contributor Author

@bluetech Good questions!

The pprint code is a bit hard to get into, before I/we do it, I think it would be helpful to show before/after examples on some common cases that are changed, both short and long.

I have now updated the description with a (collapsed) section to show some small examples for better representation. It is quite extensive to the point of becoming hard to read, but let me know if there is a need for more.

Regarding the private implementation details, I wonder if we will now be changing enough things such that not much is left from the upstream PrettyPrinter? In that case, we can just copy pprint.py wholesale to pytest. My copy has 671 lines so it's not huge.

That works for me, do you prefer me to copy the code and continue extending it as I did now, or do the modifiications directly there to simplify? And is another PR/branch easier for comparison, or are you happy with me just updating this one?

@bluetech
Copy link
Member

bluetech commented Oct 30, 2023

Thank you @BenjaminSchubert, you examples are exactly what I hoped for.

I think it would be very good to get wider feedback on this, maybe even asking some of our users on the internet :) @nicoddemus @RonnyPfannschmidt @The-Compiler WDYT? See @BenjaminSchubert many before/after examples in the PR description (make sure to expand).

That works for me, do you prefer me to copy the code and continue extending it as I did now, or do the modifiications directly there to simplify? And is another PR/branch easier for comparison, or are you happy with me just updating this one?

A separate PR against current main would be great; it will be useful regardless and hopefully (maybe?) make the changes here easier to review.

@The-Compiler
Copy link
Member

I've only looked at the output changes, but I like those a lot! Seems to make the output much more readable IMHO, and thanks to the ^ it's still easy to see where the differences are.

@nicoddemus
Copy link
Member

nicoddemus commented Oct 30, 2023

I agree the outcome is great!

Regarding the private implementation details, I wonder if we will now be changing enough things such that not much is left from the upstream PrettyPrinter? In that case, we can just copy pprint.py wholesale to pytest.

I agree copying it is probably the way to go. 👍

My suggestion is to add it to the repository without any changes in its own commit, and later on add the necessary changes in separate commits: having a single commit with the clean copy helps us to later see exactly what has changed.

@BenjaminSchubert
Copy link
Contributor Author

@bluetech #11571 would be the implementation with vendoring

@BenjaminSchubert
Copy link
Contributor Author

Hey, just want to make sure if I was meant to do something more still before reviews or not (I don't need a review right now, just want to make sure you are not waiting on me :) )

@nicoddemus
Copy link
Member

@BenjaminSchubert don't worry, you do not need to do anything else, it is on our plate now, thanks!

@BenjaminSchubert
Copy link
Contributor Author

Closing in favor of #11626 and #11571

@BenjaminSchubert BenjaminSchubert deleted the bschubert/nicer-comparisons branch November 27, 2023 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better diff for asserts on dicts
4 participants