Color the full diff that pytest shows as a diff #11530

Merged: 8 commits into pytest-dev:main from bschubert/colored-diffs, Oct 24, 2023

Conversation

@BenjaminSchubert (Contributor)

Overview

Previously, the diff would get printed entirely in red, which made it hard to read and actually understand.

However, the diffs shown are standard and have a supported lexer in Pygments. As such, use it to color the output when Pygments is available.

This is a step towards #11520 but does not address all the discussion points yet.
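
(For illustration, a minimal, self-contained sketch of the technique; this is not the actual pytest implementation, which routes the highlighting through its TerminalWriter:)

```python
from pygments import highlight
from pygments.formatters.terminal import TerminalFormatter
from pygments.lexers import DiffLexer

diff_text = "- {'a': 0}\n+ {'a': 1}\n"

# Render the diff with ANSI colors: removals in red, additions in green.
print(highlight(diff_text, DiffLexer(), TerminalFormatter()))
```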

Visual changes

Previously

[screenshot: assertion diff printed entirely in red]

With this change

[screenshot: assertion diff colored via the Pygments diff lexer]

Questions

I am unsure about two things here:

  • How we expose the lexers on the terminal writer. Having to pass a string and hoping it does the right thing is not ideal, but I believe it is slightly better than providing a lexer at each call site, since that way only the TerminalWriter has to handle whether Pygments is available (see the sketch after this list).
  • The testing might be a bit clunky and prone to breaking. If needed, we could reduce the strictness of the checks; let me know what you prefer.
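
(A rough sketch of the string-based lexer selection from the first point above; simplified and written as a free function, not the exact TerminalWriter code in this PR. The point is that only the terminal writer needs to know whether Pygments is importable:)

```python
def _highlight(source: str, lexer: str = "python") -> str:
    """Return ``source`` highlighted with ANSI colors, or unchanged if
    Pygments is not installed (sketch of the TerminalWriter method)."""
    try:
        from pygments import highlight
        from pygments.formatters.terminal import TerminalFormatter
        from pygments.lexers import DiffLexer, PythonLexer
    except ImportError:
        return source  # no Pygments: plain, uncolored output

    pygments_lexer = DiffLexer() if lexer == "diff" else PythonLexer()
    return highlight(source, pygments_lexer, TerminalFormatter())
```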

@nicoddemus (Member)

Thanks a lot @BenjaminSchubert, this looks great!

> How we expose the lexers on the terminal writer. Having to pass a string and hoping it does the right thing is not ideal, but I believe it is slightly better than providing a lexer at each call site, since that way only the TerminalWriter has to handle whether Pygments is available.

Yeah this seems fine to me. 👍

> The testing might be a bit clunky and prone to breaking. If needed, we could reduce the strictness of the checks; let me know what you prefer.

Yeah, I agree. Perhaps we can do some really shallow testing there: we don't need to test the full diff coloring, we just need to ensure that diff coloring has been applied. So we can check that some coloring has been applied to the diff output in a normal run, and then that there is no diff coloring in a second run of the same code with --color=no.
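
(A rough sketch of what such a shallow check could look like; the test name and sample code are illustrative, not the test added in this PR, and it assumes the pytester plugin is enabled, e.g. via ``pytest_plugins = ["pytester"]``:)

```python
def test_diff_coloring_shallow(pytester):
    pytester.makepyfile(
        """
        def test_fail():
            assert {"a": 0} == {"a": 1}
        """
    )
    # With color forced on, some ANSI escape sequence should appear
    # somewhere in the failure output.
    result = pytester.runpytest("--color=yes")
    assert "\x1b[" in result.stdout.str()

    # With color disabled, no escape sequences at all.
    result = pytester.runpytest("--color=no")
    assert "\x1b[" not in result.stdout.str()
```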

@BenjaminSchubert (Contributor, Author)

@nicoddemus thanks for the quick feedback :)

> Yeah, I agree. Perhaps we can do some really shallow testing there: we don't need to test the full diff coloring, we just need to ensure that diff coloring has been applied. So we can check that some coloring has been applied to the diff output in a normal run, and then that there is no diff coloring in a second run of the same code with --color=no.

I have pushed a fixup to reduce the number of checks, which should make it easier to maintain in the long run. I could still trim it down further to just matching that we have a green +, but I think it looks potentially better as it is now?

Now I just need to figure out why the CI is failing in some cases only, which I am currently unable to reproduce locally.

@nicoddemus (Member)

> I have pushed a fixup to reduce the number of checks, which should make it easier to maintain in the long run. I could still trim it down further to just matching that we have a green +, but I think it looks potentially better as it is now?

I was thinking of testing a single code sample, but I think this is better and good to go now. 👍

> Now I just need to figure out why the CI is failing in some cases only, which I am currently unable to reproduce locally.

Yeah, me neither. I noticed, however, that the failures happen in runs using xdist, but I was also unable to reproduce locally (using xdist too).

@BenjaminSchubert (Contributor, Author)

> Yeah, me neither. I noticed, however, that the failures happen in runs using xdist, but I was also unable to reproduce locally (using xdist too).

No worries, I'll investigate more in depth. It does seem to be triggered by my change, as a run of the main branch does indeed pass.

@nicoddemus (Member) commented Oct 21, 2023

It is really a head-scratcher.

Because this seemingly cannot be reproduced locally, to debug/investigate it I would push a series of commits:

  1. Reduce the build matrix to just run one of the breaking environments (for example py310-ubuntu) -- mostly to avoid running a ton of environments while we debug.
  2. Comment out the new test, to see if the test itself is somehow affecting the others.
  3. Start to slowly revert and commit each change, one by one (for example, start by reverting the changes in conftest.py).

Hopefully at some point this approach will point to a small change that causes the other tests to break, and from there it might be easier to discover what's going on.

@BenjaminSchubert (Contributor, Author)

@nicoddemus That's a weird one. 7cbe6ef is enough to fix it. I am unsure why it breaks otherwise, but I don't think it's worth investigating more?

Let me know if you want me to squash the commits or if anything else needs updating :)

@BenjaminSchubert (Contributor, Author)

Apologies for the noise, this was not it.

@nicoddemus (Member)

> Apologies for the noise, this was not it.

Oh OK, thanks for the investigation!

> Let me know if you want me to squash the commits or if anything else needs updating :)

Don't worry about squashing, we can do so before merging. 👍

@BenjaminSchubert (Contributor, Author) commented Oct 21, 2023

I have a reproduction for the bug... I have no idea what's going on there.

main...BenjaminSchubert:pytest:test-me-bug-repro has most cases of python<3.11 failing (except some??) and barely makes any changes.

The changes to test.yml are just to get the tests to run from a fork, so the only real change is main...BenjaminSchubert:pytest:test-me-bug-repro#diff-e37001cd4b585aab5d9ddae3a0b6c2658adb2f096545acd8b50b778d12beb5ca

Which is basically:

```python
import pytest


# To fail you need at least:
#   - two parametrize decorators, with at least two values each; the content
#     of the values does not seem to matter
#   - the pytester fixture
@pytest.mark.parametrize("foo", [True, False])
@pytest.mark.parametrize("bar", [True, False])
def test_bug(pytester, foo, bar) -> None:
    pass
```

And this is the check run: https://github.com/BenjaminSchubert/pytest/actions/runs/6597999568

I'll see if I can write the tests differently without them becoming nonsensical.

@BenjaminSchubert (Contributor, Author)

The only thing that all the failing tests have in common is xdist. It looks like running this test under xdist fails.

@BenjaminSchubert (Contributor, Author)

I can reproduce locally. pytest -n2 on the whole codebase does trigger the bug. pytest -n{3,4,5,6,7,8,9,10,16} does not.

Running pytest -n2 testing/test_assertion.py testing/test_junitxml.py does not trigger it

@nicoddemus (Member) commented Oct 21, 2023

Found the problematic test, this fails locally:

pytest testing/logging/test_fixture.py::test_change_level_logging_disabled testing/test_junitxml.py::TestPython::test_failure_function[xunit1-log]

This also fails on main, so it is unrelated to this PR.

I will work on another PR fixing this, and then we can rebase yours to fix this.

Thanks for the patience.

Note: I used https://github.com/esss/pytest-replay to get the list of tests running on a specific node (because of xdist), and then manually bisected the failure (I tried https://github.com/asottile/detect-test-pollution but got an error).

nicoddemus added a commit to nicoddemus/pytest that referenced this pull request Oct 21, 2023
Logging has many global states, and we did foresee this by creating a ``cleanup_disabled_logging`` fixture; however, one might still forget to use it, and failures leak later -- sometimes not even in the same PR, because the order of the tests might change in the future, especially when running under xdist.

This problem surfaced during pytest-dev#11530, where tests unrelated to the change started to fail.
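
(For context, the idea behind such a fixture is roughly the following; a sketch, not pytest's exact ``cleanup_disabled_logging`` implementation:)

```python
import logging

import pytest


@pytest.fixture
def cleanup_disabled_logging():
    """Reset the global logging disable level after the test, so a test
    that calls logging.disable() cannot leak state into later tests."""
    yield
    logging.disable(logging.NOTSET)
```
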
@BenjaminSchubert (Contributor, Author)

@nicoddemus good catch, I was still very far from finding it :D Thank you very much

@nicoddemus (Member)

> @nicoddemus good catch, I was still very far from finding it :D Thank you very much

Sure thing!

I opened #11531 also, we can rebase this branch once that gets merged.

@bluetech (Member) left a comment

I like this change a lot, thanks @BenjaminSchubert. The code looks good to me. As you said, there are a few more places that can benefit from this, but we can do those in follow-up PRs.

```diff
@@ -189,7 +190,8 @@ def assertrepr_compare(
     explanation = None
     try:
         if op == "==":
-            explanation = _compare_eq_any(left, right, verbose)
+            writer = config.get_terminal_writer()
+            explanation = _compare_eq_any(left, right, writer, verbose)
```
@bluetech (Member)
I don't like that we pass the full TerminalWriter to these functions, because it makes it seem like the functions actually write to the terminal, while they only need it for the pure _highlight function. But it's OK for now, not asking you to refactor :)

@BenjaminSchubert (Contributor, Author)

If you prefer, we could instead pass a highlight function (Callable[[str], str]), which would be TerminalWriter._highlight.
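
(A minimal sketch of that suggestion; the alias name and helper function here are hypothetical, not the final code in this PR:)

```python
import difflib
from typing import Callable, List

# Hypothetical alias for the injected highlighter, e.g. the bound method
# ``TerminalWriter._highlight``: the helper only depends on str -> str.
HighlightFunc = Callable[[str], str]


def explain_diff(left: str, right: str, highlighter: HighlightFunc) -> List[str]:
    """Build a unified diff of two strings and color it via the injected
    highlighter, without touching the terminal directly."""
    diff = "\n".join(
        difflib.unified_diff(left.splitlines(), right.splitlines(), lineterm="")
    )
    return highlighter(diff).splitlines()
```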

@bluetech (Member)

I like that idea.

@BenjaminSchubert (Contributor, Author)

Ok, I have pushed a fixup for this. I wasn't sure where to add the type definition; feel free to update it, or let me know if there's a better place.

@bluetech (Member)

I think that looks great, thanks!

nicoddemus added a commit to nicoddemus/pytest that referenced this pull request Oct 23, 2023
@nicoddemus merged commit fbe3e29 into pytest-dev:main on Oct 24, 2023 (21 of 22 checks passed)
@BenjaminSchubert deleted the bschubert/colored-diffs branch October 24, 2023