Skipped flaky tests #300

Open
asmeurer opened this issue Nov 14, 2024 · 7 comments
Labels
high priority High-priority issue

Comments

@asmeurer
Member

Several tests are completely skipped right now because they are "flaky".

  • test_reshape
  • test_std
  • test_var
  • test_remainder

This is a pretty high priority issue because these functions are effectively completely untested, even though they appear to be tested.

Tests should be written in such a way that they aren't flaky, for instance, by using high numerical tolerances (or, if necessary, avoiding value testing entirely).
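As an illustration of the tolerance approach (this helper is hypothetical, not part of the test suite), a comparison with deliberately loose tolerances might look like:

```python
import math


def assert_close(actual, expected, rel_tol=1e-3, abs_tol=1e-6):
    """Compare two floats with deliberately loose tolerances.

    Loose rel_tol/abs_tol absorb accumulated floating-point error, which can
    differ between array libraries; this is one way to keep value tests from
    being flaky without skipping them entirely.
    """
    assert math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol), (
        f"{actual} != {expected} (rel_tol={rel_tol}, abs_tol={abs_tol})"
    )
```

The tolerances here are placeholders; the right values would depend on the dtype and the operation under test.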

Note that health checks for timeouts should just be skipped, and health checks for filtering too much should be fixed by fixing the strategy.

@asmeurer asmeurer added the high priority High-priority issue label Nov 14, 2024
@asmeurer asmeurer mentioned this issue Nov 14, 2024
@ev-br
Contributor

ev-br commented Nov 17, 2024

Looking at test_std, https://github.com/data-apis/array-api-tests/blob/master/array_api_tests/test_statistical_functions.py#L262, it does not seem to attempt any value testing. Then what is flaky, assert_dtype or assert_keepdimable_shape?

@asmeurer
Member Author

I've no idea what's flaky with any of these. The first order of business would be to remove the decorator and figure out why the test was failing. It's also possible that some of these were only flaky with certain libraries.

@asmeurer
Member Author

asmeurer commented Nov 18, 2024

Also, it's possible the flakiness was fixed and the skip was never removed. It looks like the skip for std was added in #233 (with no explanation), if you want to check previous versions.

At the very least, if the test seems to be passing, we can just remove the skip and see whether any upstream failures turn up. Like I mentioned in another issue, it's really easy to revert changes here if they break something, since we don't even have releases, so I wouldn't be too worried about that.

@ev-br
Contributor

ev-br commented Nov 23, 2024

test_reshape is fixed in gh-319

@ev-br
Contributor

ev-br commented Nov 25, 2024

Caught a test_std failure with array_api_compat.numpy:

array_api_tests/test_statistical_functions.py::test_std FAILED                   [100%]

    @given(
>       x=hh.arrays(
            dtype=hh.real_floating_dtypes,
            shape=hh.shapes(min_side=1),
            elements={"allow_nan": False},
        ).filter(lambda x: math.prod(x.shape) >= 2),
        data=st.data(),
    )

array_api_tests/test_statistical_functions.py:254: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def inconsistent_generation():
>       raise FlakyStrategyDefinition(
            "Inconsistent data generation! Data generation behaved differently "
            "between different runs. Is your data generation depending on external "
            "state?"
        )
E       hypothesis.errors.FlakyStrategyDefinition: Inconsistent data generation! Data generation behaved differently between different runs. Is your data generation depending on external state?

../../miniforge3/envs/array-api-tests/lib/python3.11/site-packages/hypothesis/internal/conjecture/datatree.py:52: FlakyStrategyDefinition
-------------------------------------- Hypothesis --------------------------------------
You can add @seed(146745493194750825545715057348996307346) to this test or run pytest with --hypothesis-seed=146745493194750825545715057348996307346 to reproduce this failure.
=================================== warnings summary ===================================
array_api_tests/test_statistical_functions.py: 59 warnings
  /home/br/miniforge3/envs/array-api-tests/lib/python3.11/site-packages/numpy/_core/_methods.py:227: RuntimeWarning: Degrees of freedom <= 0 for slice
    ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================== short test summary info ================================
FAILED array_api_tests/test_statistical_functions.py::test_std - hypothesis.errors.FlakyStrategyDefinition: Inconsistent data generation! Data gener...
===================== 1 failed, 7 deselected, 59 warnings in 3.14s =====================

@asmeurer
Member Author

I can reproduce that with

ARRAY_API_TESTS_MODULE=array_api_compat.numpy pytest --disable-warnings array_api_tests/test_statistical_functions.py -k std -v --hypothesis-seed=146745493194750825545715057348996307346 --max-examples=10000

@asmeurer
Member Author

I can't tell what is causing it. None of the strategies seem to be that unusual. The only thing I see that's a little different from the other tests is that the input array is filtered to have at least 2 elements, but that shouldn't be causing this error.

Unfortunately, hypothesis makes it quite hard to tell what's going on with this error. The only thing I can suggest would be to refactor the input strategies, e.g., to use shared instead of data.draw. Otherwise, we may want to report this upstream on the hypothesis repo, and see if the hypothesis devs can offer any advice. It may also just be a bug in hypothesis.
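As a rough sketch of what moving to `st.shared` could look like (using a simplified stand-in strategy rather than the actual `hh.arrays`, and hypothetical test/strategy names), every derived draw would come from the same shared value within an example, which removes one source of inconsistent data generation:

```python
from hypothesis import given, strategies as st

# Stand-in for hh.arrays(...).filter(...): lists of finite floats with at
# least 2 elements. The real strategy builds array-API arrays.
base_arrays = st.lists(
    st.floats(allow_nan=False, allow_infinity=False),
    min_size=2,
)

# st.shared with a fixed key yields the *same* drawn value everywhere it is
# used within a single example, unlike interactive data.draw() calls.
shared_x = st.shared(base_arrays, key="x")


@given(x=shared_x, n=shared_x.map(len))
def test_shared_draws_are_consistent(x, n):
    # Both arguments derive from one shared draw, so they always agree.
    assert n == len(x)
```

Whether this actually avoids the `FlakyStrategyDefinition` error would need to be verified against the real test; this only demonstrates the shared-draw pattern.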
