Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable many tests for complex numbers #54441

Closed

Conversation

MichaelTiemannOSC
Copy link
Contributor

Add complex to test_numpy.py and add complex128 to test_numeric.py. Then fix the many small problems revealed by the test suite.

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@MichaelTiemannOSC
Copy link
Contributor Author

One of the reasons I made these changes was to get a better understanding of how, in tests/base/ops.py, the function _get_expected_exception should be written for handling the case of complex numbers. In this first draft of changes, I didn't really use it, warded off by the comment that "the self.obj_bar pattern isn't great in part because it can depend on op_name or dtypes, but we use it here for backward-compatibility". I could rewrite the changes so that BaseArithmenticOpsTests uses _get_expected_exception instead of skipping the tests, but that would also entail lots more work in terms of creating test cases that notice an exception being raised. I didn't want to do that work speculatively.

pandas/core/nanops.py Outdated Show resolved Hide resolved
@mroeschke mroeschke added Testing pandas testing functions or related to the test suite Complex Complex Numbers labels Aug 7, 2023
@@ -213,6 +213,12 @@ def test_reductions_2d_axis0(self, data, method, min_count):

kwargs = {}
if method in ["std", "var"]:
if data.dtype.kind == "c":
pytest.skip(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to work? If so could you mark this with request.node.add_marker(pytest.mark.xfail(reason=...))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better yet, I've fixed the test for the complex case (check_dtype=False in call to assert_extension_array_equal). Will commit when power/Internet restored.

@@ -11,6 +11,8 @@
class BaseParsingTests(BaseExtensionTests):
@pytest.mark.parametrize("engine", ["c", "python"])
def test_EA_types(self, engine, data):
if data.dtype.kind == "c":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. This led to lots of interface changes to get request everwhere it needed to be.

@@ -88,6 +88,11 @@ def test_fillna_limit_backfill(self, data_missing):
tm.assert_series_equal(result, expected)

def test_fillna_no_op_returns_copy(self, data):
if data.dtype.kind == "c":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

]:
for arg in [obj, other]:
if isinstance(arg, complex):
pytest.skip(f"{type(arg).__name__} does not support {op_name}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -54,7 +56,7 @@ def _assert_attr_equal(attr: str, left, right, obj: str = "Attributes"):
orig_assert_attr_equal(attr, left, right, obj)


@pytest.fixture(params=["float", "object"])
@pytest.fixture(params=["complex", "float", "object"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea fleshing these out!

super().test_arith_series_with_scalar(data, all_arithmetic_operators)
opname = all_arithmetic_operators
df = pd.DataFrame({"A": data})
self.check_opname(df, opname, data[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this different from the status quo?

@@ -338,7 +338,8 @@ def test_setitem_slice_array(self, data):

def test_setitem_scalar_key_sequence_raise(self, data):
arr = data[:5].copy()
with pytest.raises(ValueError):
# complex128 data raises TypeError; other numeric types raise ValueError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any idea why? would it make sense to catch and re-raise as a ValueError for consistency?

if data.dtype.kind == "c":
pytest.skip(
f"no cython implementation of backfill(ndarray[{data.dtype.name}_t],"
f"ndarray[{data.dtype.name}_t], int64_t) in libs/algos.pxd"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you must skip/xfail, the place to do it is in test_numpy, not in the base tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@MichaelTiemannOSC
Copy link
Contributor Author

I don't understand the mypy error messages (which I've never seen before).

@@ -517,7 +517,9 @@ def _cast_quantile_result(self, res_values: np.ndarray) -> np.ndarray:
# numpy-like methods

@classmethod
def _empty(cls, shape: Shape, dtype: ExtensionDtype) -> Self:
def _empty(
cls, shape: Shape, dtype: ExtensionDtype, fill_value: object = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think adding a fill_value argument everywhere is a desirable solution

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. What's tricky is that we have no good way to query SCALARS (defined elsewhere) as to what sort of Array they want. In the case of complex128, it's obviously an ndarray of complex numbers. In the case of Pint quantities, it needs to be an array of whatever types can hold the magnitude value. But there's no way to ask that question from within Pandas right now. And reading about other EA types, which could have multiple independent arrays actually backing whatever object type is facially presented to _empty is even more difficult.

That said, if the ExtensionDtype had an interface that could either return an obj_dtype (which would be the dtype of the underlying Array) or a callable (which could wrap whatever complexity it wants to into something that _empty could process), maybe that would be cleaner.

Should I take a crack at making ExtensionDtype a little more introspectable? A groovy side-effect would be the efficient support of int8 and float16 EA datatypes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would #53089 help?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it nibbles around the same edges. I'll see what I can do. If there's an obvious git/github sequence that puts me in the right spot as far as "add this PR to that PR and let me hack" please tell me. Otherwise, I'll construct things manually (which might make merging harder later).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering my own question, I've found this article on stacked PRs: https://zx77.medium.com/stacked-pull-requests-with-github-f407b5d371ea . Is this the right way to think about the git part of making these changes on the PR #53089?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a pull request to @jbrockmendel 's branch would be a good way to propose changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not figure out the easy way to do that (I'm the opposite of a git expert). If you can tell me the git command for that, I can adjust. I didn't need to modify anything they did (other than fix an assertion failure in boolean arrays that didn't like seeing "bool" types coming up from the ObjDtype).

n, dtype=object if immutable_ea else dtype # type: ignore[arg-type]
cls = None

if hasattr(cls, "_from_scalars"):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized that this hasattr check is redundant for EAs if we have _from_scalars in the base class. Will clean up next commit.

@MichaelTiemannOSC
Copy link
Contributor Author

This is not what I meant to do. GitHub fooled me again!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Complex Complex Numbers Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants