
DEPS: Test NEP 50 #55739

Merged
merged 36 commits into pandas-dev:main from compat/nep50 on Dec 22, 2023
Conversation

@mroeschke (Member) commented Oct 27, 2023

closes #56563

@mroeschke added the Compat (pandas objects compatibility with Numpy or Python functions) label on Oct 27, 2023
@mroeschke (Member Author) commented:

@seberg could use your insight into this behavior:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: np._set_promotion_state("weak")

In [4]: from pandas._libs import lib

In [11]: lib.maybe_convert_objects(np.array([np.int64(0)], dtype=object))
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
Cell In[11], line 1
----> 1 lib.maybe_convert_objects(np.array([np.int64(0)], dtype=object))

File ~/pandas/_libs/lib.pyx:2601, in pandas._libs.lib.maybe_convert_objects()
   2599 
   2600                 if ((seen.uint_ and seen.sint_) or
-> 2601                         val > oUINT64_MAX or val < oINT64_MIN):
   2602                     seen.object_ = True
   2603                     break

OverflowError: Python int too large to convert to C long

IIRC oUINT64_MAX and oINT64_MIN are originally C ints

INT64_MIN,
UINT8_MAX,
UINT16_MAX,
UINT32_MAX,
UINT64_MAX,

being cast and boxed as Python int objects.

pandas/pandas/_libs/lib.pyx

Lines 143 to 145 in 984d755

object oINT64_MAX = <int64_t>INT64_MAX
object oINT64_MIN = <int64_t>INT64_MIN
object oUINT64_MAX = <uint64_t>UINT64_MAX

I am not exactly sure, but is val a NumPy int64 being cast here?

@seberg (Contributor) commented Oct 28, 2023:

@mroeschke the problem is promotion: normally, promotion will convert the Python int to the other operand's dtype in binary ops. That said, the NumPy dev branch has additional logic in place to allow comparisons to always succeed.

@mroeschke (Member Author) commented:

> That said, the NumPy dev branch has additional logic in place to allow comparisons to always succeed.

OK great! This is definitely more convenient behavior.
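
For illustration, a sketch of the behavior being described, assuming NumPy 2.x dev semantics (under the 1.x weak-promotion preview the same comparison raises OverflowError, as in the traceback above):

>>> import numpy as np
>>> print(np.int64(0) > 2**64 - 1)  # out-of-range Python int; the comparison succeeds
False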

@mroeschke (Member Author) commented:

Also, if we have a dependency (pytables) that relies on non-NEP 50 compatible behavior and we make pandas NEP 50 compatible, would you recommend using _set/get_promotion_state around the calls that rely on the old promotion rules?
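
One possible shape for that wrapping, a minimal sketch using the private NumPy APIs _get/_set_promotion_state (these exist only on the NumPy versions that support both rule sets, and the wrapped call below is hypothetical):

import contextlib
import numpy as np

@contextlib.contextmanager
def legacy_promotion():
    # Temporarily restore legacy promotion rules around a call into a
    # dependency that is not yet NEP 50 compatible, restoring the
    # previous state even if the call raises.
    old = np._get_promotion_state()
    np._set_promotion_state("legacy")
    try:
        yield
    finally:
        np._set_promotion_state(old)

# hypothetical usage:
# with legacy_promotion():
#     read_with_pytables()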

@lithomas1 (Member) commented:

OK, I went and dug into a couple of tests, and it looks like we might be failing some of them because of the new rules, such as uint64 * int64 = float64.

I guess we could fix it by trying to cast to the Series/Index dtype inside arithmetic operations.

@seberg (Contributor) commented Nov 6, 2023:

> the new rules such as uint64 * int64 = float64

Just to note, that rule is not new (everyone hates it). What is new, though, is the handling of a 0-D uint64 or int64 object: previously it was cast by value. You still get that behavior, but only if it is a Python integer.
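
A quick sketch of that distinction, assuming NumPy 2.x semantics:

>>> import numpy as np
>>> a = np.array([1, 2, 3], dtype="uint64")
>>> (a * 3).dtype            # Python int: still converted, stays uint64
dtype('uint64')
>>> (a * np.int64(3)).dtype  # 0-D NumPy scalar: previously cast by value, now keeps int64
dtype('float64')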

@lithomas1 (Member) commented Nov 8, 2023:

> Just to note, that rule is not new (everyone hates it). What is new, though, is the handling of a 0-D uint64 or int64 object: previously it was cast by value. You still get that behavior, but only if it is a Python integer.

That makes sense; the failing test I was looking at was doing something like

pd.Index([1,2,3], dtype="uint64") * np.int64(3)

This is kind of a funky case. Normally, I think it might be OK to just follow the NEP 50 changes (and change the tests), since I think users will also expect Index/Series arithmetic to follow NumPy semantics. As Sebastian mentions above, you already get float64 if you multiply by something like np.array([3], dtype="int64").

The only issue here, though, is that the Arrow-backed Indexes will still follow the old behavior of trying to safe-cast, so we'd be diverging in behavior there, and I'd really like to keep everything consistent.

Here's a little demonstration of the differences:

>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([1,2,3], dtype="uint64") * np.int64(3)
Index([3.0, 6.0, 9.0], dtype='float64') # This makes the tests fail
>>> pd.Index([1,2,3], dtype="uint64").astype("uint64[pyarrow]") * np.int64(3)
Index([3, 6, 9], dtype='int64[pyarrow]') # Note how pyarrow preserves the int dtype
>>> pd.Index([1,2,3], dtype="uint64") * np.array([3], dtype="int64")
Index([3.0, 6.0, 9.0], dtype='float64') # This happens even in older numpys
>>> pd.Index([1,2,3], dtype="uint64").astype("uint64[pyarrow]") * np.array([3], dtype="int64")
# This doesn't work in pyarrow

@pandas-dev/pandas-core thoughts?

@WillAyd (Member) commented Nov 8, 2023:

There are a lot of options here all with different merits and I'm not sure there will be a clear solution.

Are those pyarrow results documented behavior? Even uint64 * int64 resulting in int64 is surprising to me; I would have expected it to follow C type promotion rules and end up with uint64.

@lithomas1 mentioned this pull request on Dec 18, 2023
@lithomas1 (Member) commented:

OK, added more tests. It looks like the one case (that I've caught) where my hack doesn't work is 0, so I special-cased that out separately.

@lithomas1 requested a review from WillAyd on December 20, 2023
-with pytest.raises(ValueError, match=msg):
+if np_version_gt2:
+    msg = "The elements provided in the data cannot all be casted to the dtype"
+    err = OverflowError
Member: should we be catching and re-raising?

Member: Not sure if it's worth it. I like the new error from NumPy better anyway.

Do we need to preserve backwards compatibility here?

Member: I am OK with this - I am not sure I see the value in re-raising as a ValueError when the OverflowError is more descriptive of what went wrong to begin with.

Member: Agreed, OverflowError seems More Correct; my main opinion is that we should be consistent (no idea what the status quo is consistency-wise).

Member: Yeah, that's a good point. I wouldn't mind generally not replacing OverflowError with ValueError; they are both part of the stdlib, but the former is more descriptive.

I am unaware of the history behind why that pattern might have been implemented.

names=["CL0", "CL1"],
),
"left",
"multiindex_1",
),
(
MultiIndex.from_tuples(list(zip(range(4), np.mod(range(4), 2)))),
MultiIndex.from_arrays([np.arange(4), np.mod(range(4), 2)]),
Member: might need to pass dtype to arange, otherwise this will be different on 32-bit builds

Member: I guess it won't make a difference in this case.

# Special case 0, our trick will not work for it since
# np.min_scalar_type(-0) will still give something unsigned
right = left_dtype
elif right > 0 and not np.issubdtype(left_dtype, np.unsignedinteger):
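
For context, a small illustration of the min_scalar_type trick the comment above refers to, and why 0 needs special casing:

>>> import numpy as np
>>> np.min_scalar_type(3)   # positive scalars come back unsigned
dtype('uint8')
>>> np.min_scalar_type(-3)  # negative scalars come back signed
dtype('int8')
>>> np.min_scalar_type(-0)  # -0 is just 0, so it still comes back unsigned
dtype('uint8')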
Member: I think this would be simpler if you just did:

elif right > 0 and right <= 2 ** (8 * left_dtype.itemsize - 1) - 1:
    # use signed dtype

It would cut down on some of the branching.

Member: Hm, do you know a way to programmatically convert from the unsigned dtype to the signed one?

I think that was why I went with my hack in the first place.

Member: I think you could create the new type by doing something like f"i{left_dtype.itemsize}".
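
A minimal sketch of that suggestion (left_dtype is a stand-in name here):

>>> import numpy as np
>>> left_dtype = np.dtype("uint32")
>>> np.dtype(f"i{left_dtype.itemsize}")  # signed counterpart of the same width
dtype('int32')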

Member: Nice! Updated the code.

@lithomas1 requested a review from phofl on December 21, 2023
@lithomas1 linked an issue on Dec 21, 2023 that may be closed by this pull request
@phofl (Member) left a comment:

lgtm

@WillAyd (Member) left a comment:

lgtm - nice work!

@lithomas1 (Member) commented:

Will put this in for the RC. Happy to address review comments in a follow-up.

Thanks for the reviews everyone!

@lithomas1 merged commit e0e47e8 into pandas-dev:main on Dec 22, 2023
45 checks passed
@@ -92,7 +92,7 @@ jobs:
- name: "Numpy Dev"
env_file: actions-311-numpydev.yaml
pattern: "not slow and not network and not single_cpu"
test_args: "-W error::DeprecationWarning -W error::FutureWarning"
Member Author: I think we still need to check for a DeprecationWarning here.

@@ -30,9 +30,9 @@ authors = [
license = {file = 'LICENSE'}
requires-python = '>=3.9'
dependencies = [
"numpy>=1.22.4,<2; python_version<'3.11'",
Member Author: Shouldn't we also unpin numpy in our ci/deps/ files?

right = left_dtype
elif (
not np.issubdtype(left_dtype, np.unsignedinteger)
and 0 < right <= 2 ** (8 * right_dtype.itemsize - 1) - 1
Member Author: 2 ** (8 * right_dtype.itemsize - 1) - 1 can be replaced with np.iinfo(right_dtype).max, I think.
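
A quick check of that equivalence (right_dtype is a stand-in name here):

>>> import numpy as np
>>> right_dtype = np.dtype("int64")
>>> np.iinfo(right_dtype).max == 2 ** (8 * right_dtype.itemsize - 1) - 1
True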

@lithomas1 (Member) commented:

Follow-up PR created @ #56682

@mroeschke deleted the compat/nep50 branch on December 29, 2023
cbpygit pushed a commit to cbpygit/pandas that referenced this pull request Jan 2, 2024
* DEPS: Test NEP 50

* Use Python floats in test_maybe_promote_float_with_float

* Refactor test_to_html_multiindex to allow tests to collect

* Suppress DeprecationWarning for now

* Use old invocation

* Use Python ints in _range.py functions

* Address test_constructor

* Fix test_constructor_coercion_signed_to_unsigned

* Fix test_constructor_coercion_signed_to_unsigned

* Cast numpy scalars as python scalars before arith ops

* add xfail reason to TestCoercionFloat32

* only set promotion state for numpy > 2.0

* order was backwards

* Version promotion state call

* fix timedelta tests

* go for green

* fix non npdev too?

* fixes

* adjust xfail condition

* go for green

* add tests

* add negative numbers test

* updates

* fix accidental changes

* more

* simplify

* linter

---------

Co-authored-by: Thomas Li <[email protected]>
Comment on lines -34 to +35

-    "numpy>=1.23.2,<2; python_version=='3.11'",
-    "numpy>=1.26.0,<2; python_version>='3.12'",
+    "numpy>=1.22.4; python_version<'3.11'",
+    "numpy>=1.23.2; python_version=='3.11'",
+    "numpy>=1.26.0; python_version>='3.12'",
Member: This PR removed the numpy<2 pin again, which is good for the development branch (and nightly wheels), but shouldn't we still keep the pin for the release branch?

Unless we are planning to build the pandas 2.2.0 wheels with nightly numpy (which isn't the case right now), the upcoming pandas 2.2.0 release is still not ABI compatible with numpy 2.0.

Contributor: I think this is right by my understanding of packaging around the NumPy 2 release. We should keep the pin until wheels can be built against NumPy 2.0.0rc1; then we can build against NumPy>=2.0.0rc1 and have a runtime dependency of NumPy 1.2x+.

Member: Slightly off topic, but do we know what prevents us from using build isolation with meson? That would seemingly make this issue easier to manage as well.

Member:

> This PR removed the numpy<2 pin again, which is good for the development branch (and nightly wheels), but shouldn't we still keep the pin for the release branch?
>
> Unless we are planning to build the pandas 2.2.0 wheels with nightly numpy (which isn't the case right now), the upcoming pandas 2.2.0 release is still not ABI compatible with numpy 2.0.

Yeah, I just misjudged the timing for the numpy 2.0 release. (Originally, I think their tracker issue said that the RC would happen in January, which we could then build against to have compat with both numpy 1 and 2.)

I'll put up a PR to put back the pin.

Member:

> Slightly off topic, but do we know what prevents us from using build isolation with meson? That would seemingly make this issue easier to manage as well.

Sorry, not quite following here. IIUC, build isolation means building the package in a separate environment from the one where it's installed.

We currently just disable it for editable builds (otherwise the auto-rebuilding wouldn't work), and also on regular CI (since there's no point in installing the build tools again when they're already installed in the conda environment).

I'm pretty sure cibuildwheel uses it, though.

Member:

> since there's no point in installing the build tools again when they're already installed in the conda environment

The benefit depends on whether you consider the conda environment a build environment or a runtime environment. The advantage of an isolated build environment is that you can build your project against a particular version, like oldest-supported-numpy for binary compatibility, but choose a different version for the runtime.

I haven't closely tracked the particular case of NumPy 2.0, but it sounds like that could be useful based on @bashtage's comment #55739 (comment)

Labels: Blocker for rc (Blocking issue or pull request for release candidate), Compat (pandas objects compatibility with Numpy or Python functions)

Successfully merging this pull request may close these issues.

BUG: pandas nightly wheel requires numpy<2
8 participants