
BUG: DataFrame.to_json OverflowError with np.long* dtypes #55495

Merged · 3 commits merged into pandas-dev:main on Oct 16, 2023

Conversation

gupta-paras (Contributor)

gupta-paras requested a review from WillAyd as a code owner on October 12, 2023 13:02
WillAyd (Member) commented Oct 12, 2023

We don't support long double as a data type and I don't think upcasting double to long double is appropriate. We should probably just raise here
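
For context, here is a minimal sketch of the failure mode described in the title; it is not taken from the PR itself, np.longdouble is platform dependent, and the exact exception raised after this change is not shown in this thread:

    import numpy as np
    import pandas as pd

    # np.longdouble may simply alias float64 on some platforms, in which case
    # nothing interesting happens here.
    df = pd.DataFrame({"a": np.array([1.5, 2.5], dtype=np.longdouble)})

    try:
        print(df.to_json())
    except Exception as exc:  # before this PR: OverflowError; after: an explicit error
        print(type(exc).__name__, exc)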

gupta-paras force-pushed the long_double_json_overflow branch from 0b279ba to b972f99 on October 12, 2023 14:24
gupta-paras (Contributor, Author) commented Oct 12, 2023

@WillAyd, thanks for identifying the issue with the commit. I have updated it with the changes suggested in the comment above.
Thanks

gupta-paras force-pushed the long_double_json_overflow branch from b972f99 to 3e3c713 on October 12, 2023 15:30
@@ -1610,6 +1610,11 @@ void Object_beginTypeContext(JSOBJ _obj, JSONTypeContext *tc) {
PyArray_DescrFromType(NPY_DOUBLE));
tc->type = JT_DOUBLE;
return;
} else if (PyArray_IsScalar(obj, LongDouble)) {
WillAyd (Member):

This works, but I think the lines directly following it are supposed to be a catchall for unsupported types. Do you know why that isn't being hit? I would rather we keep this generic instead of having to specify an error message for every type that we don't serialize

gupta-paras (Contributor, Author):

There is a code section at the end of this function:

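    /* Generic fallback used when no earlier branch matches; the Dir_* handlers
       presumably iterate the object's attributes. As explained below, unsupported
       NumPy scalars that reach this point loop forever. */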
    pc->iterBegin = Dir_iterBegin;
    pc->iterEnd = Dir_iterEnd;
    pc->iterNext = Dir_iterNext;
    pc->iterGetValue = Dir_iterGetValue;
    pc->iterGetName = Dir_iterGetName;
    return;

By default, everything falls through to this section, and that causes an infinite loop.

WillAyd (Member):

Right, but I'm asking about the next branch after what you've added.

} else if (PyArray_Check(obj) && PyArray_CheckScalar(obj)) {

Do you know what hits that currently? From reading the function, I think the intent was to generically catch the issue you've described, but it's possible the invariant is incorrect. Would something pass both PyArray_Check and PyArray_CheckScalar? Maybe the PyArray_Check call is incorrect, and removing that alone would fix your issue?

gupta-paras (Contributor, Author) commented Oct 13, 2023:

Any value like np.array(1) (a 0-d array) evaluates true for both of them, so that check was intended to handle only this case. Also, for any NumPy scalar type, the second check is true, so simply removing PyArray_Check handles both cases. The only thing we need to make sure of is that all NumPy scalar types are handled before this if block. I will add a comment to that effect in the code as well.
Thanks
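
To illustrate the distinction being discussed, here is a small Python sketch using hypothetical analogues of the two C-API checks (PyArray_Check matches ndarray instances; PyArray_CheckScalar matches array scalars and 0-d arrays); the helper names are invented for illustration only:

    import numpy as np

    def pyarray_check(obj):         # rough analogue of PyArray_Check
        return isinstance(obj, np.ndarray)

    def pyarray_check_scalar(obj):  # rough analogue of PyArray_CheckScalar
        return isinstance(obj, np.generic) or (
            isinstance(obj, np.ndarray) and obj.ndim == 0
        )

    zero_d = np.array(1)         # 0-d array: passes both checks
    scalar = np.longdouble(1.5)  # array scalar: passes only the second check

    print(pyarray_check(zero_d), pyarray_check_scalar(zero_d))  # True True
    print(pyarray_check(scalar), pyarray_check_scalar(scalar))  # False True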

mroeschke added the IO JSON (read_json, to_json, json_normalize) label on Oct 13, 2023
gupta-paras force-pushed the long_double_json_overflow branch from 0f7f840 to 7a29022 on October 13, 2023 16:45
nmacholl commented, quoting WillAyd's comment above:

We don't support long double as a data type and I don't think upcasting double to long double is appropriate. We should probably just raise here

I'm curious: if pandas doesn't support these NumPy types in a DataFrame, would it make sense to raise an exception at the time of creation or dtype assignment? There may be other places in the code where these unsupported types would break.

WillAyd (Member) commented Oct 13, 2023

@nmacholl you might want to check existing issues for any discussion, but I don't think it's quite that simple. For starters, assignment/creation isn't the only way those types might make their way into a DataFrame. There is also a subset of users who may not use pandas for I/O or computations (where a lot of those limitations are imposed) and instead use it simply as an easy way to get labeled access to NumPy arrays. As an example, you'll see quite a few use cases in the issue log where users keep NumPy unicode dtypes in DataFrames even though pandas has almost no support for those; in such cases the DataFrame is just a labeled indexer, and I don't think we want to break that type of workflow.

WillAyd (Member) commented Oct 13, 2023

@gupta-paras this looks good to me aside from outstanding comments / code suggestions. @mroeschke any thoughts on your end?

mroeschke added this to the 2.2 milestone on Oct 16, 2023
mroeschke merged commit 32c9c8f into pandas-dev:main on Oct 16, 2023 (33 checks passed)
mroeschke (Member)

Thanks @gupta-paras

Labels: IO JSON (read_json, to_json, json_normalize)
Development: successfully merging this pull request may close issue: BUG: DataFrame.to_json OverflowError with np.long* dtypes
4 participants