BUG: DataFrame.to_json OverflowError with np.long* dtypes #55495
Conversation
We don't support long double as a data type, and I don't think upcasting double to long double is appropriate. We should probably just raise here.
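A minimal repro sketch of the reported behavior, assuming a platform where `np.longdouble` is wider than `np.float64` (e.g. x86-64 Linux); the exact exception type depends on the pandas version, so the sketch catches both the originally reported `OverflowError` and the explicit raise discussed here:

```python
import numpy as np
import pandas as pd

# Build a DataFrame holding a longdouble column; pandas stores the
# NumPy dtype as-is, which is how it reaches the JSON serializer.
df = pd.DataFrame({"a": np.array([1.5], dtype=np.longdouble)})

try:
    df.to_json()
except (OverflowError, TypeError) as exc:
    # Reported as OverflowError before this change; the fix discussed
    # here raises a clearer error for the unsupported dtype instead.
    print(type(exc).__name__)
```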
(force-pushed from 0b279ba to b972f99)
@WillAyd, thanks for identifying the issue with the commit. I have updated the commit with the changes suggested in the comment above.
(force-pushed from b972f99 to 3e3c713)
```diff
@@ -1610,6 +1610,11 @@ void Object_beginTypeContext(JSOBJ _obj, JSONTypeContext *tc) {
                        PyArray_DescrFromType(NPY_DOUBLE));
     tc->type = JT_DOUBLE;
     return;
+  } else if (PyArray_IsScalar(obj, LongDouble)) {
```
This works, but I think the lines directly following it are supposed to be a catchall for unsupported types. Do you know why that isn't being hit? I would rather we keep this generic instead of having to specify an error message for every type that we don't serialize
There is a code section at the end of this function:

```c
pc->iterBegin = Dir_iterBegin;
pc->iterEnd = Dir_iterEnd;
pc->iterNext = Dir_iterNext;
pc->iterGetValue = Dir_iterGetValue;
pc->iterGetName = Dir_iterGetName;
return;
```

By default, every unhandled type falls through to this section, which causes an infinite loop.
Right, but I'm asking about the next branch after the one you've added:

```c
} else if (PyArray_Check(obj) && PyArray_CheckScalar(obj)) {
```

Do you know what hits that currently? From reading the function, I think the intent was to generically catch the issue you've described, but it's possible the invariant is incorrect. Would something pass both `PyArray_Check` and `PyArray_CheckScalar`? Maybe the `PyArray_Check` call is incorrect, and removing that alone would fix your issue?
A value like `np.array(1)` (a 0-d array) evaluates true for both checks, so that branch was intended to handle only that case. Any NumPy scalar type also passes the second check, so simply removing `PyArray_Check` handles both cases. The only thing we need to ensure is that all NumPy scalar types are handled before this if block. I will add a similar comment in the code as well.
Thanks
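The distinction above can be sketched in Python. These helpers are hypothetical rough analogues of the C-API macros being discussed, not their actual definitions: `PyArray_Check` accepts any ndarray (including 0-d), while `PyArray_CheckScalar` accepts NumPy array scalars as well as 0-d arrays, so only a 0-d array passes both:

```python
import numpy as np

def is_array(obj):
    """Rough analogue of PyArray_Check: true for any ndarray, even 0-d."""
    return isinstance(obj, np.ndarray)

def is_scalar_like(obj):
    """Rough analogue of PyArray_CheckScalar: true for NumPy array
    scalars (np.generic subclasses) and for 0-d arrays."""
    return isinstance(obj, np.generic) or (
        isinstance(obj, np.ndarray) and obj.ndim == 0
    )

zero_d = np.array(1)       # 0-d array: passes both checks
scalar = np.longdouble(1)  # array scalar: passes only the scalar check

print(is_array(zero_d), is_scalar_like(zero_d))  # True True
print(is_array(scalar), is_scalar_like(scalar))  # False True
```

This illustrates why dropping the `PyArray_Check` half of the condition widens the branch from "0-d arrays only" to "0-d arrays plus every array scalar", which is the generic catchall behavior discussed above.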
(force-pushed from 0f7f840 to 7a29022)
I'm curious: if pandas doesn't support these NumPy types in a DataFrame, would it make sense to raise an exception at the time of creation or dtype assignment? There may be other places in the code where these unsupported types would break.
@nmacholl, you might want to check existing issues for any discussion, but I don't think it's quite that simple. For starters, assignment/creation isn't the only way those types might make their way into a DataFrame. There is also a subset of users who may not use pandas for I/O or computations (where a lot of those limitations are imposed), instead just using it as an easy way to get labeled access to NumPy arrays. As an example, you'll see quite a few cases in the issue log where users keep NumPy unicode dtypes in DataFrames even though pandas has almost no support for those; in such cases a DataFrame is just a labeled indexer, and I don't think we want to break that type of workflow.
@gupta-paras, this looks good to me aside from the outstanding comments / code suggestions. @mroeschke, any thoughts on your end?
Thanks @gupta-paras |