This might look like a stupid bug report at first glance, but let me explain:
Assume a master service is reading data from MongoDB where the data is written into MongoDB by other services.
By design, one of the cool things about MongoDB is that it can work schemaless (I know you can enforce a schema).
Several services write data to MongoDB: collection.insert_one({'data_to_test': 42})
and some master service reads this data: pymongoarrow.api.aggregate_arrow_all(collection, [], schema=pymongoarrow.api.Schema({'data_to_test': pyarrow.int32()}))
This works absolutely fine. And even if some service writes a string (or ObjectID, or datetime, or ...) to this field: collection.insert_one({'data_to_test': 'a string'})
the master service just receives a 'null' and all is fine.
But then, one day the master service completely breaks because one service wrote an Int64 to this field: temp_collection._collection.insert_one({'data_to_test': 1_000_000_000_000})
Now the master service does not get a null. Instead, pymongoarrow.api.aggregate_arrow_all raises OverflowError: value too large to convert to int32_t.
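For reference, the overflow follows from the range of a signed 32-bit integer, which is -2**31 to 2**31 - 1. A minimal sketch of that range check (the helper name fits_int32 is made up for illustration and is not part of pymongoarrow):

```python
# Bounds of a signed 32-bit integer.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(value: int) -> bool:
    """Return True if value can be stored in an int32 without overflow.

    Hypothetical helper for illustration only; not a pymongoarrow API.
    """
    return INT32_MIN <= value <= INT32_MAX


print(fits_int32(42))                 # the value from the first insert
print(fits_int32(1_000_000_000_000))  # the value that triggers the error
```

The second value is far above 2**31 - 1, so it cannot be represented as int32.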
I have now written acceptance tests for all possible combinations of data in MongoDB and reading them with any schema, e.g. there is an int in the database, and you read with a schema that says: string. All combinations work fine (setting the value to null on type mismatch is fine). The only combination that breaks everything is the one above: an Int64 value in the database read with an int32 schema.
I consider this a "bug", because I cannot read any Int32 data if there is a single Int64 in the database. My solution now is to always read as Int64 and then downcast, but is this really how it should be?
PS: It's not a showstopper once you know that reading as Int32 is a no-go if the schema is not enforced, but it's kind of surprising that the read raises when all other combinations work fine.