I can't append on a table indexed with float columns. It is not possible to define columnsStats for float columns #515
This issue is related to how we manage Revision changes when appending data. For the first operation (overwrite), we use the code in qbeast-spark/core/src/main/scala/io/qbeast/spark/index/SparkRevisionFactory.scala Lines 57 to 112 in 66b8d79
Overwrite doesn't throw an error because it first checks whether there is any column reference. For the second case (append), we use this code in IndexedTable: qbeast-spark/src/main/scala/io/qbeast/table/IndexedTable.scala Lines 271 to 304 in 66b8d79
This code does not perform the same check and directly looks for f_min and f_max, which are not found because of bad parsing of the JSON schema. So there are two errors here:
For the second problem, I am trying a solution in #522 by adding a schema to the ColumnStats, built from the Transformers and the data schema.
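As a rough sketch of how schema-aware parsing of ColumnStats could work (the class and method names below are hypothetical, not the actual #522 implementation): coerce each JSON-parsed number to the type the column's Transformer declares, instead of keeping the parser's default Double.

```java
public class ColumnStatsCoercion {
    // Coerce a JSON-parsed number to the column's declared type.
    // JSON parsers on the JVM return java.lang.Double for fractional
    // numbers, so a FloatType column needs an explicit conversion.
    static Object coerce(Object parsed, Class<?> expectedType) {
        if (parsed instanceof Number) {
            Number n = (Number) parsed;
            if (expectedType == Float.class)   return n.floatValue();
            if (expectedType == Double.class)  return n.doubleValue();
            if (expectedType == Integer.class) return n.intValue();
            if (expectedType == Long.class)    return n.longValue();
        }
        return parsed;
    }

    public static void main(String[] args) {
        // 0.0 arrives as a boxed Double from the JSON parser;
        // coercion makes it a Float when the column is FloatType.
        Object min = coerce(Double.valueOf(0.0), Float.class);
        System.out.println(min.getClass().getSimpleName()); // Float
    }
}
```

The key point is that the expected type has to come from outside the JSON (here, from the Transformers), since the JSON text alone cannot distinguish a float from a double.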
What went wrong?
I want to index a table on a few float columns. The float transformer expects a Float as input, but JSON encodes floats and doubles the same way, so parsed values are mapped to Double, causing a ClassCastException.
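The failure can be reproduced outside Spark: JSON parsers on the JVM (Jackson, for instance) box every fractional number as java.lang.Double, so a later cast to Float throws. A minimal sketch in plain Java (no Qbeast code involved; the cast stands in for what the transformer effectively does):

```java
public class FloatCastDemo {
    public static void main(String[] args) {
        // A JSON parser on the JVM yields java.lang.Double for "0.0",
        // because JSON itself has no separate float type.
        Object parsed = Double.valueOf(0.0);
        try {
            // Casting the boxed Double to Float fails at runtime.
            Float min = (Float) parsed;
            System.out.println(min);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: Double cannot be cast to Float");
        }
    }
}
```

Note that `((Number) parsed).floatValue()` would succeed where the cast fails, which is why an explicit expected type for each column helps.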
How to reproduce?
It is impossible to define the column's min/max as Float instead of Double, since JSON does not distinguish between the two types. So if I do:
I get this:
If I try to force the number to be a Float (at least for Scala) by adding an f at the end of the number (0.0f), I get another error, since the JSON is then malformed.
2. Branch and commit id:
main ea4bcd8
and also on cugni:next-in-line-rebased.
3. Spark version:
Spark 3.5.3, on the main branch (commit ea4bcd8)
5. How are you running Spark?
Both distributed and in a local terminal.