Currently, the Milvus binlog writer uses Parquet's PLAIN_DICTIONARY encoding by default. If the dictionary grows too large, the writer falls back to PLAIN encoding. We should provide a configurable fallback encoding so that methods other than PLAIN can be used when dictionary encoding falls back.
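To make the proposal concrete, here is a toy Go model of dictionary encoding with a configurable fallback. It is a sketch under stated assumptions, not Milvus or Arrow Go code: the `Encoding` constants, `dictEncoder`, `newDictEncoder`, and the 1024-byte dictionary limit are all hypothetical, and the model only tracks which encoding is in effect rather than writing pages. Auto-incrementing IDs are all distinct, so the dictionary grows with every row and reaches the limit quickly, while a low-cardinality column stays dictionary-encoded.

```go
package main

import "fmt"

// Encoding names mirror Parquet encodings; this is a toy model, not the
// Arrow Go or Milvus API.
type Encoding string

const (
	Plain             Encoding = "PLAIN"
	PlainDictionary   Encoding = "PLAIN_DICTIONARY"
	DeltaBinaryPacked Encoding = "DELTA_BINARY_PACKED"
)

// dictEncoder (hypothetical) dictionary-encodes int64 values until the
// dictionary exceeds dictLimit bytes, then switches to the configured
// fallback encoding for the rest of the column chunk.
type dictEncoder struct {
	dictLimit int
	fallback  Encoding
	dict      map[int64]int32 // value -> dictionary index
	current   Encoding
}

func newDictEncoder(dictLimit int, fallback Encoding) *dictEncoder {
	return &dictEncoder{
		dictLimit: dictLimit,
		fallback:  fallback,
		dict:      make(map[int64]int32),
		current:   PlainDictionary,
	}
}

// Put records one value and returns the encoding in effect after it.
func (e *dictEncoder) Put(v int64) Encoding {
	if e.current != PlainDictionary {
		return e.current // already fell back; the dictionary stops growing
	}
	if _, ok := e.dict[v]; !ok {
		e.dict[v] = int32(len(e.dict))
	}
	// A PLAIN-encoded int64 dictionary costs 8 bytes per entry.
	if len(e.dict)*8 > e.dictLimit {
		e.current = e.fallback
	}
	return e.current
}

func main() {
	// Auto-ID primary keys are all distinct, so every row grows the dictionary.
	pk := newDictEncoder(1024, DeltaBinaryPacked)
	for id := int64(1); id <= 1000; id++ {
		if enc := pk.Put(id); enc != PlainDictionary {
			fmt.Printf("pk: fell back to %s at row %d\n", enc, id)
			break
		}
	}

	// A low-cardinality column keeps a small dictionary and never falls back.
	tag := newDictEncoder(1024, DeltaBinaryPacked)
	for i := int64(0); i < 1000; i++ {
		tag.Put(i % 10)
	}
	fmt.Printf("tag: still %s with %d dictionary entries\n", tag.current, len(tag.dict))
}
```

In this model the primary-key column switches to DELTA_BINARY_PACKED at row 129 (129 entries × 8 bytes > 1024), while the 10-value column keeps PLAIN_DICTIONARY. The `fallback` field is the knob this issue asks for; Go Parquet's real writer always falls back to PLAIN at that point.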
Why is this needed?
No response
Anything else?
No response
issue: #34357
Go Parquet uses dictionary encoding by default and falls back to
plain encoding when the dictionary size exceeds the dictionary page
size limit. Users can specify a custom encoding by using
`parquet.WithEncoding(ENCODING_METHOD)` in the writer properties. However,
when dictionary encoding falls back, Go Parquet [falls back to plain
encoding](https://github.com/apache/arrow/blob/e65c1e295d82c7076df484089a63fa3ba2bd55d1/go/parquet/file/column_writer_types.gen.go.tmpl#L238)
rather than to the custom encoding method the user provided. Therefore,
this patch only turns off dictionary encoding for the primary key.
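Why dictionary encoding is a poor fit for an auto-ID primary key can be seen from a rough, uncompressed size estimate. This sketch rests on simplifying assumptions (8-byte int64 values, a PLAIN dictionary page, bit-packed indices just wide enough to address every entry; no RLE runs, page headers, compression, or fallback), so its numbers are not comparable to the benchmark figures. The helpers `plainSize` and `dictSize` are hypothetical. When every value is distinct, the dictionary already stores each value once, so the per-row indices are pure overhead. In Arrow Go, a per-column property such as `parquet.WithDictionaryFor(path, false)` disables dictionary encoding for one column; whether the patch uses exactly that call is an assumption.

```go
package main

import (
	"fmt"
	"math/bits"
)

// plainSize (hypothetical helper) returns the bytes needed to PLAIN-encode
// n int64 values: 8 bytes each.
func plainSize(n int) int {
	return 8 * n
}

// dictSize (hypothetical helper) estimates the bytes needed to dictionary-
// encode n int64 values containing `distinct` unique values (distinct >= 1):
// a PLAIN dictionary page (8 bytes per entry) plus bit-packed indices just
// wide enough to address every entry. RLE runs, page headers, compression
// and dictionary fallback are all ignored.
func dictSize(n, distinct int) int {
	width := bits.Len(uint(distinct - 1)) // bits to index 0..distinct-1
	indexBytes := (n*width + 7) / 8       // round up to whole bytes
	return 8*distinct + indexBytes
}

func main() {
	const rows = 1_000_000
	fmt.Println("PLAIN, unique auto IDs:     ", plainSize(rows))
	fmt.Println("dictionary, unique auto IDs:", dictSize(rows, rows))
	fmt.Println("dictionary, 10 distinct:    ", dictSize(rows, 10))
}
```

For one million unique IDs the estimate gives 8,000,000 bytes for PLAIN versus 10,500,000 bytes with a dictionary (20-bit indices on top of a full copy of the values), whereas a 10-value column shrinks to 500,080 bytes. Dictionary encoding only pays off when values repeat.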
In a benchmark with 5 million auto-ID primary keys, the Parquet file size
drops from 13.93 MB to 8.36 MB when dictionary encoding is turned off,
reducing primary key storage space by about 40%.
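As a quick check of the 40% figure, using the two reported file sizes:

$$
\frac{13.93\,\text{MB} - 8.36\,\text{MB}}{13.93\,\text{MB}} = \frac{5.57}{13.93} \approx 0.400
$$

so turning off dictionary encoding for the primary key saves roughly 40% of its storage in this benchmark.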
Signed-off-by: shaoting-huang <[email protected]>
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.