[Enhancement]: Add binlog writer fallback encoding #34357

Closed
1 task done
shaoting-huang opened this issue Jul 2, 2024 · 1 comment

Labels
kind/enhancement: Issues or changes related to enhancement
stale: indicates no updates for 30 days


@shaoting-huang
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

What would you like to be added?

Currently, the Milvus binlog writer uses Parquet's plain dictionary encoding by default. If the dictionary grows too large, the writer falls back to plain encoding. We should provide a configurable fallback encoding so that another encoding method can be used when dictionary encoding falls back.
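To make the request concrete, here is a simplified Go sketch of the fallback decision. Everything in it is hypothetical and not taken from Milvus or Go Parquet: `encodeWithFallback`, the `maxDictEntries` threshold (counted in distinct values rather than bytes), and the `fallback` parameter, which models the configurable fallback this issue asks for. The encoding names are plain string labels only.

```go
package main

import "fmt"

// encodeWithFallback is a hypothetical, simplified model of a dictionary
// encoder. It assigns each distinct value a dictionary index and, once adding
// a new value would exceed maxDictEntries distinct values, abandons the
// dictionary and reports the configured fallback encoding instead.
// Real Parquet writers compare the dictionary page size in bytes against a
// limit; counting entries here only mirrors the shape of that decision.
func encodeWithFallback(values []int64, maxDictEntries int, fallback string) (string, []int64, []int) {
	index := make(map[int64]int) // value -> position in dict
	var dict []int64
	var indices []int
	for _, v := range values {
		i, ok := index[v]
		if !ok {
			if len(dict) >= maxDictEntries {
				// Adding v would make the dictionary "too large": use the fallback.
				return fallback, nil, nil
			}
			i = len(dict)
			index[v] = i
			dict = append(dict, v)
		}
		indices = append(indices, i)
	}
	return "PLAIN_DICTIONARY", dict, indices
}

func main() {
	// Low-cardinality data: the dictionary stays small and is kept.
	enc, dict, idx := encodeWithFallback([]int64{7, 7, 9, 7, 9}, 4, "PLAIN")
	fmt.Println(enc, dict, idx) // PLAIN_DICTIONARY [7 9] [0 0 1 0 1]

	// Unique auto-ID-like keys: every row adds an entry, so the dictionary overflows.
	enc, _, _ = encodeWithFallback([]int64{1, 2, 3, 4, 5, 6}, 4, "DELTA_BINARY_PACKED")
	fmt.Println(enc) // DELTA_BINARY_PACKED
}
```

Unique keys such as auto IDs add a new dictionary entry for every row, so the dictionary saves nothing and the fallback path is always taken.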

Why is this needed?

No response

Anything else?

No response

shaoting-huang added the kind/enhancement label (Issues or changes related to enhancement) on Jul 2, 2024
sre-ci-robot pushed a commit that referenced this issue Jul 17, 2024
issue: #34357 

Go Parquet uses dictionary encoding by default and falls back to
plain encoding if the dictionary size exceeds the dictionary page size
limit. Users can specify a custom fallback encoding by passing
`parquet.WithEncoding(ENCODING_METHOD)` in the writer properties. However,
Go Parquet [falls back to plain
encoding](https://github.com/apache/arrow/blob/e65c1e295d82c7076df484089a63fa3ba2bd55d1/go/parquet/file/column_writer_types.gen.go.tmpl#L238)
rather than to the custom encoding method the user provides. Therefore, this patch
only turns off dictionary encoding for the primary key.

In a benchmark with 5 million auto-ID primary keys, the Parquet file size
drops from 13.93 MB to 8.36 MB when dictionary encoding is turned
off, reducing primary key storage space by about 40%.
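The 40% figure follows directly from the two reported file sizes:

```latex
\frac{13.93\,\text{MB} - 8.36\,\text{MB}}{13.93\,\text{MB}} = \frac{5.57}{13.93} \approx 0.40
```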

Signed-off-by: shaoting-huang <[email protected]>

stale bot commented Aug 4, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale bot added the stale label (indicates no updates for 30 days) on Aug 4, 2024