Don't fragment table data on lib.update #2002

Open
IvoDD opened this issue Nov 14, 2024 · 0 comments
Labels
enhancement New feature or request

IvoDD (Collaborator) commented Nov 14, 2024

Is your feature request related to a problem? Please describe.
If we do a lot of updates on small date ranges, we will eventually fragment the data so heavily that it becomes impractical to read.

I think that currently, if we update a single row within a table data key, we end up writing 3 table data keys.

Describe the solution you'd like
In some cases fragmenting the data is unavoidable if we want reasonable performance on small updates, e.g. when we update with a range that lies after the existing one. For such cases we could have lib.update(defragment=True), which pays an extra cost at update time but does not fragment table data and so preserves read performance.

Also I think we can decrease the fragmentation without any extra cost in cases where we split up existing table data keys.
What we do now is:

1. Read all table data keys which intersect the updated date range
2. Filter the first table data key so it only contains index values before the updated date range, and write it back
3. Filter the last table data key so it only contains index values after the updated date range, and write it back
4. Write a completely new segment with the updated date range
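
The four steps above can be sketched with a toy model in plain Python. This is not ArcticDB code: a table data key is modelled as a sorted list of (index, value) rows, integers stand in for timestamps, and the name update_current is hypothetical. It only illustrates why a single-row update inside one segment leaves three segments behind.

```python
from typing import List, Tuple

Row = Tuple[int, str]   # (index value, payload); ints stand in for timestamps
Segment = List[Row]     # toy stand-in for one table data key


def update_current(segments: List[Segment], new_rows: Segment) -> List[Segment]:
    """Toy model of the update flow listed above (not ArcticDB internals)."""
    start, end = new_rows[0][0], new_rows[-1][0]

    # Step 1: find the segments whose index range intersects [start, end].
    hit = [i for i, s in enumerate(segments)
           if s[0][0] <= end and s[-1][0] >= start]
    if not hit:
        # Nothing intersects: the new rows simply become one more segment.
        return sorted(segments + [new_rows], key=lambda s: s[0][0])

    # Step 2: keep only the rows of the first segment that precede the range.
    before = [r for r in segments[hit[0]] if r[0] < start]
    # Step 3: keep only the rows of the last segment that follow the range.
    after = [r for r in segments[hit[-1]] if r[0] > end]

    # Step 4: the updated range is written as its own separate segment.
    rewritten = [s for s in (before, new_rows, after) if s]
    return segments[:hit[0]] + rewritten + segments[hit[-1] + 1:]


one_segment = [[(i, "old") for i in range(10)]]
result = update_current(one_segment, [(5, "new")])
print(len(result))  # 3
```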

When fewer than 3 table data keys intersect the updated range, we end up increasing the total number of segments. Instead, at no extra cost, we can write the combined segment from steps 2, 3 and 4 as one table data key (and maybe split it up if it's > 100k rows).
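
A minimal sketch of the combined write, using the same toy model (a segment is a sorted list of (index, value) rows). The name update_merged and the max_rows parameter are hypothetical; the 100k default mirrors the threshold mentioned above. The leftovers of the first and last segments are concatenated with the new rows and written as one segment, split only when the result exceeds max_rows.

```python
from typing import List, Tuple

Row = Tuple[int, str]   # (index value, payload); ints stand in for timestamps
Segment = List[Row]     # toy stand-in for one table data key

MAX_ROWS = 100_000      # split threshold mentioned in the issue


def update_merged(segments: List[Segment], new_rows: Segment,
                  max_rows: int = MAX_ROWS) -> List[Segment]:
    """Toy model of the proposed flow: write steps 2, 3 and 4 as one segment."""
    start, end = new_rows[0][0], new_rows[-1][0]

    # Same read as today: segments whose index range intersects [start, end].
    hit = [i for i, s in enumerate(segments)
           if s[0][0] <= end and s[-1][0] >= start]
    if not hit:
        return sorted(segments + [new_rows], key=lambda s: s[0][0])

    before = [r for r in segments[hit[0]] if r[0] < start]
    after = [r for r in segments[hit[-1]] if r[0] > end]

    # Combine instead of writing three separate segments.
    combined = before + new_rows + after
    # Split only if the combined segment exceeds the row limit.
    chunks = [combined[i:i + max_rows]
              for i in range(0, len(combined), max_rows)]
    return segments[:hit[0]] + chunks + segments[hit[-1] + 1:]


one_segment = [[(i, "old") for i in range(10)]]
merged = update_merged(one_segment, [(5, "new")])
print(len(merged))  # 1

split = update_merged(one_segment, [(5, "new")], max_rows=4)
print([len(s) for s in split])  # [4, 4, 2]
```

In this model the combined write needs no additional reads, because the first and last intersecting segments are already loaded in order to filter them.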

Describe alternatives you've considered
Occasional defragmentation with lib.write
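
For reference, the lib.write alternative might look roughly like the sketch below. It assumes lib is an ArcticDB Library handle and uses only lib.read(symbol).data and lib.write(symbol, data); the helper name rewrite_symbol is hypothetical. It reads the whole symbol into memory, so it is only practical for symbols that fit in RAM.

```python
def rewrite_symbol(lib, symbol):
    """Read a whole symbol and write it back as a new version.

    Assumption: writing the full data in one call lets the library lay the
    segments out afresh instead of keeping the fragments left by updates.
    """
    data = lib.read(symbol).data   # full read of the latest version
    return lib.write(symbol, data)  # full rewrite as a new version
```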

IvoDD added the enhancement label Nov 14, 2024