Don't fragment table data on `lib.update` #2002

IvoDD · 2024-11-14T15:07:50Z

Is your feature request related to a problem? Please describe.
If we do many updates on small date ranges, we will eventually fragment the data so heavily that it becomes effectively unreadable.

I think that currently, if we update a single row within a table data key, we end up writing 3 table data keys.

Describe the solution you'd like
In some cases fragmenting the data is unavoidable if we want reasonable performance on small updates, e.g. if we update with a range that comes after the existing one. For such cases we could have `lib.update(defragment=True)`, which pays the extra price at update time but doesn't fragment the table data and so preserves read performance.

Also I think we can decrease the fragmentation without any extra cost in cases where we split up existing table data keys.
What we do now is:

1. Read all table data keys which intersect the updated date range
2. Filter the first table data key to contain only index values before the updated date range and write it back
3. Filter the last table data key to contain only index values after the updated date range and write it back
4. Write a completely new segment with the updated date range

When fewer than 3 table data keys intersect the updated date range, we end up increasing the total number of segments. Instead, we can at no extra cost write the combined segment from steps 2, 3 and 4 as one table data key (and maybe split it up if it's > 100k rows).
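To make the difference concrete, here is a rough pure-Python model of the two strategies. It is only a sketch and not the actual ArcticDB code: a "segment" is a plain sorted list of `(timestamp, value)` rows standing in for one table data key, the names `update_current` and `update_combined` are made up for illustration, and the 100k row limit is just the figure mentioned above.

```python
from typing import List, Tuple

Row = Tuple[int, float]   # (index timestamp, value); stand-in for a real row
Segment = List[Row]       # stand-in for one table data key


def _split(segments: List[Segment], start: int, end: int):
    """Split sorted, non-overlapping, non-empty segments around [start, end].

    Returns (untouched_before, head_rows, tail_rows, untouched_after).
    """
    before = [s for s in segments if s[-1][0] < start]
    after = [s for s in segments if s[0][0] > end]
    hit = [s for s in segments if not (s[-1][0] < start or s[0][0] > end)]
    # Rows of the first/last intersecting segment that fall outside the range.
    head = [r for r in hit[0] if r[0] < start] if hit else []
    tail = [r for r in hit[-1] if r[0] > end] if hit else []
    return before, head, tail, after


def update_current(segments: List[Segment], new_rows: Segment) -> List[Segment]:
    """Steps 1-4 as described: head, new data and tail are written separately."""
    start, end = new_rows[0][0], new_rows[-1][0]
    before, head, tail, after = _split(segments, start, end)
    written = [s for s in (head, new_rows, tail) if s]
    return before + written + after


def update_combined(
    segments: List[Segment], new_rows: Segment, max_rows: int = 100_000
) -> List[Segment]:
    """Proposed variant: write head + new data + tail as one combined segment,
    split only if it exceeds max_rows (assumed limit)."""
    start, end = new_rows[0][0], new_rows[-1][0]
    before, head, tail, after = _split(segments, start, end)
    combined = head + new_rows + tail
    chunks = [combined[i:i + max_rows] for i in range(0, len(combined), max_rows)]
    return before + chunks + after


# One segment with 10 rows, then a single-row update in the middle of it.
segments = [[(t, 0.0) for t in range(10)]]
new_rows = [(5, 1.0)]
print(len(update_current(segments, new_rows)))   # segments after current approach
print(len(update_combined(segments, new_rows)))  # segments after combined approach
```

In this model the single-row update turns one segment into three with the current approach, while the combined variant keeps it as one, and both hold the same rows in the same order.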

Describe alternatives you've considered
Occasional defragmentation with lib.write
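
For reference, that alternative could look roughly like the sketch below. It assumes `lib` is an ArcticDB library object whose `read(symbol)` returns an item with a `.data` attribute and whose `write(symbol, data)` stores a new version; the helper name `rewrite_symbol` is made up. It reads the whole symbol into memory, so it is only practical for symbols that fit in RAM.

```python
def rewrite_symbol(lib, symbol):
    """Read the full symbol and write it back as a new version.

    Assumption: a full write lays the data out in fresh segments, which is
    what makes this usable as an occasional manual defragmentation step.
    """
    data = lib.read(symbol).data   # load the complete data for the symbol
    return lib.write(symbol, data)  # rewrite it in one go
```

This pays the full read and write cost every time it runs, which is why doing the consolidation inside `lib.update` would be preferable.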


IvoDD added the enhancement New feature or request label Nov 14, 2024

maxim-morozov added replicated and removed replicated labels Nov 14, 2024
