Currently replicants have to copy the entire contents of the tables when they reconnect, even after a short downtime. With a large enough volume of data, this may hinder cluster recovery after a disaster or maintenance. Moreover, it makes it almost impossible to rebalance the load on the core nodes.
Initially we tried to solve this problem by persisting the transaction log, so the replicants recovering after a reconnect could replay it instead of going through the entire bootstrap procedure.
That approach proved to hurt performance too much to be practical. In addition, flapping client connections often generate delete -> add -> delete -> ... loops in the transaction log, which can make the log larger than the table itself and calls the whole idea of replaying the transaction log into question.
Below I describe an alternative approach: instead of trying to avoid bootstrap, we could speed it up.
For each shard we create a so-called "delta set" that consists of N set-like tables (plain ets or plain rocksdb) storing records of the form {{Table, Key}, X}, where X is the value of a counter.
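To make the data layout concrete, here is a minimal sketch of how such a delta set could be represented, assuming plain ets tables. The module, record, and function names (delta_set, new/1, and the helpers sketched further below) are hypothetical illustrations, not part of the Mria code base.

```erlang
-module(delta_set).
-export([new/1, rotate/1, record_keys/2, handshake/2, keys_since/2]).

%% The delta set keeps the N ring-buffer tables (newest first) together
%% with the current value of the X counter (the logical timestamp).
-record(delta_set, {tabs :: [ets:tid()],
                    x = 0 :: non_neg_integer()}).

%% Create a delta set of N empty set-like tables; records written into
%% them have the shape {{Table, Key}, X}.
new(N) when N > 1 ->
    Tabs = [ets:new(delta, [set, public]) || _ <- lists:seq(1, N)],
    #delta_set{tabs = Tabs}.
```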
Every M seconds we rotate the tables in the delta set in a ring-buffer fashion: the contents of the oldest table are dropped and the counter X is incremented by 1.
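Rotation could then look roughly like this, continuing the hypothetical delta_set sketch: the oldest table is wiped, reused as the new current table, and the counter is bumped.

```erlang
%% Rotate the ring buffer: drop the contents of the oldest table, move it
%% to the front as the new "current" table, and increment the counter.
rotate(#delta_set{tabs = Tabs, x = X} = DS) ->
    Oldest = lists:last(Tabs),
    true = ets:delete_all_objects(Oldest),
    DS#delta_set{tabs = [Oldest | lists:droplast(Tabs)], x = X + 1}.
```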
The mria_rlog_server process, as it processes intercepted transactions, writes each affected key to the current table of the delta set. Existing keys are simply overwritten.
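For illustration, recording the keys touched by an intercepted transaction might look like the following sketch; record_keys/2 is a made-up helper, not how mria_rlog_server is actually structured.

```erlang
%% Write every {Table, Key} affected by an intercepted transaction into the
%% current (newest) table of the delta set, tagged with the current counter
%% value. Since the table is of type set, existing keys are overwritten.
record_keys(#delta_set{tabs = [Current | _], x = X}, TableKeyPairs) ->
    true = ets:insert(Current, [{{Table, Key}, X} || {Table, Key} <- TableKeyPairs]),
    ok.
```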
When a replicant connects to the core node, it reports its current logical timestamp in the hello message.
The core checks if the timestamp is covered by its delta set.
If it is covered, the bootstrap server sends only the keys recorded in the delta set and skips the clear_table command (https://github.com/emqx/mria/blob/main/src/mria_bootstrapper.erl#L240), so the replicant preserves its local data.
If the replicant's data is too old, the bootstrap server falls back to the normal bootstrap loop.
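To make the handshake concrete, here is a rough sketch of the decision the bootstrap server could take, assuming a delta set of N tables covers roughly the last N values of the counter; handshake/2 and keys_since/2 are made-up helpers continuing the delta_set sketch above.

```erlang
%% If the replicant's timestamp is covered by the delta set, send only the
%% affected keys and skip clear_table; otherwise fall back to full bootstrap.
handshake(#delta_set{tabs = Tabs, x = X} = DS, ReplicantX)
  when ReplicantX =< X, X - ReplicantX < length(Tabs) ->
    {replay_delta, keys_since(DS, ReplicantX)};
handshake(_DS, _ReplicantX) ->
    full_bootstrap.

%% Collect every {Table, Key} recorded at or after the replicant's timestamp.
keys_since(#delta_set{tabs = Tabs}, ReplicantX) ->
    lists:usort([TK || Tab <- Tabs,
                       {TK, KeyX} <- ets:tab2list(Tab),
                       KeyX >= ReplicantX]).
```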
Pitfalls:
Clock skews. Possible solution: make sure to also replay keys going back noticeably further than the replicant's timestamp.
Care should be taken to avoid "skipping" over a part of the delta set as the tables get rotated while the bootstrap is running. This could perhaps be prevented by making sure the X counter doesn't change during bootstrap and only increments by 1 when jumping to the next table.
When the core node itself restarts, or the rlog_server process restarts, it can skip certain keys. In this situation replaying the delta set would lead to inconsistent results, so the best course of action would be to simply drop it. This means any replicant node that connects to a freshly restarted core node has to bootstrap from scratch.
I assume one needed step is to periodically update the replicants. Maybe, each time the delta set tables rotate, the cores would broadcast the new logical timestamp to the replicants?
Also, I guess clock skews would come into play here in the sense that, although the timestamps would be just counters, each core node could diverge on which timestamp is current. That is, each core might be at a different "time". So replicants might need to keep track of current timestamps per core rather than just per table.