Implement round robin read from replica strategy for async cluster. #24

nihohit · 2023-08-15T07:03:31Z

based over https://github.com/nihohit/redis-rs/pull/22 and https://github.com/nihohit/redis-rs/pull/23 .

nihohit · 2023-08-28T19:42:46Z

barshaul · 2023-08-29T14:26:23Z

redis/src/cluster_topology.rs

+        return slot.addrs.primary.as_str();
+    }
+    match read_from_replica {
+        ReadFromReplicaStrategy::AlwaysFromPrimary => slot.addrs.primary.as_str(),


When can we get to a scenario where slot_addr == Replica, but read_from_replica == AlwaysFromPrimary?

If I understood it right - it can be if the user did a specific command routing to a replica node. I agree that if AlwaysFromPrimary is the readfrom strategy then we can't execute read commands on replicas (as we haven't ran READONLY on the existing connections), but what if the user did some management command?
for example, client.info(route=KeySlot(Replica, 1))

maybe this should be documented. I agree it's not intuitive.
slot_addr is where the command can go, read_from_replica represents whether the command can be sent to a replica. So, some commands can go to replicas, but only if read_from_replica is true.

can't slot_addr be the address that is passed by command routing (as I wrote in the example above)?
We should allow sending non key based commands to replica if the user routes it this way

how would you differentiate between a command that was routed by the user to a replica, and a command that was routed by its parsing to a replica?
Maybe we need 3 cases for SlotAddr: Primary, ForceReplica, ReplicaIfReadFromReplicaIsEnabled ?

I think that Route shouldn't be used for read from replica routing, but only for user routing / pre-fixed routing (e.g. allNodes). So get_address_from_slot would get the routing info and read_from_replica strategy, and if routing isn't none - it will use the routing. Otherwise, it will need to decide based on the read_from_replica strategy if we should route it to a primary or to a replica.
Is it feasible?

I'm sorry. I don't understand your suggesion. Can you write the signature you want get_address_from_slot to have?
and if Route is only for user routing, where/how will we get the slot for keyed commands, and whether they're readonly commands?

barshaul · 2023-08-29T15:15:55Z

redis/src/cluster_topology.rs

+    match read_from_replica {
+        ReadFromReplicaStrategy::AlwaysFromPrimary => slot.addrs.primary.as_str(),
+        ReadFromReplicaStrategy::RoundRobin => {
+            let index = slot


if all replicas failed, but still in the replicas addresses, do we have a fallback to the primary?

Good question.

if slot_addr == SlotAddr::Master || slot.addrs.replicas.is_empty() { return slot.addrs.primary.as_str(); }

ensures that if the SlotMap doesn't know about the replica, then the command will go to a primary. But if the ConnectionMap has removed a replica, the slot map isn't updated about this. This will cause the SlotMap to return a connection that the connection map doesn't know about, which will cause the cluster connection to send the request to a random node. problem.

Sounds like we can't keep the separation between connection and slot map.

Another scenario is if the replicas are present in the CLUSTER SLOTS output, but they aren't responsive (e.g. due to load or still wasn't recognized as failed in the cluster bus). Or, if the replicas have been filtered out from the cluster bus, but still in our connection map/slot map since we haven't done slots refreshment yet. So in both options we will face connection errors / timeouts.

So, basically- what i'm asking is how the retries mechanism of a read from replica command is implemented?
How do we handle a timeout/connection error when querying a replica? do we need to wait and retry to connect to this replica, or do we route it to the next replica? When do we decide to send it to the primary - only if no replicas are present?

I think we should try querying the next replica immediately. We should try to reconnect on the background, and/or start a topology refresh after several of connection errors, but as we have more nodes options to send this command to, I don't think we should delay the command execution.

I think that those decisions are about removing a connection from the connections manager. we can do that, but its beyond the scope of this PR.

It's not only about removing the connection from the connection manager, I think we need to define the retry mechanism for read from replicas. If we're basing on the current retry logic, then (correct me if i'm wrong) on connection error redis-rs blocks this command on trying to reconnect to this node and then retrying to send the command to the same node. So, even if we'll remove this replica connection from the connection manager after the connection fails, only the next command will be sent to a different replica. I think that a better solution would be to trigger a reconnect task on the background and immediately retry the command on the next replica / primary. But I agree it can be done in a separate PR,

connection error redis-rs blocks this command on trying to reconnect to this node and then retrying to send the command to the same node

AFAIS on every retry the cluster connection requests a new connection from the container, so if there are multiple replicas, the request will be rotated. The only case where the connection is specified is when using CommandRouting::Connection, which is only created on multi-node requests. When using CommandRouting::Route the route is resolved on each retry.

redis/src/cluster_routing.rs

nihohit force-pushed the round-robin branch from 916cbc1 to ac95a9d Compare August 15, 2023 07:04

nihohit requested a review from barshaul August 15, 2023 07:05

nihohit force-pushed the round-robin branch 2 times, most recently from 8c9bde8 to 4307936 Compare August 16, 2023 18:31

shachlanAmazon mentioned this pull request Aug 28, 2023

Read from replica in cluster valkey-io/valkey-glide#228

Closed

1 task

barshaul reviewed Aug 29, 2023

View reviewed changes

nihohit force-pushed the round-robin branch 3 times, most recently from 7092d2a to b390580 Compare September 21, 2023 10:55

barshaul approved these changes Sep 21, 2023

View reviewed changes

redis/src/cluster_routing.rs Outdated Show resolved Hide resolved

redis/src/cluster_routing.rs Outdated Show resolved Hide resolved

nihohit added 3 commits September 21, 2023 17:32

Remove allow_replica from for_routable.

91f8a0b

Use round-robin read from replica in clusters.

89846b1

Allow user to force send to replica.

350dfc5

nihohit force-pushed the round-robin branch from b390580 to 350dfc5 Compare September 21, 2023 17:33

nihohit merged commit ca6b9cb into main Sep 21, 2023
10 checks passed

nihohit deleted the round-robin branch September 21, 2023 20:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement round robin read from replica strategy for async cluster. #24

Implement round robin read from replica strategy for async cluster. #24

nihohit commented Aug 15, 2023 •

edited

Loading

nihohit commented Aug 28, 2023

barshaul Aug 29, 2023

barshaul Aug 29, 2023

nihohit Aug 31, 2023

barshaul Sep 3, 2023

nihohit Sep 14, 2023

barshaul Sep 18, 2023 •

edited

Loading

nihohit Sep 18, 2023

barshaul Aug 29, 2023

nihohit Aug 31, 2023

barshaul Sep 3, 2023 •

edited

Loading

nihohit Sep 14, 2023

barshaul Sep 18, 2023

nihohit Sep 18, 2023

Implement round robin read from replica strategy for async cluster. #24

Implement round robin read from replica strategy for async cluster. #24

Conversation

nihohit commented Aug 15, 2023 • edited Loading

nihohit commented Aug 28, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

barshaul Sep 18, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

barshaul Sep 3, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nihohit commented Aug 15, 2023 •

edited

Loading

barshaul Sep 18, 2023 •

edited

Loading

barshaul Sep 3, 2023 •

edited

Loading