
Added periodic topology checks #51

Merged: 1 commit into amazon-contributing:main on Oct 4, 2023

Conversation

@barshaul commented on Sep 27, 2023

This PR adds an optional periodic topology check to the async cluster client.
The current periodic check uses the user's connections. In the next PR I will add management connections that will be used for these checks.

| language | clientCount | data_size | num_of_tasks | Average of get_existing_std_dev | Average of set_average_latency | Average of tps | TPS diff | SET avg latency diff | GET avg latency diff |
|---|---|---|---|---|---|---|---|---|---|
| mainline | 1 | 100 | 1 | 0.087 | 0.230 | 4315 | | | |
| mainline | 1 | 100 | 10 | 0.166 | 0.392 | 25265 | | | |
| mainline | 1 | 100 | 100 | 0.587 | 1.057 | 94161 | | | |
| mainline | 1 | 100 | 1000 | 2.217 | 6.628 | 150777 | | | |
| mainline | 1 | 4000 | 1 | 0.035 | 0.265 | 4597 | | | |
| mainline | 1 | 4000 | 10 | 0.124 | 0.431 | 24015 | | | |
| mainline | 1 | 4000 | 100 | 0.164 | 0.912 | 110168 | | | |
| mainline | 1 | 4000 | 1000 | 1.008 | 7.443 | 134527 | | | |
| periodic-60-sec | 1 | 100 | 1 | 0.072 | 0.232 | 4318 | 1.001 | 1.006 | 0.831 |
| periodic-60-sec | 1 | 100 | 10 | 0.220 | 0.467 | 21419 | 0.848 | 1.193 | 1.323 |
| periodic-60-sec | 1 | 100 | 100 | 0.561 | 1.070 | 93152 | 0.989 | 1.012 | 0.956 |
| periodic-60-sec | 1 | 100 | 1000 | 2.201 | 6.594 | 151617 | 1.006 | 0.995 | 0.993 |
| periodic-60-sec | 1 | 4000 | 1 | 0.032 | 0.274 | 4390 | 0.955 | 1.033 | 0.934 |
| periodic-60-sec | 1 | 4000 | 10 | 0.123 | 0.430 | 24181 | 1.007 | 0.998 | 0.988 |
| periodic-60-sec | 1 | 4000 | 100 | 0.177 | 0.896 | 111774 | 1.015 | 0.983 | 1.079 |
| periodic-60-sec | 1 | 4000 | 1000 | 0.994 | 7.220 | 138587 | 1.030 | 0.970 | 0.987 |
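
For context, a minimal sketch of the shape of such a periodic check (hypothetical names and a plain tokio timer; the PR's actual implementation lives inside the cluster client and compares a topology hash before deciding to refresh):

```rust
use std::time::Duration;

// Illustrative only: every `interval`, sample the cluster's topology,
// hash it, and refresh the slot map only when the hash changed.
async fn periodic_topology_check(interval: Duration) {
    let mut last_hash: u64 = 0; // 0 = no topology observed yet
    loop {
        tokio::time::sleep(interval).await;
        let current_hash = fetch_topology_hash().await;
        if current_hash != last_hash {
            refresh_slots().await;
            last_hash = current_hash;
        }
    }
}

// Stand-ins for the client's real topology query and slot refresh.
async fn fetch_topology_hash() -> u64 { 0 }
async fn refresh_slots() {}
```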

@barshaul force-pushed the periodic_check branch 2 times, most recently from b5e0304 to 3b4bbdd on September 27, 2023 16:08
@barshaul marked this pull request as ready for review on September 27, 2023 16:30
@barshaul force-pushed the periodic_check branch 2 times, most recently from f18ce76 to 8ab226b on September 27, 2023 16:53
@barshaul force-pushed the periodic_check branch 3 times, most recently from b4b07f9 to 5b27f1f on September 28, 2023 13:30
@barshaul (Author) commented

@nihohit ready for review

@barshaul force-pushed the periodic_check branch 9 times, most recently from 3aabf2d to e00e610 on October 1, 2023 14:00
@barshaul (Author) commented on Oct 1, 2023

@nihohit this time for real, ready for review :)

@nihohit commented on Oct 2, 2023

@barshaul benchmarks?

@nihohit left a comment

round

redis/src/cluster_topology.rs (outdated; resolved)
```rust
// async-std flavor of the sleep between checks (the tokio flavor is
// presumably gated alongside it in the same function)
#[cfg(all(not(feature = "tokio-comp"), feature = "async-std-comp"))]
async_std::task::sleep(interval_duration).await;
let read_guard = inner.conn_lock.read().await;
if read_guard.get_current_topology_hash().is_none() {
```
nihohit: why do that? Sounds like this is exactly the situation where we want a topology refresh.

barshaul (Author): Actually, it cannot be None. When a new ClusterConnInner is created, we refresh the slots before starting the periodic checks. If the slot refresh fails, the ClusterConnInner creation fails; otherwise the topology hash is set to Some(val).
Do you think it would be better if I removed the Option and initialized it with 0?

nihohit: sure, use 0.
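
A sketch of that simplification (illustrative struct and field names, not the PR's exact types):

```rust
// Before: Option<u64>, where None can never actually be observed after
// construction, since creation fails unless the first slot refresh succeeds.
// topology_hash: Option<u64>

// After: a plain u64, with 0 as the "not yet computed" sentinel.
struct ConnectionState {
    topology_hash: u64, // 0 until the first successful slot refresh
}
```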

redis/src/cluster_async/mod.rs: 3 outdated review threads (resolved)
```rust
Ok(())
.await;
if topology_check_res.is_ok() && topology_check_res.unwrap() {
    let _ = Self::refresh_slots_with_retries(inner.clone()).await;
```
nihohit: why trigger a new process and query more nodes, instead of using the new topology result?

barshaul (Author): The new topology result is based on log2(n) nodes. We cannot know at this stage whether their view is accurate, i.e. whether this small number of nodes holds the most common view across the cluster's nodes.
I could optimize it so that for a single-node cluster we use this view directly, but I'm not sure that's really necessary. WDYT?
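
For reference, a sketch of the sample-size idea being described (illustrative only, not the crate's actual function):

```rust
// Query roughly log2(n) of the cluster's nodes for their topology view.
// With a sample this small, agreement among the sampled nodes does not
// prove cluster-wide consensus, hence the follow-up full slot refresh.
fn topology_sample_size(num_nodes: usize) -> usize {
    (num_nodes.max(2) as f64).log2().ceil() as usize
}
```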

nihohit: IMO even using a smaller sample is safe enough here, but if you're ok with delaying the slot refresh by another network round trip, it's fine.

redis/tests/test_cluster_async.rs: 3 outdated review threads (resolved)
redis/src/cluster_async/mod.rs: 1 outdated review thread (resolved)
@barshaul force-pushed the periodic_check branch 3 times, most recently from 48308b1 to 2652921 on October 2, 2023 14:29
@barshaul (Author) commented on Oct 2, 2023

@nihohit round

@nihohit left a comment


fsm

redis/src/cluster_async/mod.rs (outdated; resolved)
```rust
.await?;
Ok(())
.await;
if topology_check_res.is_ok() && topology_check_res.unwrap() {
```
nihohit: sorry, `if let Ok(true) = res {`
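
A runnable mini-demo of the suggested pattern (the `Result<bool, ()>` stand-in is illustrative):

```rust
fn main() {
    let topology_check_res: Result<bool, ()> = Ok(true);

    // Instead of: if topology_check_res.is_ok() && topology_check_res.unwrap() { ... }
    // pattern-match directly on the Ok(true) case:
    if let Ok(true) = topology_check_res {
        println!("topology changed; refreshing slots");
    }
}
```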

@barshaul merged commit b6b2e9a into amazon-contributing:main on Oct 4, 2023
10 checks passed
@barshaul deleted the periodic_check branch on October 4, 2023 07:39