Better node chooser #23

Merged: 2 commits, merged Jun 17, 2024
packages/cosmos/src/client/node_chooser.rs: 17 changes (15 additions, 2 deletions)
@@ -32,14 +32,27 @@ impl NodeChooser {
     }
 
     pub(super) fn choose_node(&self) -> Result<&Node, ConnectionError> {
-        let primary_health = self.primary.is_healthy(self.allowed_error_count);
+        // First we try to find a node that has had no issues at all.
+        // If that fails, then we go for the allowed error count.
+        // Motivation: we previously had issues where we would retry the primary
+        // node multiple times, exhausting our retry counts, and never fall
+        // back to another node.
+        self.choose_node_with_allowed(0)
+            .or_else(|_| self.choose_node_with_allowed(self.allowed_error_count))
+    }
+
+    fn choose_node_with_allowed(
+        &self,
+        allowed_error_count: usize,
+    ) -> Result<&Node, ConnectionError> {
+        let primary_health = self.primary.is_healthy(allowed_error_count);
         if primary_health.is_healthy() {
             Ok(&self.primary)
         } else {
             let fallbacks = self
                 .fallbacks
                 .iter()
-                .filter(|node| node.is_healthy(self.allowed_error_count).is_healthy())
+                .filter(|node| node.is_healthy(allowed_error_count).is_healthy())
                 .collect::<Vec<_>>();
 
             let mut rng = rand::thread_rng();
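The comments in `choose_node` describe a two-pass strategy: first look for a node with no recorded errors, and only then relax the threshold to `allowed_error_count`. The following is a minimal, dependency-free sketch of that idea, not the crate's implementation. `Node`, its `error_count` field, the `error_count <= allowed` health rule, and `NoHealthyNode` are hypothetical stand-ins. The real code returns a `ConnectionError`, uses a richer health value (`is_healthy(..).is_healthy()`), and appears to pick among healthy fallbacks with `rand::thread_rng()`; this sketch takes the first healthy fallback so its output is deterministic.

```rust
// Hypothetical, dependency-free model of the two-pass selection in this PR.
// `Node`, `error_count`, and `NoHealthyNode` are illustrative names only.

#[derive(Debug, PartialEq)]
struct Node {
    name: &'static str,
    error_count: usize,
}

impl Node {
    // Assumed health rule: healthy while recorded errors do not exceed the allowance.
    fn is_healthy(&self, allowed_error_count: usize) -> bool {
        self.error_count <= allowed_error_count
    }
}

#[derive(Debug, PartialEq)]
struct NoHealthyNode;

struct NodeChooser {
    primary: Node,
    fallbacks: Vec<Node>,
    allowed_error_count: usize,
}

impl NodeChooser {
    fn choose_node(&self) -> Result<&Node, NoHealthyNode> {
        // Pass 1: only nodes with zero errors. Pass 2: tolerate up to the limit.
        self.choose_node_with_allowed(0)
            .or_else(|_| self.choose_node_with_allowed(self.allowed_error_count))
    }

    fn choose_node_with_allowed(
        &self,
        allowed_error_count: usize,
    ) -> Result<&Node, NoHealthyNode> {
        if self.primary.is_healthy(allowed_error_count) {
            return Ok(&self.primary);
        }
        // Deterministic stand-in for the random choice among healthy fallbacks.
        self.fallbacks
            .iter()
            .find(|node| node.is_healthy(allowed_error_count))
            .ok_or(NoHealthyNode)
    }
}

fn main() {
    let chooser = NodeChooser {
        primary: Node { name: "primary", error_count: 1 },
        fallbacks: vec![Node { name: "fallback", error_count: 0 }],
        allowed_error_count: 3,
    };
    // The primary has one error, so the zero-error pass prefers the clean fallback.
    println!("{}", chooser.choose_node().unwrap().name); // prints "fallback"
}
```

With a single pass at `allowed_error_count = 3`, the primary (one error) would still be chosen on every attempt; the zero-error pass instead routes to the clean fallback, which is the fall-back behavior the PR's comment says was previously missing. The second pass still keeps a slightly flaky primary usable when no node is completely clean.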
packages/cosmos/src/error.rs: 14 changes (13 additions, 1 deletion)
@@ -412,6 +412,8 @@ pub enum QueryErrorDetails {
     RateLimited { source: tonic::Status },
     #[error("The gRPC server is returning a 'forbidden' response: {source:?}")]
     Forbidden { source: tonic::Status },
+    #[error("Server returned response that does not look like valid gRPC: {source:?}")]
+    NotGrpc { source: tonic::Status },
 }
 
 /// Different known Cosmos SDK error codes
@@ -540,6 +542,7 @@ impl QueryErrorDetails {
             QueryErrorDetails::AccountSequenceMismatch { .. } => ConnectionIsFine,
             QueryErrorDetails::RateLimited { .. } => NetworkIssue,
             QueryErrorDetails::Forbidden { .. } => NetworkIssue,
+            QueryErrorDetails::NotGrpc { .. } => NetworkIssue,
         }
     }
 
@@ -605,6 +608,14 @@ impl QueryErrorDetails {
             };
         }
 
+        if err.message().contains("status: 405") {
+            return QueryErrorDetails::NotGrpc { source: err };
+        }
+
+        if err.message().contains("invalid compression flag") {
+            return QueryErrorDetails::NotGrpc { source: err };
+        }
+
         QueryErrorDetails::Unknown(err)
     }
 
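The new `NotGrpc` variant is produced by substring checks on the status message and is then categorized as a `NetworkIssue`, so such responses are presumably held against the node rather than treated as a healthy connection. The sketch below models that pipeline without depending on `tonic`. Assumptions: `Status` is a hypothetical stand-in for `tonic::Status` exposing only `message()`, only two variants are modelled, and `QueryErrorCategory::Unsure` is a placeholder because the diff does not show how `Unknown` is categorized. The sample message is illustrative. One plausible reading of the `invalid compression flag` check: every gRPC message frame starts with a compressed-flag byte that must be 0 or 1, so an HTML page (first byte `<`, value 60) fails it.

```rust
// Dependency-free sketch. `Status` stands in for `tonic::Status`; the enum
// names mirror the PR, but `Unsure` is a hypothetical placeholder.

#[derive(Debug)]
struct Status {
    message: String,
}

impl Status {
    fn message(&self) -> &str {
        &self.message
    }
}

#[derive(Debug)]
enum QueryErrorDetails {
    NotGrpc { source: Status },
    Unknown(Status),
}

#[derive(Debug, PartialEq)]
enum QueryErrorCategory {
    NetworkIssue,
    Unsure, // placeholder: not taken from the PR
}

impl QueryErrorDetails {
    // Mirrors the two substring checks added in this PR.
    fn from_status(err: Status) -> Self {
        if err.message().contains("status: 405") {
            return QueryErrorDetails::NotGrpc { source: err };
        }
        if err.message().contains("invalid compression flag") {
            return QueryErrorDetails::NotGrpc { source: err };
        }
        QueryErrorDetails::Unknown(err)
    }

    fn category(&self) -> QueryErrorCategory {
        match self {
            // As in the PR: a non-gRPC reply is treated as a network issue.
            QueryErrorDetails::NotGrpc { .. } => QueryErrorCategory::NetworkIssue,
            QueryErrorDetails::Unknown(_) => QueryErrorCategory::Unsure,
        }
    }
}

fn main() {
    let err = Status {
        message: "protocol error: invalid compression flag: 60".to_string(),
    };
    let details = QueryErrorDetails::from_status(err);
    println!("{:?}", details.category()); // prints "NetworkIssue"
}
```

Matching on message text is brittle if the underlying library rewords its errors, but it needs no change to the transport layer; because `match` in Rust is exhaustive, adding the variant also forces every existing classifier (such as the `=> false` arm below) to decide how to treat it.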
@@ -623,7 +634,8 @@ impl QueryErrorDetails {
             | QueryErrorDetails::TransportError { .. }
             | QueryErrorDetails::BlocksLagDetected { .. }
             | QueryErrorDetails::NoNewBlockFound { .. }
-            | QueryErrorDetails::AccountSequenceMismatch(_) => false,
+            | QueryErrorDetails::AccountSequenceMismatch(_)
+            | QueryErrorDetails::NotGrpc { .. } => false,
             QueryErrorDetails::RateLimited { .. } | QueryErrorDetails::Forbidden { .. } => true,
         }
     }