Improve Keeper Reliability #19

Merged
ebatsell merged 13 commits into master from improve-keeper-reliability
Feb 7, 2024

@ebatsell ebatsell commented Jan 25, 2024

Improves keeper reliability for landing transactions by:

  • No longer skipping transactions that initially fail with a BlockhashNotFound error
  • Merging vote accounts with all validator history accounts for epoch credit cranking, so offline validators that no longer show up in getVoteAccounts still get cranked (necessary for Steward program scoring)

The first part has already been running on mainnet for a week, and for the last 3 epochs, all vote accounts have been updated each epoch (as measured by: same number of commissions tracked as stakes).
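
The merge described in the second bullet can be sketched roughly as a set union. This is only an illustration, not the keeper's actual code: `Address` is a hypothetical stand-in for the SDK's public key type, and `accounts_to_crank` is an assumed name.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a 32-byte account address.
type Address = [u8; 32];

/// Union of the vote accounts reported by the RPC node and the vote accounts
/// that already have a validator history account. A validator that has
/// dropped out of the RPC response is still included through the second list.
fn accounts_to_crank(
    rpc_vote_accounts: &[Address],
    history_vote_accounts: &[Address],
) -> HashSet<Address> {
    rpc_vote_accounts
        .iter()
        .chain(history_vote_accounts.iter())
        .copied()
        .collect()
}
```

Collecting into a `HashSet` removes the overlap, so a validator present in both sources is only cranked once.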

@ebatsell ebatsell requested a review from buffalu January 26, 2024 20:21
@ebatsell ebatsell force-pushed the improve-keeper-reliability branch 2 times, most recently from a31ef10 to b773fdb Compare February 7, 2024 01:42
@ebatsell ebatsell force-pushed the improve-keeper-reliability branch from 65d8d4b to b6822c1 Compare February 7, 2024 02:10
  RetryError(Vec<Instruction>),
  #[error("Transactions failed to execute after multiple retries")]
- TransactionRetryError(Vec<Vec<Instruction>>),
+ TransactionRetryError(Vec<(Vec<Instruction>, String)>),
Collaborator
TransactionError instead of string?

Collaborator Author
there are other error types baked into the ClientError, like IoError, ReqwestError, SigningError, etc., that I feel are useful to propagate, and the only way to bubble them up is to cast to a string, since the ClientError type doesn't implement the Clone trait we need. If we saved only the transaction errors you'd end up with Vec<Result<(), Option<TransactionError>>>. Can switch that out if it's preferred, but we'd be losing information with that
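
A minimal sketch of that trade-off, with simplified stand-in types. `Instruction`, `ClientError`, `KeeperError`, and `to_retry_error` here are hypothetical and only mimic the shape discussed above; they are not the real SDK or keeper definitions.

```rust
use std::fmt;
use std::io;

/// Hypothetical stand-in for an instruction type.
#[derive(Clone, Debug, PartialEq)]
struct Instruction(u8);

/// Hypothetical stand-in for a client error that cannot derive `Clone`,
/// because it wraps `std::io::Error`, which is not `Clone`.
#[derive(Debug)]
enum ClientError {
    Io(io::Error),
    Transaction(String),
}

impl fmt::Display for ClientError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ClientError::Io(e) => write!(f, "io error: {}", e),
            ClientError::Transaction(msg) => write!(f, "transaction error: {}", msg),
        }
    }
}

/// Clonable keeper-side error: each failed batch is stored with the
/// stringified cause, so non-transaction failures are kept as well.
#[derive(Clone, Debug, PartialEq)]
enum KeeperError {
    TransactionRetryError(Vec<(Vec<Instruction>, String)>),
}

/// Pairs each failed instruction batch with its error rendered as text.
fn to_retry_error(failures: Vec<(Vec<Instruction>, ClientError)>) -> KeeperError {
    KeeperError::TransactionRetryError(
        failures
            .into_iter()
            .map(|(ixs, err)| (ixs, err.to_string()))
            .collect(),
    )
}
```

The cost of this approach is that callers can no longer match on the error variant, only read the message.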

.flat_map(|i| instruction_batches[i])
.cloned()
.collect::<Vec<_>>();
// Re-sign transactions with fresh blockhash
Collaborator
going to send duplicates if always re-signing. should only need to re-sign once the blockhash you previously signed txs with expires, based on processed commitment level. there's an is_blockhash_valid_with_config method

Collaborator Author
added logic to do less re-signing
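
One way to picture the "re-sign only on expiry" rule is the sketch below. It is not the code from this PR: `Blockhash`, `BlockhashSource`, `FixedSource`, and `blockhash_for_resend` are hypothetical names, and the trait stands in for whatever RPC validity check is actually used.

```rust
/// Hypothetical stand-in for a recent blockhash.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Blockhash(u64);

/// Hypothetical abstraction over the two RPC lookups involved.
trait BlockhashSource {
    /// Whether `hash` can still be used to land a transaction.
    fn is_valid(&self, hash: Blockhash) -> bool;
    /// The most recent blockhash known to the node.
    fn latest(&self) -> Blockhash;
}

/// Toy source for illustration: only the latest blockhash counts as valid.
struct FixedSource {
    latest: Blockhash,
}

impl BlockhashSource for FixedSource {
    fn is_valid(&self, hash: Blockhash) -> bool {
        hash == self.latest
    }

    fn latest(&self) -> Blockhash {
        self.latest
    }
}

/// Returns the blockhash to use for the next send and whether the
/// transactions must be re-signed first.
fn blockhash_for_resend(
    source: &impl BlockhashSource,
    signed_with: Blockhash,
) -> (Blockhash, bool) {
    if source.is_valid(signed_with) {
        // Still valid: resend the already-signed transactions unchanged,
        // so their signatures stay the same.
        (signed_with, false)
    } else {
        // Expired: the old transactions can no longer land, so re-sign.
        (source.latest(), true)
    }
}
```

Resending an unchanged transaction keeps its signature, so it cannot land twice, whereas a transaction re-signed with a new blockhash gets a new signature and could land alongside the original.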

&mut executed_signatures,
)
.await?;
while retries < retry_count {
Collaborator
imo max retries should prolly be incremented on blockhash increment? or related to some time.

retry_count of 3 may be fine for 100 txs, but won't be fine for 2k txs

Collaborator Author
using a retry count of 10, maybe you were looking at an older commit?

Collaborator Author
added logic to fetch blockhash and increment retries only when it's expired
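
A rough sketch of a retry budget that is tied to blockhash expiry rather than to loop iterations. The function name and both callbacks are hypothetical placeholders for the real RPC calls, and the loop relies on the assumption that a blockhash eventually expires.

```rust
/// Runs a send loop in which `retries` counts expired blockhashes rather
/// than loop iterations. `blockhash_expired` and `send_pending` are
/// hypothetical callbacks standing in for RPC calls; `send_pending` takes
/// the number of pending transactions and returns how many are still pending.
///
/// Returns `Ok(retries_used)` once nothing is pending, or `Err(pending)`
/// when the retry budget runs out.
fn run_with_blockhash_retries(
    retry_count: u32,
    mut pending: usize,
    mut blockhash_expired: impl FnMut() -> bool,
    mut send_pending: impl FnMut(usize) -> usize,
) -> Result<u32, usize> {
    let mut retries = 0;
    while retries < retry_count {
        pending = send_pending(pending);
        if pending == 0 {
            return Ok(retries);
        }
        // Only a blockhash expiry consumes one unit of the retry budget.
        if blockhash_expired() {
            retries += 1;
        }
    }
    Err(pending)
}
```

With this shape, a large batch can take many send passes within one blockhash lifetime without using up the budget, which is the concern raised about 2k transactions.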

  #[error("RPC Client error: {0:?}")]
- TransactionClientError(String, Vec<Vec<Instruction>>),
+ TransactionClientError(String, Vec<Result<(), SendTransactionError>>),
Collaborator
this is kinda messy return type

Collaborator
what does vec correspond to?

Collaborator Author
this error is for the case where we're submitting transactions and one of the calls to get blockhash fails, and we may have partially-executed transactions in the list - the vec corresponds to the transactions passed into parallel_execute_transactions
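
To illustrate how a caller might read such an index-aligned vec, here is a small sketch. The `SendTransactionError` variants and the `failed_indices` helper are hypothetical and chosen only for the example; the real enum may look different.

```rust
/// Hypothetical stand-in for the per-transaction error type.
#[derive(Clone, Debug, PartialEq)]
enum SendTransactionError {
    ExceededRetries,
    Other(String),
}

/// Returns the positions of the submitted transactions that did not succeed.
/// `results[i]` is assumed to describe the i-th transaction passed in.
fn failed_indices(results: &[Result<(), SendTransactionError>]) -> Vec<usize> {
    results
        .iter()
        .enumerate()
        .filter(|(_, r)| r.is_err())
        .map(|(i, _)| i)
        .collect()
}
```

Because the positions match the input order, the caller can look up which instruction batches still need to be retried after a partial failure.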

executed_signatures: &mut HashMap<Signature, usize>,
) {
// Confirms TXs in batches of 256 (max allowed by RPC method)
executed_signatures: HashMap<Signature, usize>,
Collaborator
nit: you could pass in hashset here then return hashset of confirmed and worry about the index in caller. but this is fine

Collaborator Author
done
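
The suggested split could look roughly like the sketch below: the confirmation step works on a plain set of signatures in batches of 256, and the caller maps confirmed signatures back to indices. `Signature` is a hypothetical stand-in type, and `is_confirmed_batch` is an assumed callback in place of the real status RPC call.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a transaction signature.
type Signature = u64;

/// Batch size taken from the diff comment above (max allowed by the RPC method).
const MAX_STATUS_BATCH: usize = 256;

/// Checks signatures in batches and returns the confirmed ones.
/// `is_confirmed_batch` is assumed to return one flag per signature,
/// in the same order as the batch it was given.
fn confirm_in_batches(
    submitted: &HashSet<Signature>,
    mut is_confirmed_batch: impl FnMut(&[Signature]) -> Vec<bool>,
) -> HashSet<Signature> {
    let all: Vec<Signature> = submitted.iter().copied().collect();
    let mut confirmed = HashSet::new();
    for chunk in all.chunks(MAX_STATUS_BATCH) {
        let statuses = is_confirmed_batch(chunk);
        for (sig, ok) in chunk.iter().zip(statuses) {
            if ok {
                confirmed.insert(*sig);
            }
        }
    }
    confirmed
}

/// Caller-side step: maps confirmed signatures back to transaction indices.
fn confirmed_indices(
    sig_to_index: &HashMap<Signature, usize>,
    confirmed: &HashSet<Signature>,
) -> HashSet<usize> {
    sig_to_index
        .iter()
        .filter(|(sig, _)| confirmed.contains(*sig))
        .map(|(_, idx)| *idx)
        .collect()
}
```

Keeping the index bookkeeping in the caller means the confirmation helper needs no knowledge of how transactions are ordered.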

@ebatsell ebatsell merged commit 3d34220 into master Feb 7, 2024
2 checks passed
@ebatsell ebatsell deleted the improve-keeper-reliability branch February 7, 2024 18:08
2 participants