Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InvalidFeePayerFilter #32939

Closed
wants to merge 21 commits into from

Conversation

apfitzge
Copy link
Contributor

@apfitzge apfitzge commented Aug 22, 2023

Problem

Some transactions that cannot pay fees make it into banking stage, often with repeat offenders. These transactions take time to process which can cause fewer valid transactions to make it into the block.

Summary of Changes

  • Add an InvalidFeePayerFilter
    • Wrapper of DashMap for now, but potentially switch to a bloom filter in future (?)
  • Upon seeing a transaction without sufficient funds for fees, add the fee-payer to the InvalidFeePayerFilter
  • Multi-Iterator checks InvalidFeePayerFilter during batch selection
    • this let's more quickly rule these out
  • Checks InvalidFeePayerFilter before forwarding
    • no need to forward these transactions to the next leader
  • BankingStage checks InvalidFeePayerFilter when receiving packets from SigVerify
    • this means these transactions cannot take up the limited buffer space
  • BankingStage will periodically clear the filter

Fixes #

@apfitzge apfitzge force-pushed the invalid_fee_payer_filter branch from 4dfebf1 to d106c68 Compare August 23, 2023 20:42
@codecov
Copy link

codecov bot commented Aug 23, 2023

Codecov Report

Merging #32939 (afe8dd8) into master (5f58d2d) will increase coverage by 0.0%.
Report is 2 commits behind head on master.
The diff coverage is 93.6%.

@@           Coverage Diff            @@
##           master   #32939    +/-   ##
========================================
  Coverage    82.0%    82.0%            
========================================
  Files         796      797     +1     
  Lines      215727   215926   +199     
========================================
+ Hits       176897   177100   +203     
+ Misses      38830    38826     -4     

@apfitzge
Copy link
Contributor Author

I wrote a benchmark to test this code against master. The bench is a bit hacky, but I filled a single-thread of banking-stage buffer half with prioritized spam (unable to pay fees) and half with valid transactions. Benched how long it took to get through all transactions.

After observing the first spam transaction, the rest should be more quickly filtered out by the MI batch selection:

master:
test bench_banking_invalid_fee_payers               ... bench:   2,998,819 ns/iter (+/- 64,277)
test bench_banking_invalid_fee_payers               ... bench:   3,036,917 ns/iter (+/- 52,711)
test bench_banking_invalid_fee_payers               ... bench:   2,987,155 ns/iter (+/- 58,034)

invalid_fee_payer_filter:
test bench_banking_invalid_fee_payers               ... bench:   2,069,501 ns/iter (+/- 75,636)
test bench_banking_invalid_fee_payers               ... bench:   2,034,970 ns/iter (+/- 32,393)
test bench_banking_invalid_fee_payers               ... bench:   2,026,578 ns/iter (+/- 25,654)

bench is a bit flaky, with a sigsev if I try to join my poh_service...so not committing the code for bench quite yet.

TransactionError::AccountNotFound
| TransactionError::InsufficientFundsForFee
| TransactionError::InvalidAccountForFee => {
self.invalid_fee_payer_filter.add(*tx.message().fee_payer());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the only place we add to our filter currently.

These 3 errors:

  • AccountNotFound - could not find fee-payer during load
  • InsufficientFundsForFee - fee-payer exists, but doesn't have enough lamports
  • InvalidAccountForFee - this account can't pay fees

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't AccountNotFound get thrown for accounts besides the fee-payer?

:my_eyes_are_bleeding:

if !validated_fee_payer {
error_counters.account_not_found += 1;
return Err(TransactionError::AccountNotFound);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that was the only instance😭
Maybe it's better to swap that error with InsufficientFunds or rename it to FeePayerAccoutNotFound so we don't lose the distinction

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that was the only instance😭 Maybe it's better to swap that error with InsufficientFunds or rename it to FeePayerAccoutNotFound so we don't lose the distinction

could. separate pr tho

@@ -285,6 +287,12 @@ impl LatestUnprocessedVotes {
bank.as_ref(),
)
{
if invalid_fee_payer_filter
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check vote transactions for invalid fee-payers prior to forwarding.

let original_length = deserialized_packets.len();
let filtered_packets = deserialized_packets
.into_iter()
.filter(|packet| {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove any known invalid-fee-payer when receiving from sigverify into banking stage.

This prevents them from taking up buffer space

let message = &packet.transaction().get_message().message;
if invalid_fee_payer_filter.should_reject(&message.static_account_keys()[0]) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check for known invalid fee-payers during MI batch selection. Don't even attempt to load and execute these transactions.

@apfitzge apfitzge marked this pull request as ready for review August 24, 2023 17:58
@apfitzge apfitzge requested review from tao-stones and buffalu August 24, 2023 17:58
@apfitzge
Copy link
Contributor Author

Plan is to eventually also check txs prior to signature verification in SigVerifyStage so we don't even waste resources verifying them. That's a bit more involved, and think it's a good for a separate PR.

}

/// Simple wrapper thread that periodically resets the filter.
pub struct InvalidFeePayerFilterThread {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm re-thinking after I posted the PR. I probably don't really need a separate thread for this at all.

  1. Just add an atomic interval on the InvalidFeePayerFilter
  2. Banking threads just call some fn that resets & reports if atomic interval should report

Copy link
Contributor

@tao-stones tao-stones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, filtering txs couldn't/won't pay at banking_stage's receiving, processing and forwarding points are thoughtful process to improve banking qos. Even it'd be defenseless if spammer continuously rotating invalid payer accounts, this change, imo, are still good.

Will take another round of review later


#[derive(Default)]
pub struct InvalidFeePayerFilter {
recent_invalid_fee_payers: DashSet<Pubkey>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's resource exhaustion attack potential with retaining the pubkeys here. consider Deduper as a high-performance, fixed-size, statistical accounting mechanism

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had that in mind as a follow up, hence why I separated a class.
I can just do in this PR tho

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked into this a bit and could be wrong. It looks to me reseting or clearing the deduper/bloom filter would require mutability; so we'd need some sort of locking mechanism. Since this filter is shared by multiple threads, that would block other threads which is something I'm trying to avoid.

Might be better to just vacate LRU if we would exceed some capacity. That would protect us from OOMing, but potentially slow things down if we exceed capacity. vs periodic locking of something like deduper. wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't deduper all atomics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for resetting:

    pub fn maybe_reset<R: Rng>(
        &mut self,
        ...

because the underlying bits is stored in a Vec: bits: Vec<AtomicU64>,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@t-nelson are you content with this decision to just randomly evict if we see too many invalid fee payers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

performance in the bench i was running (1 very active invalid fee-payer) showed no difference between bloom and dash map internally.

seems like bloom should be preferable in that case due to constant size?

@t-nelson are you content with this decision to just randomly evict if we see too many invalid fee payers?

seems fine, yeah

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like bloom should be preferable in that case due to constant size?

thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a pre-allocated dashmap also has a constant size, although I think larger than the double-bloom filter for a similar number of items.

Since there doesn't seem to be any time performance benefit, I was sticking with the simpler implementation, instead of maintaining 2 bloom-filters to allow us to "atomically" clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@t-nelson not sure if you saw the reply to your "thoughts?"

.iter()
.map(|tx| {
if invalid_fee_payer_filter.should_reject(&tx.message().account_keys()[0]) {
Err(TransactionError::InsufficientFundsForFee)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InvalidAccountForFee would have less potential to mislead as an aggregate error here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to suggest to use a new TransactionError specifically to indicate it's failed because the payer account is black listed.

Mainly thinking from UX perspective; sophisticated abusers will soon find way to get around this filter, the 2 sec blackout would probably apply more to innocent mistakes, a clear error might help in that case?

Copy link
Contributor Author

@apfitzge apfitzge Aug 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrecorded transaction errors do not get reported back to the users though. It'd purely be an internal metric - similar to our AccountInUse error metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@t-nelson the actual errors here don't get used at all. We only check is_ok() below.
But I changed agree InvalidAccountForFee is a better dummy error here since we've effectively marked the account as invalid with the filter: d52a238

TransactionError::AccountNotFound
| TransactionError::InsufficientFundsForFee
| TransactionError::InvalidAccountForFee => {
self.invalid_fee_payer_filter.add(*tx.message().fee_payer());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't AccountNotFound get thrown for accounts besides the fee-payer?

:my_eyes_are_bleeding:

if !validated_fee_payer {
error_counters.account_not_found += 1;
return Err(TransactionError::AccountNotFound);
}

@apfitzge apfitzge force-pushed the invalid_fee_payer_filter branch 3 times, most recently from c3a814a to 99ef4fd Compare September 5, 2023 20:46
core/src/invalid_fee_payer_filter.rs Show resolved Hide resolved

#[derive(Default)]
pub struct InvalidFeePayerFilter {
recent_invalid_fee_payers: DashSet<Pubkey>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

performance in the bench i was running (1 very active invalid fee-payer) showed no difference between bloom and dash map internally.

seems like bloom should be preferable in that case due to constant size?

@t-nelson are you content with this decision to just randomly evict if we see too many invalid fee payers?

seems fine, yeah

core/src/invalid_fee_payer_filter.rs Outdated Show resolved Hide resolved
.iter()
.map(|tx| {
if invalid_fee_payer_filter.should_reject(&tx.message().account_keys()[0]) {
// Actual error variant is not used - Result is only used to skip additional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

core eng will use the error variant for debugging. conflating this case with others may mislead those efforts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's kind of why I added the comment here. It makes it clear that these will not show up in the regular error metrics.
The only way someone finds this is if they search for the error variant, in which case they then see this comment.

I changed it to the more appropriate variant InvalidAccountForFee - unless we want to create a new error variant not sure what action to take here.

core/src/banking_stage/unprocessed_transaction_storage.rs Outdated Show resolved Hide resolved
core/src/banking_stage/packet_receiver.rs Outdated Show resolved Hide resolved
TransactionError::AccountNotFound
| TransactionError::InsufficientFundsForFee
| TransactionError::InvalidAccountForFee => {
self.invalid_fee_payer_filter.add(*tx.message().fee_payer());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that was the only instance😭 Maybe it's better to swap that error with InsufficientFunds or rename it to FeePayerAccoutNotFound so we don't lose the distinction

could. separate pr tho

@apfitzge apfitzge requested a review from t-nelson September 7, 2023 22:10
@@ -50,7 +56,7 @@ impl PacketReceiver {
recv_timeout,
unprocessed_transaction_storage.max_receive_size(),
)
// Consumes results if Ok, otherwise we keep the Err
// Consumes results if Ok, otherwise we keep the Err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: typo?


#[derive(Default)]
pub struct InvalidFeePayerFilter {
recent_invalid_fee_payers: DashSet<Pubkey>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like bloom should be preferable in that case due to constant size?

thoughts?

@apfitzge apfitzge force-pushed the invalid_fee_payer_filter branch from 2fef39b to afe8dd8 Compare September 20, 2023 16:46
@github-actions github-actions bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Oct 5, 2023
@github-actions github-actions bot closed this Oct 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale [bot only] Added to stale content; results in auto-close after a week.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants