Prevent DOS vector introduced by throttling #594
Comments
My hunch is that the second solution would be a lot easier to implement and wrap our heads around. Seems like it could have fewer edge cases as well. |
IIUC, this triggered in |
I would suggest to first add validation for 1-3 since it's straightforward. Regarding 4, it only requires a reverse index that should be pruned when the entry is removed from the queue, i.e.,
The infraction type is needed since the provider shouldn't accept
|
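A minimal sketch of validating before queueing, with a reverse index for duplicates. The checks mirror the conditions listed in the issue description; the index key (chain ID, validator, infraction type) and every name below (`State`, `Queue`, `Pop`, `keyOf`) are assumptions for illustration and are not taken from the throttling code.

```go
package main

import (
	"errors"
	"fmt"
)

// SlashPacket is a simplified stand-in for the real packet data.
type SlashPacket struct {
	ChainID        string
	Validator      string
	ValsetUpdateID uint64
	Infraction     string // e.g. "downtime" or "double-sign"
}

// indexKey is a hypothetical reverse-index key. The infraction type is part
// of the key so that a downtime entry does not shadow a double-sign entry.
type indexKey struct{ ChainID, Validator, Infraction string }

func keyOf(p SlashPacket) indexKey { return indexKey{p.ChainID, p.Validator, p.Infraction} }

// State is a toy view of the provider state needed for validation.
type State struct {
	bonded            map[string]bool   // validator -> found and bonded
	infractionHeights map[uint64]uint64 // vscID -> persisted infraction height
	vscMatured        map[uint64]bool   // vscID -> VSC matured already received
	pending           map[indexKey]bool // reverse index over queued packets
	queue             []SlashPacket
}

func NewState() *State {
	return &State{
		bonded:            map[string]bool{},
		infractionHeights: map[uint64]uint64{},
		vscMatured:        map[uint64]bool{},
		pending:           map[indexKey]bool{},
	}
}

// Validate returns an error when the packet should be dropped before queueing.
func (s *State) Validate(p SlashPacket) error {
	if !s.bonded[p.Validator] {
		return fmt.Errorf("validator %s not found or not bonded", p.Validator)
	}
	if _, ok := s.infractionHeights[p.ValsetUpdateID]; !ok {
		return fmt.Errorf("no infraction height for vscID %d", p.ValsetUpdateID)
	}
	if s.vscMatured[p.ValsetUpdateID] {
		return fmt.Errorf("VSC matured already received for vscID %d", p.ValsetUpdateID)
	}
	if s.pending[keyOf(p)] {
		return errors.New("duplicate of a queued packet")
	}
	return nil
}

// Queue validates the packet, then queues it and records it in the index.
func (s *State) Queue(p SlashPacket) error {
	if err := s.Validate(p); err != nil {
		return err
	}
	s.pending[keyOf(p)] = true
	s.queue = append(s.queue, p)
	return nil
}

// Pop removes the oldest queued packet and prunes the reverse index.
func (s *State) Pop() (SlashPacket, bool) {
	if len(s.queue) == 0 {
		return SlashPacket{}, false
	}
	p := s.queue[0]
	s.queue = s.queue[1:]
	delete(s.pending, keyOf(p))
	return p, true
}

func main() {
	s := NewState()
	s.bonded["valA"] = true
	s.infractionHeights[7] = 100
	p := SlashPacket{"consumer-1", "valA", 7, "downtime"}
	accepted := 0
	for i := 0; i < 10000; i++ {
		if s.Queue(p) == nil {
			accepted++
		}
	}
	fmt.Println(accepted, len(s.queue))
}
```

In this model the index is pruned in `Pop`, so the same validator can be reported again once its earlier entry has left the queue.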
This led me to think of the following edge case:
This behavior is different than the behavior w/o jail throttling where |
This would handle packets in a different order than the one in which they were received, and thus break the assumption of an ordered CCV channel. |
This sounds good to me since there are issues open; 4 is also not crazy challenging.
Valid edge case 👍 this is indeed a change in behavior. But, imo the new (throttling introduced) behavior doesn't cause any issues. For all of these convos, I'd prefer we implement them as their own PRs into main after the large throttling branch is merged. |
Hold on- Why do we need to fix this? The slash throttle is ultimately intended to give validators time to halt the chain in an attack, this "problem" just lets the attacker do it for us. @mpoke @smarshall-spitzbart |
If our goal was to automatically halt the chain when too many slash packets come in, the throttling PR could have been way less complex. This issue focuses on throwing out slash packets where it's possible to avoid automatic halts (w/o affecting throttling) allowing validators to act as they see fit during an attack (which may be a halt most of the time, or just a swift consumer removal?). |
The idea is that slash throttling could enable the censorship of SlashPackets. All you need is a malicious consumer that sends many SlashPackets, so that no other consumer can get its SlashPackets in. This is due to the fact that the provider panics if a queue exceeds the max size. |
Not just this, but if too many slash packets are persisted in state, binaries could start to see unexpected behavior and start to halt in a non-deterministic manner. That's why the explicit, deterministic halt was implemented |
CC @MSalopek once the |
Replying to @mpoke's comment
Correct me if I'm wrong, but an ordered channel only guarantees that the order in which packets are received is the same as the order in which they were sent. We can handle things in whatever order is appropriate for the designed protocol. Therefore, an adaptation of (just brainstorming here, as Note that throttling already breaks |
From an IBC perspective, we can handle things in whatever way. However, ICS relies on the Channel Order property. This entails that the packets MUST be handled in the same order they were sent. Figuring out whether it's safe to handle slash packets out of order needs careful analysis.
Indeed. I assume that's why the diff testing for throttling is not working yet. |
Regarding this issue, I don't think we should panic the provider when the queues are exceeding a certain size. Just remove the consumer and cleanup the corresponding state. |
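As a rough sketch of this alternative, assuming a per-consumer queue with a fixed cap: when the cap is hit, the consumer is removed and its queued state is dropped, and nothing panics. `Provider`, `QueuePacket`, and `maxQueueSize` are hypothetical names for a toy model, not the provider keeper's API.

```go
package main

import "fmt"

// maxQueueSize is a hypothetical cap on queued packets per consumer.
const maxQueueSize = 1000

// Provider is a toy model that only tracks queued packet data per consumer.
type Provider struct {
	queues map[string][]string
}

func NewProvider(chainIDs ...string) *Provider {
	p := &Provider{queues: make(map[string][]string)}
	for _, id := range chainIDs {
		p.queues[id] = nil
	}
	return p
}

// QueuePacket queues a packet for a registered consumer. If the queue is
// already at the cap, the consumer is removed and its state is cleaned up
// instead of panicking. It returns false when the packet was not queued.
func (p *Provider) QueuePacket(chainID, packet string) bool {
	q, ok := p.queues[chainID]
	if !ok {
		return false // unknown or already removed consumer
	}
	if len(q) >= maxQueueSize {
		delete(p.queues, chainID) // remove consumer, drop its queued packets
		return false
	}
	p.queues[chainID] = append(q, packet)
	return true
}

func main() {
	p := NewProvider("honest", "spammer")
	for i := 0; i < 5000; i++ {
		p.QueuePacket("spammer", "slash")
	}
	p.QueuePacket("honest", "slash")
	_, spammerPresent := p.queues["spammer"]
	fmt.Println(spammerPresent, len(p.queues["honest"]))
}
```

The spamming consumer only evicts itself here; packets from other consumers are still queued.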
We'll need to update the language for the channel order property then, it currently says:
I agree that we'll need to carefully analyze solutions to this issue |
Discussion for after holidays: remove panic on provider when queues get too large vs making the max queue size 10k instead of 1k |
I was just thinking about reporting this potential problem, when I saw you already discussed it at length here :) I am wondering, why is this a solution? Couldn't a combination of a broken/malicious consumer and a collaborating relayer cause the panicking even with this increased size? |
Hi Ivan, it's correct that a malicious consumer is still able to halt the provider if this scenario were to play out. However, without the throttling mechanism, safety could be violated by applying each slash packet immediately. The decision to go with the current design is to sacrifice liveness in an edge case scenario for the sake of security. See #713 for the seemingly best solution to this issue |
@smarshall-spitzbart, on second thought: it can't halt the chain after all. The panic happens inside In the case of deletion, it will not panic because it had to be larger than the max size of the queue even before deletion. |
Ah interesting, if a transaction panics the binary then the chain will still produce blocks, and also accept other transactions in subsequent blocks? |
Yes, because the panic would be caught by this |
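For readers unfamiliar with the pattern, here is a generic Go illustration of how a deferred `recover` confines a panic to a single transaction. This is not the code the comment above links to; `safeDeliver` and the handlers are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// safeDeliver runs one transaction handler and converts a panic into an
// error, so the caller can move on to the next transaction.
func safeDeliver(handler func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("tx failed: recovered from panic: %v", r)
		}
	}()
	return handler()
}

func main() {
	txs := []func() error{
		func() error { return nil },
		func() error { panic("queue exceeds max size") },
		func() error { return errors.New("ordinary failure") },
		func() error { return nil },
	}
	for i, tx := range txs {
		fmt.Printf("tx %d: err=%v\n", i, safeDeliver(tx))
	}
}
```

Only code that runs inside such a recovered scope is contained this way; a panic raised outside of it would still propagate.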
Problem
Inspired by a Slack convo with @mpoke: if an attacker first causes the slash meter to go negative with some initial "valid" slash packets, then sends a bunch more slash packets, we have a scenario where the packet queues could grow very large over multiple blocks until provider binaries panic. This scenario would not panic the provider WITHOUT throttling, because the spam packets would be handled/dropped immediately upon being received.
Closing criteria
Solve the attack vector with a patch to throttling, or deem it as not needed.
Problem details
A malicious consumer can easily send 10000 "valid" or "invalid" packets after causing the meter to go negative, where we can't assert logic about validity unless we inspect the queues upon every received slash packet. Even if we did add logic where we inspect queues, 10000 valid downtime slash packets in a row that are all duplicates would still be valid with how we define it. Eventually we will hit this logic which would halt the provider.
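The scenario can be reproduced in a toy model. Everything below is an assumption made for illustration: the cap of 1000, the starting meter value, the packet powers, and the absence of meter replenishment. It only shows that once the meter is negative, nothing is dequeued, so the queue grows until the size check panics.

```go
package main

import "fmt"

// maxQueueSize is a hypothetical cap; exceeding it panics in this model.
const maxQueueSize = 1000

// Provider is a toy model of a throttled slash packet queue.
type Provider struct {
	meter int64   // slash meter; handling stops once it is negative
	queue []int64 // queued slash packets, each holding a validator's power
}

// Recv queues a slash packet, panicking if the queue is already at the cap.
func (p *Provider) Recv(power int64) {
	if len(p.queue) >= maxQueueSize {
		panic("slash packet queue exceeds max size")
	}
	p.queue = append(p.queue, power)
}

// EndBlock handles queued packets until the meter goes negative.
func (p *Provider) EndBlock() {
	for len(p.queue) > 0 && p.meter >= 0 {
		p.meter -= p.queue[0]
		p.queue = p.queue[1:]
	}
}

// blocksUntilPanic counts the blocks it takes for spam to trigger the panic.
func blocksUntilPanic(perBlock int) (blocks int) {
	defer func() { recover() }() // stop the simulation at the first panic
	p := &Provider{meter: 10}
	p.Recv(50) // one "valid" packet that drives the meter negative
	p.EndBlock()
	for {
		blocks++
		for i := 0; i < perBlock; i++ {
			p.Recv(1)
		}
		p.EndBlock()
	}
}

func main() {
	fmt.Println(blocksUntilPanic(300))
	fmt.Println(blocksUntilPanic(10000))
}
```

Without throttling, the equivalent of `EndBlock` would drain the queue on every block, which is why the same spam would not accumulate.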
I see two potential solutions to this issue:
More validation logic
Add validation logic to ValidateSlashPacket such that the following conditions result in the packet being thrown out before it is queued:
- data.Validator has an unfound or unbonded validator according to the provider, this is "Return IBC err ack when val not found for slash packet" #546
- data.ValsetUpdateId is valid in that it maps to a persisted infraction height, and a VSC matured packet has not already been recv for this VSCId (see "Outdated SlashPackets are not dropped" #544)
Don't stop iteration through queue
On endblocker, instead of stopping iteration through the global queue at the instant that the slash meter goes negative, we keep iterating through all global queue entries. Any entry which would result in no voting power changes (val already jailed, tombstoned, not found, etc.) would be handled, no matter the value of the slash meter. Global entries which do affect voting power (and are therefore fully valid) would act as before, and be throttled by slash meter value. This solution would require changes to the current throttling implementation to make sure that VSC Matured packets in the chain specific queue are still handled only after all other slash packets for that chain have been handled.
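A minimal sketch of this second solution, under simplifying assumptions: a single global queue, a meter that allows handling while it is non-negative, and no modeling of VSC Matured packets or their ordering constraint. The names (`Entry`, `Provider`, `changesPower`) are hypothetical.

```go
package main

import "fmt"

// Entry is a simplified global queue entry for one slash packet.
type Entry struct {
	ChainID   string
	Validator string
}

// Provider is a toy model of the state the endblocker needs.
type Provider struct {
	meter  int64
	power  map[string]int64 // validator -> voting power; missing means not found
	jailed map[string]bool
	queue  []Entry
}

// changesPower reports whether handling e would change voting power.
func (p *Provider) changesPower(e Entry) bool {
	pow, found := p.power[e.Validator]
	return found && pow > 0 && !p.jailed[e.Validator]
}

// EndBlock walks the whole queue. No-op entries are always handled; entries
// that change voting power are handled only while the meter is non-negative.
func (p *Provider) EndBlock() {
	var remaining []Entry
	for _, e := range p.queue {
		if !p.changesPower(e) {
			continue // handled as a no-op, regardless of the meter
		}
		if p.meter < 0 {
			remaining = append(remaining, e) // throttled, stays queued
			continue
		}
		p.jailed[e.Validator] = true
		p.meter -= p.power[e.Validator]
	}
	p.queue = remaining
}

func main() {
	p := &Provider{
		meter:  10,
		power:  map[string]int64{"valA": 50, "valB": 50},
		jailed: map[string]bool{},
	}
	p.queue = append(p.queue, Entry{"c1", "valA"})
	for i := 0; i < 10000; i++ {
		p.queue = append(p.queue, Entry{"c1", "valA"}) // duplicates
	}
	p.queue = append(p.queue, Entry{"c2", "valB"})
	p.EndBlock()
	fmt.Println(len(p.queue), p.jailed["valA"], p.jailed["valB"])
}
```

In this model the duplicates for the already jailed validator are drained in one pass, and only the entry that would change voting power stays queued behind the negative meter.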