Witness threshold (TOAD) selection in witness pools and KAWA #837
SmithSamuelM started this conversation in General
Relationship of F and F*
See KERI WP 2.62 or later.
It can be shown that for any set of N witnesses, there is a threshold M < N that guarantees that at most one sufficient agreement occurs, or none at all, despite a dishonest controller, where at most F* = N - M of the witnesses are potentially unavailable and at most F < M are duplicitous.
The upper and lower constraints on M may be combined as follows:
(N+F+1)/2 <= M <= N-F*
With the inequality above, there may be more than one solution for M for any given N and F. In that case, we can have an effective number of unavailable witnesses denoted F* which is given by
F* = N - M
There are two types of faults we care about with regard to F and F*. One type is unavailability, in which an otherwise honest witness is not available to publish a receipt for a first-seen event. The other type is duplicity, in which a witness publishes receipts for more than one version of the same event, which can only happen when a duplicitous or compromised controller has produced more than one version. The inequality means that while no more than F witnesses are allowed to be duplicitous, in some cases F* ≥ F honest witnesses may be unavailable.
Satisfaction of this constraint guarantees that at most one sufficient agreement occurs, or none at all, despite a dishonest controller, where at most F* of the witnesses are potentially unavailable and at most F are duplicitous. This guarantee means that the agreement is deemed immune (to failure due to F or F* faults). To elaborate, given at most F* potentially unavailable or F potentially duplicitous witnesses, an immune agreement requires that M be a sufficient majority of N and, as a result, guarantees that the service may only produce a sufficient agreement for one version of each event, or none at all, despite a dishonest or exploited controller. The following values of N, M, F, and F* satisfy this immunity constraint.
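A few representative rows can be derived directly from the combined inequality above. The following is a minimal sketch; the particular (N, M) pairs are illustrative choices, not an exhaustive list, and the white paper remains the authoritative reference.

```python
# Derive fault margins from (N + F + 1)/2 <= M <= N - F* for a few (N, M) pairs.
# Illustrative sketch only; see the KERI white paper for the authoritative table.
for N, M in [(4, 3), (6, 4), (7, 5), (8, 5), (10, 7)]:
    F = max(2 * M - N - 1, 0)   # largest F satisfying (N + F + 1)/2 <= M
    F_star = N - M              # number of witnesses that may be unavailable
    print(f"N={N:2d}  M={M}  F={F}  F*={F_star}")
```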
Given the immunity constraint is satisfied, the service may not produce multiple divergent but proper KERLs. In order to be deemed proper, an agreement MUST have been verified as consistent with all prior events by every non-faulty witness who is a party to that agreement. Thus, any user of the service, be it a validator, watcher, juror, or judge, will be able to obtain either a proper event agreement on demand from some witness or none at all. Any non-faulty witness with a proper agreement will keep that agreement in its KERL and provide it on demand. Consequently, the availability of a proper event at a witness is tantamount to the availability of a proper log (KERL) of all prior events consistent with that event at that witness, and thereby high availability of the service is assured.
A duplicitous (maliciously dishonest) witness could witness more than one version of the same event. This means that the size of N is effectively expanded. Suppose that N is 7, M is 5, and F is 2. When "voting", the two dishonest witnesses each vote for two different versions of the event, so a total of 9 votes would be cast: 5 votes by the honest witnesses and 4 votes by the two dishonest witnesses. This means that 5 is still sufficient to guarantee that only one version can satisfy the threshold given at most two dishonest witnesses. But I could expand my witness pool to N = 8 with a threshold of 5, thereby allowing a total of at most 3 faulty witnesses, where those 3 are unavailable rather than dishonest. This is F*. So the controller determines what level and types of faults it wants to be responsible or held accountable for. The algorithm tells the controller and validators what the possibilities are given the N and M. It doesn't guarantee that there won't be more than F or F* faults, merely that at most F or F* faults could occur and still guarantee agreement.
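As a quick sanity check of the N = 7, M = 5, F = 2 arithmetic above, here is a minimal brute-force sketch. It assumes each honest witness receipts exactly one version of the event while each duplicitous witness receipts both versions:

```python
# Each of the two duplicitous witnesses receipts both versions A and B, while
# the five honest witnesses split between A and B in every possible way. In no
# split can both versions gather M = 5 receipts, so at most one version of the
# event can ever be fully witnessed.
N, M, F = 7, 5, 2
honest = N - F
both_fully_witnessed = any(
    (a + F) >= M and ((honest - a) + F) >= M   # receipts for version A and version B
    for a in range(honest + 1)                 # a = number of honest witnesses receipting A
)
print(both_fully_witnessed)   # False
```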
We can also consider combinations of F and F*. Suppose at any given point in time we have at most F = 1 dishonest witness and at most F* = 2 unavailable witnesses, and we expand to N = 8 while keeping M = 5. Now, the worst case would be that the one dishonest witness votes twice and two of the 7 honest witnesses are unavailable. At most one version can then gather the required 5 receipts (up to 5 from the available honest witnesses plus 1 from the dishonest witness), so we have either 5 votes that are in agreement or no agreement at all.
Recall that the guarantee for a given N, M, F, or F* is that either there is one agreement or no agreement at all, never multiple agreements.
Now, with a pool of 8 and a threshold of 5, I can't have two dishonest witnesses at the same time and still satisfy the guarantee, because the number of votes in the worst case would be 10 (6 honest plus 4 from the 2 dishonest witnesses voting twice each), and 5 is no longer a majority.
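Re-running the brute-force sketch above with N, M, F = 8, 5, 2 prints True: with the six honest witnesses splitting three and three, both versions gather five receipts, which is exactly the loss of the guarantee described here.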
The formula just informs the controller and validators what types and levels of faults F or F* are allowable given N and M.
Although a validator may hold the controller ultimately accountable for any duplicity of its witnesses, a validator of a high-stakes event may choose to watch enough witnesses to guarantee that the actual level of duplicity satisfies the validator's trust level before making a trust decision.
KAWA and the element of time when arriving at agreement.
As the white paper points out, KAWA does not have a liveness guarantee, which means there is no time bound driving agreement. A validator either trusts or does not yet trust, and "not yet" could go on forever. A controller may recover by doing a rotation that replaces witnesses.
What I found when reviewing the state of the art in distributed consensus algorithms and shared ledgers is little to no consideration of key management faults and how to mitigate and protect against them. This is why I invented KERI and duplicity-evident systems and algorithms (KAWA), so that key management faults are elevated to a primary fault we care about, and duplicity evidence is the primary trigger for mitigation via rotation recovery as well as the primary trigger for protection by not trusting, i.e., withholding the trust decision until the duplicity has been reconciled.
You can find some good discussion by Mazieres, in his write-ups and presentations on the Stellar algorithm, of why one may want different safety vs. liveness margins and guarantees in consensus, i.e., making trade-offs between the two. From that I realized that for trust decisions, often the best policy is to not trust, i.e., the margin is 100% safety and 0% liveness: never trade away safety for liveness.
Example
An honest witness can only see one version of a key event, so it will only report that version and attach receipts from other witnesses to that event. If an event first seen by a witness never gets enough receipts to reach the threshold, then there is no agreement as far as that witness's reporting of the event is concerned. But the witness only validates that the controller signatures are valid, and if there is an authentication factor for receiving an event from a controller, the witness also verifies that before "seeing" the event. Once seen, the witness witnesses it regardless of what other witnesses have done or may do. The "consensus" is only with respect to what an external validator of the event sees. Should an unavailable witness later become available and provide a receipt, then that event may, at that later time, show agreement.
So, for example, suppose a validator receives an event from a witness with attached receipts. The validator can verify that: 1) the controller signatures satisfy the controller's signing threshold against the current key list at the sn of the event; 2) the witness receipts verify against the witness keys; and 3) the number of receipts satisfies (or does not meet) the threshold for witnessing.
If the receipts do not yet satisfy the TOAD (threshold of accountable duplicity, or witness pool threshold), then the validator does not yet trust the event and does not accept it into its copy of the controller's KEL. The event is held in escrow. The validator is thereby prevented from making a trust decision with respect to any sealed data in that event.
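The acceptance rule just described can be summarized in a short sketch. This is hypothetical pseudocode, not keripy's actual API: the verification callables and the kel/escrow objects are illustrative assumptions.

```python
# Hypothetical sketch of the validator's acceptance rule; the helper callables
# and the kel/escrow objects are illustrative, not actual keripy APIs.
def accept_event(event, receipts, toad, kel, escrow,
                 verify_controller_sigs, verify_receipt):
    """Accept a fully receipted event into the local KEL copy, else escrow it."""
    # 1) Controller signatures must satisfy the signing threshold for the
    #    current key list at the event's sequence number (sn).
    if not verify_controller_sigs(event, kel.keys_at(event.sn)):
        return "rejected"
    # 2) Each receipt must verify against the witness keys for this event.
    valid = [r for r in receipts
             if verify_receipt(r, event, kel.witnesses_at(event.sn))]
    # 3) The count of valid receipts must satisfy the TOAD before the event is
    #    accepted; otherwise it is escrowed and no trust decision is made yet.
    if len(valid) < toad:
        escrow.hold(event, valid)
        return "escrowed"
    kel.accept(event)
    return "accepted"
```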
Several possibilities may then occur:
At some time in the future, the event becomes fully receipted, i.e., the validator may eventually receive enough receipts that the event satisfies the threshold, and the validator accepts it into its copy of the controller's KEL, thereby enabling a trust decision.
The event is never fully receipted, so the validator never makes a trust decision.
The validator detects duplicity with respect to the event and therefore decides not to trust until the duplicity is reconciled to the validator's satisfaction.
The controller determines that the witness pool is sufficiently faulty that it does a rotation recovery, disputing the event as well as reconstituting the witness pool. A new event that anchors the same trust-decision data is then eventually fully receipted and propagates to the validator, so the trust decision is able to proceed, albeit with a different event, since the original event was never accepted.
So how might duplicity exhibit itself?
First, to keep it simple, let's assume the controller is honest. The controller picks a witness threshold. Recall that the validator never "sees" the event until it is fully receipted, i.e., the number of witness receipts satisfies the threshold. So there is no "seeable" duplicity (in the strictest sense) with respect to the validator for any event whose receipts are below the threshold. But as soon as a validator first sees an event, it can't see any other version of that event, so it can't "see" any duplicity. Therefore, for a validator to detect (not "see") duplicity, it would have to be, or employ, a watcher that is watching for duplicity, i.e., keeping track of duplicitous events in addition to the events it first saw. A watcher who is doing this is called a juror.
So a juror can detect (not "see") duplicity and then report that duplicity.
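A minimal sketch of that detection logic (the function name and the digest-keyed bookkeeping are illustrative assumptions, not an actual keripy interface) might look like this:

```python
# A juror/watcher keeps every event version it observes, keyed by (identifier
# prefix, sequence number), and flags any second, different version of the
# same event as duplicity.
def detect_duplicity(seen, prefix, sn, event_digest):
    """Return True if a conflicting version of this event was already observed."""
    digests = seen.setdefault((prefix, sn), set())
    duplicitous = bool(digests) and event_digest not in digests
    digests.add(event_digest)
    return duplicitous

seen = {}
print(detect_duplicity(seen, "controller-prefix", 3, "digest-A"))  # False: first version seen
print(detect_duplicity(seen, "controller-prefix", 3, "digest-B"))  # True: second version -> duplicity
```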
In the case of a witness pool, minimal duplicity is exhibited via an alternate version of the event being witnessed by any witness, i.e., some subset of witnesses witnesses a version of the event that differs from the version witnessed by some other subset of witnesses.
When duplicity is below the threshold, validators and controllers are protected because no validator will "see" duplicity below the threshold. This makes witnessing fault-tolerant. However, a controller can still detect it and take action to mitigate it even when it is below the threshold. Likewise, a validator may decide not to ignore duplicity and be proactive even when it is below the threshold.
When the duplicity is above the threshold, then bad stuff can happen. This means that two versions of the same event can be fully witnessed from a validator's standpoint.
In a two-way transaction between a controller and a validator, the version first seen by the validator wins. If the first version seen by the validator is the uncompromised version, then no harm, no fault, and the controller can detect the duplicity and repair the fault for the future without impinging on the validator.
If the first seen is the compromised version, then the controller is accountable for any trust decision the validator makes as a result. The controller can still repair the fault by doing a rotation recovery, but would have to get the validator to also redo the trust decision with a new transaction.
Also problematic is a three-way transaction in which two different validators "see" two different versions of sealed data. They can still protect each other by exchanging copies of KELs before making trust decisions. But if they don't, they could be induced to make errant trust decisions.
Function of the TOAD
The threshold performs several functions:
So picking N, M, F, and F* impacts those four functions to varying degrees.
For example, a witness pool of size N=4 with TOAD M=3 has a fault margin of 1 for dishonest witnesses and a fault margin of 1 for unavailable witnesses. This does not mean that duplicity is prevented; it just defines the margins, i.e., how many faults could occur while still making the guarantees.
Suppose instead that for a pool of size N=4 the TOAD is M=1. This means that there is no margin of safety for a malicious witness: even one malicious witness can produce an alternate event that is fully witnessed and satisfies the threshold. But the availability margin is 3. The pool can have up to 3 honest but unavailable witnesses and events can still be fully witnessed. So high availability but zero fault tolerance to compromised witnesses. A validator making a trust decision may not want to engage in that case, or the risk profile of the type of transaction may say that it is fine. For example, if the controller is using multisig and has witnesses with independent authentication factors for accepting new events from controllers, then the risk of a compromised witness may be so low that zero fault tolerance is acceptable.
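For the two TOAD choices just discussed, the same margin calculation as in the earlier sketch (assuming nothing beyond the inequality above) gives:

```python
def fault_margins(N, M):                 # same derivation as the earlier sketch
    return max(2 * M - N - 1, 0), N - M  # (max duplicitous F, unavailable F* = N - M)

print(fault_margins(4, 3))   # (1, 1): one duplicitous and one unavailable witness tolerated
print(fault_margins(4, 1))   # (0, 3): no duplicity tolerance, but up to three witnesses may be offline
```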
An EGF (Ecosystem Governance Framework) might specify constraints on allowable combinations of N, M, F, and F*.
A typical distributed consensus algorithm with a liveness guarantee is duplicity hiding. KAWA preserves the critical property of being duplicity evident.
What I mean by duplicity hiding is that one correctly signed version of an event is always guaranteed, within a time bound, to be agreed upon (subject to the fault constraint). Two correctly signed but different versions of the same event constitute duplicity, but the ledger only ever provides one.
This means a validator never sees that there were ever alternate versions of that same event that were being voted upon but were discarded. Likewise, a controller posting an event may never see that its keys have been compromised unless and until the compromised version of the event wins the voting and is the one accepted. By then it is too late.
Now, from a systems design perspective, how likely is F versus F*? F is highly unlikely while F* is highly likely. So NOT informing the users that F* is a design variable might be unsatisfying. Nonetheless, F includes F* as a subset, just with a more stringent threshold, so assuming that F* never exceeds F is a conservative design constraint, but it may be too conservative.
In order for a duplicitous witness fault to occur, two compromises must have happened: a sufficient majority of the controller's signing keys must be compromised, and a given witness's authentication factor with respect to accepting events from its controller must also have been compromised. Only then can a malicious attacker publish a verifiable receipt for an alternate version of an event.
In order for an unavailable witness fault to occur, no key compromise is needed; the witness just has to be offline.
There are two types of unavailability faults: the first is intermittent and the second is permanent. In the first case, when the witness comes back online it repairs its own fault; recovery doesn't require any action by the controller. But in the second case, the witness never comes back online, or is offline long enough that the controller must take action and replace the witness with a new witness that works. The second fault could occur because of location, legal, or fiscal issues. For example, a witness host goes out of business and the private key goes with it, so there is no way to bring the witness back online. Or the private key gets lost because it was only ever held in memory, so when the witness is rebooted it becomes permanently unable to issue any receipts.
So a transaction between a controller and some validator that depends on the validator accepting the latest key state will be delayed indefinitely if too many witnesses are permanently offline, or delayed long enough that the validator gives up if too many witnesses are temporarily offline during the time frame of the transaction. So F* may matter as a design constraint.
Example
So a pool of 5 with a threshold of 4 has an F of 1 and an F* of 1. Well, a pool of 4 with a threshold of 3 also has an F of 1 and an F* of 1. So there is no additional fault tolerance at the cost of an additional witness, which means 5 is not a good example of a pool size. For a pool of 5, 3 is not a good threshold either, as F must be zero to guarantee agreement.
Whereas a pool size of 6 with a threshold of 4 can have an F of 1 or an F* of 2. So it gains some more fault tolerance in return for more nodes.