IF: Command in leap-util to reset persisted safety data for finalizer to LIB #1576

Closed
arhag opened this issue Aug 29, 2023 · 1 comment

Comments


arhag commented Aug 29, 2023

Ultimately, we want to persist the reversible blocks database in a smarter way (along with the QC chains) so that enough data is available after a crash to safely recover liveness without losing any blocks. That is planned as a further enhancement to Leap after 5.0.

For Leap 5.0, we only durably store the minimal information a finalizer needs to remain safe after a nodeos process crash (see #1521). The finalizer then relies on the other nodes in the network having enough information to let it safely recover to the point where it can start voting as part of the HotStuff algorithm and contribute to liveness.

But if enough finalizers in the network crash around the same time, they may lose important liveness data (highest QC, reversible blocks up to _b_lock), and that loss can prevent them from even collectively working together to safely recover liveness for the network. In this case, the blockchain can keep producing reversible blocks but LIB would not advance.

To recover from such an extreme situation before the post-5.0 enhancements are available, we need a backup mechanism that allows a finalizer to compromise their safety protections for the greater goal of allowing the network to recover liveness. This mechanism is also useful when the finalizer loses or accidentally deletes the file in the blocks/reversible directory that persists the information needed to protect their safety; note that nodeos will attempt to "fail safe" if it starts up with that file missing, which comes at the cost of liveness.

This backup mechanism should be provided as sub-commands within a finality command in the leap-util program. First, there should be a sub-command to simply explore the entries in the persisted file. Perhaps there should be another sub-command to delete the entry referenced by a BLS finalizer public key from the persisted file. And, more pertinent to this issue, there should be a sub-command that (re)sets the entry associated with the specified BLS finalizer public key. It would set the _vheight and _b_lock information within the entry as if _b_lock were the last irreversible block and _vheight were the block height of the last irreversible block with a phase counter of 2. The node would then be able to immediately participate in the finality consensus process with block proposals built directly off the last irreversible block (which enough nodes must have durably persisted to disk), thus enabling liveness but at the risk of safety.
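To make the intended effect of the (re)set sub-command concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the `SafetyEntry` and `ViewNumber` types, the `reset_to_lib` function, and the in-memory dict keyed by BLS public key are all hypothetical and do not reflect the actual on-disk format of the persisted file or any existing leap-util interface.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ViewNumber:
    # Hypothetical model of _vheight: a block height plus a phase counter.
    block_height: int
    phase_counter: int


@dataclass(frozen=True)
class SafetyEntry:
    # Hypothetical model of one persisted entry for a single finalizer key.
    v_height: ViewNumber
    b_lock: str  # id of the block the finalizer is locked on


def reset_to_lib(
    entries: Dict[str, SafetyEntry],
    finalizer_key: str,
    lib_id: str,
    lib_height: int,
) -> Dict[str, SafetyEntry]:
    """Return a copy of `entries` where the entry for `finalizer_key`
    is (re)set as if the last irreversible block were _b_lock and
    _vheight were the LIB height with a phase counter of 2.

    Entries for other keys are left untouched. The entry is created
    if it does not exist yet, matching the "(re)set" wording.
    """
    updated = dict(entries)
    updated[finalizer_key] = SafetyEntry(
        v_height=ViewNumber(block_height=lib_height, phase_counter=2),
        b_lock=lib_id,
    )
    return updated


if __name__ == "__main__":
    # Placeholder keys and block ids, for illustration only.
    before = {
        "PUB_BLS_a": SafetyEntry(ViewNumber(120, 1), "blk120"),
        "PUB_BLS_b": SafetyEntry(ViewNumber(118, 2), "blk118"),
    }
    after = reset_to_lib(before, "PUB_BLS_a", lib_id="blk100", lib_height=100)
    for key, entry in after.items():
        print(key, entry)
```

The sketch returns a new mapping rather than mutating in place, so a real tool could show the operator the before and after state and ask for confirmation before writing, given that the reset deliberately gives up safety protections.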


arhag commented Feb 23, 2024

Overcome by events.

Now the finalizer can just delete the finalizer safety information file.

@arhag closed this as not planned on Feb 23, 2024
@github-project-automation (bot) moved this from Todo to Done in Team Backlog on Feb 23, 2024
@arhag removed this from the Leap v6.0.0-rc1 milestone on Feb 23, 2024