docs: chain upgrade procedure, for operators #4097
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
# Performing chain upgrades | ||
|
||
When consensus-breaking changes are made to the Penumbra protocol, | ||
node operators must coordinate upgrading to the new version of the software | ||
at the same time. Penumbra uses a governance proposal for scheduling upgrades | ||
at a specific block height. | ||
|
||
## Upgrade process abstractly | ||
|
||
At a high level, the upgrade process consists of the following steps: | ||
|
||
1. Governance proposal submitted, specifying explicit chain height `n` for halt to occur. | ||
2. Governance proposal passes. | ||
3. Chain reaches specified height `n-1`, nodes stop generating blocks. | ||
4. Manual upgrade is performed on each validator and fullnode: | ||
1. Prepare migration directory via `pd export`. | ||
2. Install the new version of pd. | ||
3. Apply changes to node state via `pd migrate`. | ||
4. Copy a few files and directories around, clean up CometBFT state. | ||
5. Restart node. | ||
|
||
After the node is restarted on the new version, it should be able to talk to the network again. | ||
Once enough validators with sufficient stake weight have upgraded, the network | ||
will resume generating blocks. | ||
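
Before starting step 4, an operator may want to confirm that the local node has actually stopped at the expected height. The following is a minimal sketch, not part of `pd` or `cometbft`: `reached_halt` is a hypothetical helper that only encodes the rule from step 3 (the last block produced is `n-1`). The commented usage assumes the CometBFT RPC is listening on its default address `localhost:26657` and that `jq` is installed.

```shell
# Hypothetical helper, not part of pd or cometbft.
# Succeeds when LATEST_HEIGHT equals HALT_HEIGHT - 1, i.e. the last block
# before the halt height `n` has been produced.
reached_halt() {
  latest_height="$1"
  halt_height="$2"
  [ "$latest_height" -eq "$(( halt_height - 1 ))" ]
}

# Example usage (assumes default CometBFT RPC address and jq; 1000 is a made-up halt height):
#   latest="$(curl -s http://localhost:26657/status | jq -r '.result.sync_info.latest_block_height')"
#   reached_halt "$latest" 1000 && echo "node is at the last block before the halt height"
```

The check is deliberately separate from the RPC query, so the height can come from logs or any other source.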
|
||
|
||
## Genesis time | ||
|
||
In order for the chain to start again after the upgrade, all nodes must be using the same genesis information, | ||
including the timestamp for the genesis event. While the `pd migrate` command will create a new `genesis.json` file, | ||
it cannot know the correct genesis start time to use without the operator supplying the `--genesis-start` flag. | ||
The community may choose to specify a start time within the upgrade proposal. If so, all operators must use that value | ||
when performing the migration, as described below. Otherwise, validators must coordinate out of band to agree | ||
on a genesis start time. | ||
|
||
Leveraging the governance proposal is the recommended way to solve this problem. If the genesis start time is a value | ||
in the future, then after the upgrade is performed, the node will start, but not process blocks. It will wait | ||
until the `--genesis-start` time is reached, at which point it will resume processing blocks. In this way, | ||
the community of validators can coordinate resumption of chain activity, even when operators migrate their nodes | |
at slightly different times. | ||
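
To make the waiting behavior concrete, here is a small sketch that estimates how long a migrated node would sit idle before the agreed genesis start time. `seconds_until_genesis` is a hypothetical helper, not a `pd` feature. It assumes GNU `date` (for the `-d` option) and assumes the genesis time is written as an RFC 3339 UTC timestamp, similar to the timestamps `pd` prints in its logs.

```shell
# Hypothetical helper, not part of pd.
# Prints the number of seconds from NOW until GENESIS (negative if GENESIS is in the past).
# Assumes GNU date and RFC 3339 UTC timestamps such as 2023-12-09T00:08:24Z.
seconds_until_genesis() {
  local genesis="$1"
  # Second argument is optional; defaults to the current UTC time.
  local now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  local genesis_s now_s
  genesis_s="$(date -u -d "$genesis" +%s)" || return 1
  now_s="$(date -u -d "$now" +%s)" || return 1
  echo "$(( genesis_s - now_s ))"
}

# Example usage:
#   seconds_until_genesis "2023-12-09T00:10:00Z"
```

A positive result means a node started now would wait roughly that long before processing blocks. A negative result means the genesis time has already passed.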
|
||
## Performing a chain upgrade | ||
|
||
The following steps assume that `pd` is using the default home directory of `~/.penumbra/testnet_data/node0/pd`. | ||
If your instance is using a different directory, update the paths accordingly. | ||
|
||
1. Stop both `pd` and `cometbft`. Depending on how you run Penumbra, this could mean `sudo systemctl stop penumbra cometbft`. | ||
2. Using the same version of `pd` that was running when the chain halted, prepare an export directory: | ||
`pd export --home ~/.penumbra/testnet_data/node0/pd --export-directory ~/.penumbra/testnet_data/node0/pd-exported-state` | ||
3. Back up the historical state directory: `mv ~/.penumbra/testnet_data/node0/pd ~/.penumbra/testnet_data/node0/pd-state-backup` | ||
4. Download the latest version of `pd` and install it. Run `pd --version` and confirm you see `{{ #include ../penumbra_version.md }}` before proceeding. | ||
|
||
<!-- | ||
An example log message emitted by `pd migrate` without providing `--genesis-start`: | ||
|
||
pd::upgrade: no genesis time provided, detecting a testing setup now=2023-12-09T00:08:24.225277473Z | |
|
||
The value after `now=` is what should be copied. In practice, for testnets, Penumbra Labs will advise on a genesis time | ||
and provide that value in the documentation. Or should we just pick a genesis start ahead of time, and use that for all? | ||
--> | ||
5. Apply the migration: `pd migrate --genesis-start "GENESIS_TIME" --target-directory ~/.penumbra/testnet_data/node0/pd-exported-state/ --migrate-archive ~/.penumbra/testnet_data/node0/pd-migrated-state-{{ #include ../penumbra_version.md }}.tar.gz`. | |
Replace `GENESIS_TIME` with the exact string: `XXXXX`. | ||
6. Move the migrated state into place: `mkdir ~/.penumbra/testnet_data/node0/pd && mv ~/.penumbra/testnet_data/node0/pd-exported-state/rocksdb ~/.penumbra/testnet_data/node0/pd/` | ||
7. Move the upgraded CometBFT state into place: `cp ~/.penumbra/testnet_data/node0/pd-exported-state/genesis.json ~/.penumbra/testnet_data/node0/cometbft/config/genesis.json | |
&& cp ~/.penumbra/testnet_data/node0/pd-exported-state/priv_validator_state.json ~/.penumbra/testnet_data/node0/cometbft/data/priv_validator_state.json` | |
8. Clean up the old CometBFT state: `find ~/.penumbra/testnet_data/node0/cometbft/data/ -mindepth 1 -maxdepth 1 -type d -exec rm -r {} +` | |
|
||
Finally, restart the node, e.g. `sudo systemctl restart penumbra cometbft`. Check the logs, and you should see the chain progressing | ||
past the halt height `n`. | ||
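
The steps above can be collected into a reviewable checklist. The sketch below only prints the commands from this page with the node directory substituted, so they can be inspected and then run by hand. `print_pre_install_steps`, `print_post_install_steps`, and the `NODE_DIR` variable are hypothetical names introduced for illustration. The second function takes the agreed genesis start time and a version string, which is used only in the archive filename. Paths containing spaces are not handled.

```shell
# Hypothetical helpers: they print the commands from the steps above and run nothing.
# NODE_DIR defaults to the node directory assumed throughout this page.
NODE_DIR="${NODE_DIR:-$HOME/.penumbra/testnet_data/node0}"

# Steps 1-3: to be run while the OLD version of pd is still installed.
print_pre_install_steps() {
  cat <<EOF
sudo systemctl stop penumbra cometbft
pd export --home $NODE_DIR/pd --export-directory $NODE_DIR/pd-exported-state
mv $NODE_DIR/pd $NODE_DIR/pd-state-backup
EOF
}

# Steps 5-8 plus the restart: to be run after installing the NEW pd (step 4).
# $1: agreed genesis start time; $2: new pd version (used in the archive name only).
print_post_install_steps() {
  local genesis_time="$1" version="$2"
  cat <<EOF
pd migrate --genesis-start "$genesis_time" --target-directory $NODE_DIR/pd-exported-state/ --migrate-archive $NODE_DIR/pd-migrated-state-$version.tar.gz
mkdir $NODE_DIR/pd && mv $NODE_DIR/pd-exported-state/rocksdb $NODE_DIR/pd/
cp $NODE_DIR/pd-exported-state/genesis.json $NODE_DIR/cometbft/config/genesis.json
cp $NODE_DIR/pd-exported-state/priv_validator_state.json $NODE_DIR/cometbft/data/priv_validator_state.json
find $NODE_DIR/cometbft/data/ -mindepth 1 -maxdepth 1 -type d -exec rm -r {} +
sudo systemctl restart penumbra cometbft
EOF
}
```

Splitting the output at step 4 keeps the manual install of the new `pd` as an explicit pause between the export and the migration.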
|
||
If you want to host a snapshot for this migration, copy the file | ||
`~/.penumbra/testnet_data/node0/pd-migrated-state-{{ #include ../penumbra_version.md }}.tar.gz` to the appropriate hosting environment, | ||
and inform the users of your validator. |
I think I would add a section about genesis time and explain that this is decided by validators/node runners, possibly via the governance proposal itself (non-binding, opt-in). And describe how genesis time works: if it is set in the future, the node will start and wait for the time of genesis to be reached before it can start producing blocks (and peering?).