From 6897901b37f5c72d4b5c593167f0e0b933dd2eb5 Mon Sep 17 00:00:00 2001
From: Conor Schaefer
Date: Mon, 25 Mar 2024 08:31:10 -0700
Subject: [PATCH] docs: chain upgrade procedure, for operators

Adds documentation for the specific steps to be performed by a node
operator in order to participate in a chain upgrade. These docs are
largely based on the existing wiki notes [0], adapted for a generalized
Penumbra setup.

Similarly, sketched out some corresponding changes to the `pd testnet
join` docs, but left them commented out for now: those docs are only
relevant for joining a chain that has already been upgraded. If things
go well with #4087, we'll uncomment those docs and start using them.

Finishes and therefore closes #3841.

[0] https://github.com/penumbra-zone/penumbra/wiki/Performing-upgrades
---
 docs/guide/src/SUMMARY.md          |  1 +
 docs/guide/src/pd/chain-upgrade.md | 73 ++++++++++++++++++++++++++++++
 docs/guide/src/pd/join-testnet.md  | 23 ++++++++++
 3 files changed, 97 insertions(+)
 create mode 100644 docs/guide/src/pd/chain-upgrade.md

diff --git a/docs/guide/src/SUMMARY.md b/docs/guide/src/SUMMARY.md
index 8d2f28386f..dc9fce7f5d 100644
--- a/docs/guide/src/SUMMARY.md
+++ b/docs/guide/src/SUMMARY.md
@@ -12,6 +12,7 @@
   - [Installing `pd`](./pd/install.md)
   - [Joining a testnet](./pd/join-testnet.md)
   - [Becoming a validator](./pd/validator.md)
+  - [Performing a chain upgrade](./pd/chain-upgrade.md)
   - [Debugging](./pd/debugging.md)
 - [Local RPC with `pclientd`](./pclientd.md)
   - [Configuring `pclientd`](./pclientd/configure.md)
diff --git a/docs/guide/src/pd/chain-upgrade.md b/docs/guide/src/pd/chain-upgrade.md
new file mode 100644
index 0000000000..da6233343d
--- /dev/null
+++ b/docs/guide/src/pd/chain-upgrade.md
@@ -0,0 +1,73 @@
+# Performing chain upgrades
+
+When consensus-breaking changes are made to the Penumbra protocol,
+node operators must coordinate upgrading to the new version of the software
+at the same time. Penumbra uses a governance proposal for scheduling upgrades
+at a specific block height.
+
+## Upgrade process abstractly
+
+At a high level, the upgrade process consists of the following steps:
+
+1. Governance proposal submitted, specifying explicit chain height `n` for halt to occur.
+2. Governance proposal passes.
+3. Chain reaches specified height `n-1`, nodes stop generating blocks.
+4. Manual upgrade is performed on each validator and fullnode:
+    1. Prepare migration directory via `pd export`.
+    2. Install the new version of `pd`.
+    3. Apply changes to node state via `pd migrate`.
+    4. Copy a few files and directories around, clean up CometBFT state.
+    5. Restart node.
+
+After the node is restarted on the new version, it should be able to talk to the network again.
+Once enough validators with sufficient stake weight have upgraded, the network
+will resume generating blocks.
+
+
+## Genesis time
+
+In order for the chain to start again after the upgrade, all nodes must be using the same genesis information,
+including the timestamp for the genesis event. While the `pd migrate` command will create a new `genesis.json` file,
+it cannot know the correct genesis start time to use without the operator supplying the `--genesis-start` flag.
+The community may choose to specify a start time within the upgrade proposal. If so, all operators must use that value
+when performing the migration, as described below. Otherwise, validators must coordinate out of band to agree
+on a genesis start time.
+
+Leveraging the governance proposal is the recommended way to solve this problem. If the genesis start time is a value
+in the future, then after the upgrade is performed, the node will start, but not process blocks. It will wait
+until the `--genesis-start` time is reached, at which point it will resume processing blocks. In this way,
+the community of validators can coordinate resumption of chain activity, even when operators migrate their nodes
+at slightly different times.
+
+## Performing a chain upgrade
+
+The following steps assume that `pd` is using the default home directory of `~/.penumbra/testnet_data/node0/pd`.
+If your instance is using a different directory, update the paths accordingly.
+
+1. Stop both `pd` and `cometbft`. Depending on how you run Penumbra, this could mean `sudo systemctl stop penumbra cometbft`.
+2. Using the same version of `pd` that was running when the chain halted, prepare an export directory:
+    `pd export --home ~/.penumbra/testnet_data/node0/pd --export-directory ~/.penumbra/testnet_data/node0/pd-exported-state`
+3. Back up the historical state directory: `mv ~/.penumbra/testnet_data/node0/pd ~/.penumbra/testnet_data/node0/pd-state-backup`
+4. Download the latest version of `pd` and install it. Run `pd --version` and confirm you see `{{ #include ../penumbra_version.md }}` before proceeding.
+
+
+5. Apply the migration: `pd migrate --genesis-start "GENESIS_TIME" --target-directory ~/.penumbra/testnet_data/node0/pd-exported-state/ --migrate-archive ~/.penumbra/testnet_data/node0/pd-migrated-state-{{ #include ../penumbra_version.md }}.tar.gz`.
+    Replace `GENESIS_TIME` with the exact string: `XXXXX`.
+6. Move the migrated state into place: `mkdir ~/.penumbra/testnet_data/node0/pd && mv ~/.penumbra/testnet_data/node0/pd-exported-state/rocksdb ~/.penumbra/testnet_data/node0/pd/`
+7. Move the upgraded CometBFT state into place: `cp ~/.penumbra/testnet_data/node0/pd-exported-state/genesis.json ~/.penumbra/testnet_data/node0/cometbft/config/genesis.json
+    && cp ~/.penumbra/testnet_data/node0/pd-exported-state/priv_validator_state.json ~/.penumbra/testnet_data/node0/cometbft/data/priv_validator_state.json`
+8. Clean up the old CometBFT state: `find ~/.penumbra/testnet_data/node0/cometbft/data/ -mindepth 1 -maxdepth 1 -type d -exec rm -r {} +`
+
+Finally, restart the node, e.g. `sudo systemctl restart penumbra cometbft`. Check the logs, and you should see the chain progressing
+past the halt height `n`.
+
+If you want to host a snapshot for this migration, copy the file
+`~/.penumbra/testnet_data/node0/pd-migrated-state-{{ #include ../penumbra_version.md }}.tar.gz` to the appropriate hosting environment,
+and inform the users of your validator.
diff --git a/docs/guide/src/pd/join-testnet.md b/docs/guide/src/pd/join-testnet.md
index ef5a38462e..1c42cad5e8 100644
--- a/docs/guide/src/pd/join-testnet.md
+++ b/docs/guide/src/pd/join-testnet.md
@@ -30,6 +30,26 @@ This will delete the entire testnet data directory.
 Next, generate a set of configs for the current testnet:
+
+
 ```shell
 pd testnet join --external-address IP_ADDRESS:26656 --moniker MY_NODE_NAME
 ```
@@ -37,6 +57,9 @@ pd testnet join --external-address IP_ADDRESS:26656 --moniker MY_NODE_NAME
 where `IP_ADDRESS` (like `1.2.3.4`) is the public IP address of the node you're running, and `MY_NODE_NAME` is a moniker identifying your node. Other peers will try to connect to your node over port `26656/TCP`.
+
 If your node is behind a firewall or not publicly routable for some other reason, skip the `--external-address` flag, so that other peers won't try to connect to it.