docs: chain upgrade procedure, for operators
Adds documentation for the specific steps to be performed by a node
operator in order to participate in a chain upgrade. These docs are
largely based on the existing wiki notes [0], adapted for a generalized
Penumbra setup.

Similarly, sketched out some corresponding changes to the `pd testnet
join` docs, but left them commented out for now: those docs are only
relevant for joining a chain that has already been upgraded. If things
go well with #4087, we'll uncomment those docs and start using them.

Finishes and therefore closes #3841.

[0] https://github.com/penumbra-zone/penumbra/wiki/Performing-upgrades
conorsch committed Mar 26, 2024
1 parent 5474c9e commit 26544e0
Showing 3 changed files with 97 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/guide/src/SUMMARY.md
@@ -12,6 +12,7 @@
- [Installing `pd`](./pd/install.md)
- [Joining a testnet](./pd/join-testnet.md)
- [Becoming a validator](./pd/validator.md)
- [Performing a chain upgrade](./pd/chain-upgrade.md)
- [Debugging](./pd/debugging.md)
- [Local RPC with `pclientd`](./pclientd.md)
- [Configuring `pclientd`](./pclientd/configure.md)
73 changes: 73 additions & 0 deletions docs/guide/src/pd/chain-upgrade.md
@@ -0,0 +1,73 @@
# Performing chain upgrades

When consensus-breaking changes are made to the Penumbra protocol,
node operators must coordinate upgrading to the new version of the software
at the same time. Penumbra uses a governance proposal for scheduling upgrades
at a specific block height.

## Upgrade process abstractly

At a high level, the upgrade process consists of the following steps:

1. Governance proposal submitted, specifying explicit chain height `n` for halt to occur.
2. Governance proposal passes.
3. Chain reaches specified height `n-1`, nodes stop generating blocks.
4. Manual upgrade is performed on each validator and fullnode:
    1. Prepare migration directory via `pd export`.
    2. Install the new version of `pd`.
    3. Apply changes to node state via `pd migrate`.
    4. Copy a few files and directories around, clean up CometBFT state.
    5. Restart node.

After the node is restarted on the new version, it should be able to talk to the network again.
Once enough validators with sufficient stake weight have upgraded, the network
will resume generating blocks.


## Genesis time

In order for the chain to start again after the upgrade, all nodes must be using the same genesis information,
including the timestamp for the genesis event. While the `pd migrate` command will create a new `genesis.json` file,
it cannot know the correct genesis start time to use without the operator supplying the `--genesis-start` flag.
The community may choose to specify a start time within the upgrade proposal. If so, all operators must use that value
when performing the migration, as described below. Otherwise, validators must coordinate out of band to agree
on a genesis start time.

Leveraging the governance proposal is the recommended way to solve this problem. If the genesis start time is a value
in the future, then after the upgrade is performed, the node will start, but not process blocks. It will wait
until the `--genesis-start` time is reached, at which point it will resume processing blocks. In this way,
the community of validators can coordinate resumption of chain activity, even when operators migrate their nodes
at slightly different times.
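
As a rough illustration of producing a value for `--genesis-start`, the sketch below formats an agreed wall-clock time as a UTC timestamp. It assumes the flag accepts an RFC 3339 string like the timestamps `pd` writes to its logs (check `pd migrate --help` to confirm), and it relies on GNU `date`. The helper name `format_genesis_time` and the example date are hypothetical. Whoever generates the value should publish the exact string, since every operator must pass the identical timestamp.

```shell
# Hypothetical helper: format an agreed UTC time as an RFC 3339 timestamp.
# Assumes GNU date (the -d flag); BSD/macOS date uses different options.
format_genesis_time() {
  date -u -d "$1" +"%Y-%m-%dT%H:%M:%SZ"
}

# Illustrative date only; substitute the time your community agreed on.
format_genesis_time "2024-04-01 15:00 UTC"
```

Operators should copy the published string verbatim rather than recomputing it locally, so that all resulting `genesis.json` files agree.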

## Performing a chain upgrade

The following steps assume that `pd` is using the default home directory of `~/.penumbra/testnet_data/node0/pd`.
If your instance is using a different directory, update the paths accordingly.

1. Stop both `pd` and `cometbft`. Depending on how you run Penumbra, this could mean `sudo systemctl stop penumbra cometbft`.
2. Using the same version of `pd` that was running when the chain halted, prepare an export directory:
`pd export --home ~/.penumbra/testnet_data/node0/pd --export-directory ~/.penumbra/testnet_data/node0/pd-exported-state`
3. Back up the historical state directory: `mv ~/.penumbra/testnet_data/node0/pd ~/.penumbra/testnet_data/node0/pd-state-backup`
4. Download the latest version of `pd` and install it. Run `pd --version` and confirm you see `{{ #include ../penumbra_version.md }}` before proceeding.

<!--
An example log message emitted by `pd migrate` without providing `--genesis-start`:
`pd::upgrade: no genesis time provided, detecting a testing setup now=2023-12-09T00:08:24.225277473Z`
The value after `now=` is what should be copied. In practice, for testnets, Penumbra Labs will advise on a genesis time
and provide that value in the documentation. Or should we just pick a genesis start ahead of time, and use that for all?
-->
5. Apply the migration: `pd migrate --genesis-start "GENESIS_TIME" --target-directory ~/.penumbra/testnet_data/node0/pd-exported-state/ --migrate-archive ~/.penumbra/testnet_data/node0/pd-migrated-state-{{ #include ../penumbra_version.md }}.tar.gz`.
Replace `GENESIS_TIME` with the exact string: `XXXXX`.
6. Move the migrated state into place: `mkdir ~/.penumbra/testnet_data/node0/pd && mv ~/.penumbra/testnet_data/node0/pd-exported-state/rocksdb ~/.penumbra/testnet_data/node0/pd/`
7. Move the upgraded CometBFT state into place: `cp ~/.penumbra/testnet_data/node0/pd-exported-state/genesis.json ~/.penumbra/testnet_data/node0/cometbft/config/genesis.json
&& cp ~/.penumbra/testnet_data/node0/pd-exported-state/priv_validator_state.json ~/.penumbra/testnet_data/node0/cometbft/data/priv_validator_state.json`
8. Clean up the old CometBFT state: `find ~/.penumbra/testnet_data/node0/cometbft/data/ -mindepth 1 -maxdepth 1 -type d -exec rm -r {} +`

Finally, restart the node, e.g. `sudo systemctl restart penumbra cometbft`. Check the logs, and you should see the chain progressing
past the halt height `n`.
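
The file operations in the steps above can be sketched as a pair of shell functions, shown here as a dry run that only prints each command. This is a minimal sketch, not an official script: the variable names (`NODE_DIR`, `GENESIS_TIME`, `NEW_VERSION`, `DRY_RUN`) and function names are hypothetical, the paths assume the default `node0` layout, and stopping and restarting the services is left to you. The work is split in two because the new `pd` binary must be installed between the export and the migration.

```shell
# Hypothetical settings; override them to match your node.
NODE_DIR="${NODE_DIR:-$HOME/.penumbra/testnet_data/node0}"
GENESIS_TIME="${GENESIS_TIME:-GENESIS_TIME}"   # placeholder: use the agreed value
NEW_VERSION="${NEW_VERSION:-NEW_VERSION}"      # placeholder: the new pd version
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Steps 2-3: run with the OLD pd, after stopping pd and cometbft.
pre_install_steps() {
  run pd export --home "$NODE_DIR/pd" --export-directory "$NODE_DIR/pd-exported-state" &&
  run mv "$NODE_DIR/pd" "$NODE_DIR/pd-state-backup"
}

# Steps 5-8: run with the NEW pd, after installing it (step 4).
post_install_steps() {
  run pd migrate --genesis-start "$GENESIS_TIME" \
    --target-directory "$NODE_DIR/pd-exported-state/" \
    --migrate-archive "$NODE_DIR/pd-migrated-state-${NEW_VERSION}.tar.gz" &&
  run mkdir "$NODE_DIR/pd" &&
  run mv "$NODE_DIR/pd-exported-state/rocksdb" "$NODE_DIR/pd/" &&
  run cp "$NODE_DIR/pd-exported-state/genesis.json" "$NODE_DIR/cometbft/config/genesis.json" &&
  run cp "$NODE_DIR/pd-exported-state/priv_validator_state.json" "$NODE_DIR/cometbft/data/priv_validator_state.json" &&
  run find "$NODE_DIR/cometbft/data/" -mindepth 1 -maxdepth 1 -type d -exec rm -r {} +
}
```

To use it, source the file, call `pre_install_steps`, install the new `pd`, then call `post_install_steps`. Review the printed commands first, and set `DRY_RUN=0` only once they match your setup, since the final `find` deletes directories.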

If you want to host a snapshot for this migration, copy the file
`~/.penumbra/testnet_data/node0/pd-migrated-state-{{ #include ../penumbra_version.md }}.tar.gz` to the appropriate hosting environment,
and inform the users of your validator.
23 changes: 23 additions & 0 deletions docs/guide/src/pd/join-testnet.md
@@ -30,13 +30,36 @@ This will delete the entire testnet data directory.

Next, generate a set of configs for the current testnet:

<!--
### Begin join customization
The following section describes how to join a testnet chain *which has never upgraded*.
Once a chain upgrade occurs, a new-joining node must have access to an archive
of historical, migrated state. When we upgrade the chain, we should update these
docs to switch to the archive-url version:
```shell
pd testnet join --external-address IP_ADDRESS:26656 --moniker MY_NODE_NAME \
--archive-url "https://snapshots.penumbra.zone/testnet/pd-archived-stated-xxxxx.tar.gz"
```
where `IP_ADDRESS` (like `1.2.3.4`) is the public IP address of the node you're running,
and `MY_NODE_NAME` is a moniker identifying your node. Other peers will try to connect
to your node over port `26656/TCP`. Finally, the `--archive-url` flag will fetch
a tarball of historical blocks, so that your newly joining node can understand transactions
that occurred prior to the most recent chain upgrade.
-->

```shell
pd testnet join --external-address IP_ADDRESS:26656 --moniker MY_NODE_NAME
```

where `IP_ADDRESS` (like `1.2.3.4`) is the public IP address of the node you're running,
and `MY_NODE_NAME` is a moniker identifying your node. Other peers will try to connect
to your node over port `26656/TCP`.
<!--
### End join customization
-->

If your node is behind a firewall or not publicly routable for some other reason,
skip the `--external-address` flag, so that other peers won't try to connect to it.
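
The two variants of the join command can be sketched with a small helper that prints the appropriate invocation. The function name `build_join_cmd` is hypothetical and exists only for illustration; it uses just the flags shown above and includes `--external-address` only when an IP address is supplied.

```shell
# Hypothetical helper: print the join command for a given moniker,
# adding --external-address only when a public IP is provided.
build_join_cmd() {
  moniker="$1"
  ip="${2:-}"
  if [ -n "$ip" ]; then
    printf 'pd testnet join --external-address %s:26656 --moniker %s\n' "$ip" "$moniker"
  else
    printf 'pd testnet join --moniker %s\n' "$moniker"
  fi
}

build_join_cmd MY_NODE_NAME 1.2.3.4   # publicly routable node
build_join_cmd MY_NODE_NAME           # node behind a firewall
```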