Release Testnet 76, via chain upgrade #4402

Closed
40 tasks done
conorsch opened this issue May 16, 2024 · 10 comments

@conorsch
Contributor

conorsch commented May 16, 2024

Testnet upgrade

Testnet chain id: penumbra-testnet-deimos-8
Release date: 2024-05-22
Testnet release manager: @conorsch

We're preparing another chain upgrade, explicitly to exercise the mechanics of migrations and coordination, and implicitly to ship a few changes.

Testnet Release Manager Checklist

Pre-release:

On release day:

  • Draft an announcement for peer review to ensure changes included are comprehensive.
  • Disable testnet deploy workflow, so that chain is not reset
  • Bump the version number and push its tag, via cargo-release.
    • Run cargo release minor for a new testnet, or cargo release patch for a bugfix. For the latter, make sure you're on a dedicated release branch.
    • Push the commit and newly generated tag, e.g. v0.51.0, to the remote.
  • Manually trigger the container-build workflow, because the deploy workflow is disabled
  • Wait for the "Release" workflow to complete
  • Edit the newly created release object, and add a note summarizing the intent of the release
  • Close faucet (chain halt will make it inoperative anyway)
  • Run migrations on all validators
  • Run migrations on all fullnodes
  • Update Galileo deployment, following docs
  • Upload post-migration archive to https://snapshots.penumbra.zone
  • Make the announcement to Discord! 🎉🎉🎉
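
The version-bump step in the checklist above can be sketched as follows. This is only an illustration of what a minor versus patch bump does to a tag like v0.75.0; the `bump` helper is hypothetical and is not part of cargo-release, which performs the real bump.

```python
def bump(version: str, level: str) -> str:
    """Return the next tag for a 'minor' (new testnet) or 'patch' (bugfix) bump.

    Hypothetical helper for illustration; cargo-release does the actual work.
    """
    prefix = "v" if version.startswith("v") else ""
    major, minor, patch = (int(part) for part in version[len(prefix):].split("."))
    if level == "minor":
        # A new testnet: increment the minor number and reset the patch number.
        return f"{prefix}{major}.{minor + 1}.0"
    if level == "patch":
        # A bugfix release: increment only the patch number.
        return f"{prefix}{major}.{minor}.{patch + 1}"
    raise ValueError(f"unsupported bump level: {level}")


print(bump("v0.75.0", "minor"))  # v0.76.0
print(bump("v0.75.0", "patch"))  # v0.75.1
```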

Post-release cleanup tasks

  • Ensure faucet is open
  • Ensure a late-joining node can join via an archive
  • Bump grpcui version for v1 reflection compatibility
  • Perform Hermes maintenance for genesis restart to get relayer running again
  • Confirm IBC channels are working
  • PR the v0.76.0 dep bump into Hermes repo
@github-project-automation github-project-automation bot moved this to Backlog in Penumbra May 16, 2024
@github-actions github-actions bot added the needs-refinement unclear, incomplete, or stub issue that needs work label May 16, 2024
@conorsch
Contributor Author

Lower voting proposal period 24h -> 4h

Submitted today:

❯ pcli q governance proposal 0 definition
title = "lower proposal voting duration to 4h"
description = "enabling faster voting in support of upgrade testing in coming weeks"

[[parameterChange.changes]]
component = "governanceParams"
key = "proposalVotingBlocks"
value = "\"2880\""
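
For reference, a quick sketch of how a block count maps to a wall-clock voting window. The 5-second block time is an assumption inferred from the numbers here (4h × 3600s / 2880 blocks = 5s per block); it is not stated anywhere in this thread, and real block times vary.

```python
# Assumption: average block time, inferred from "4h" == 2880 blocks.
ASSUMED_BLOCK_TIME_SECONDS = 5


def voting_blocks(hours: float, block_time_seconds: float = ASSUMED_BLOCK_TIME_SECONDS) -> int:
    """Convert a voting duration in hours to a number of blocks."""
    return round(hours * 3600 / block_time_seconds)


print(voting_blocks(4))   # 2880
print(voting_blocks(24))  # 17280, the old 24h window at the same assumed block time
```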

conorsch added a commit that referenced this issue May 20, 2024
No tag associated with this increment. Making the change to the local
Cargo.toml files specifically to differentiate versions explicitly while
testing upgrades and migrations from the previous stable tag of v0.75.0.

Refs #4402.
conorsch added a commit that referenced this issue May 21, 2024
No tag associated with this increment. Making the change to the local
Cargo.toml files specifically to differentiate versions explicitly while
testing upgrades and migrations from the previous stable tag of v0.75.0.

Refs #4402.
@hdevalence
Member

We should be sure to pull in a current snapshot of minifront cc @grod220 @turbocrime

@conorsch conorsch removed the needs-refinement unclear, incomplete, or stub issue that needs work label May 21, 2024
@aubrika aubrika moved this from Backlog to In progress in Penumbra May 22, 2024
@aubrika aubrika added this to the Sprint 7 milestone May 22, 2024
@aubrika aubrika added _P-V1 Priority: slated for V1 release _P-high High priority labels May 22, 2024
@hdevalence
Member

We cannot do this until the proto messages erroneously added as part of #4391 are removed.

conorsch added a commit that referenced this issue May 24, 2024
Refs #4402. Due to changes in the halt-bit logic (#4413), the `--force`
flag is necessary on the migration specifically for `v0.76.0`.
conorsch pushed a commit that referenced this issue May 24, 2024
Backports #4454 for inclusion in v0.76.0. Refs #4402. Closes #4457.
(cherry picked from commit e1d8b2c)
@conorsch
Contributor Author

Prepare upgrade-plan governance proposal

❯ pcli q governance proposal 4 definition
id = "4"
title = "upgrade to 0.76.0"
description = "planned upgrade, via chain migration, to testnet 76"

[upgradePlan]
height = "222200"


❯ pcli q governance proposal 4 period
{
  "voting_start_block": 221398,
  "voting_end_block": 222123
}
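
A small sanity check on those numbers: the upgrade height needs to land after voting closes. The sketch below only uses the values printed above.

```python
import json

# Values copied from the pcli output above.
period = json.loads('{"voting_start_block": 221398, "voting_end_block": 222123}')
upgrade_height = 222200

# Number of blocks between the end of voting and the planned upgrade height.
margin = upgrade_height - period["voting_end_block"]

assert margin > 0, "upgrade height must come after voting ends"
print(margin)  # 77
```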

@conorsch
Contributor Author

conorsch commented May 24, 2024

❯ date -u
Fri May 24 10:14:26 PM UTC 2024

❯ pcli q governance proposal 4 state
{
  "finished": {
    "outcome": {
      "passed": {}
    }
  }
}
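
If this check were scripted, something like the following could work. It is a minimal sketch that handles only the JSON shape shown above; the `proposal_passed` helper is hypothetical, and the shapes of other proposal states are not covered in this thread, so anything else is simply treated as "not passed".

```python
import json


def proposal_passed(state_json: str) -> bool:
    """Return True only for the 'finished -> outcome -> passed' shape shown above."""
    state = json.loads(state_json)
    outcome = state.get("finished", {}).get("outcome", {})
    return "passed" in outcome


sample = '{"finished": {"outcome": {"passed": {}}}}'
print(proposal_passed(sample))  # True
```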

@conorsch
Contributor Author

Perform Hermes
Confirm IBC channels are working

@avahowell performed the Hermes maintenance and confirmed the IBC channels are working.

Worth noting: the long migration time was concerning, because we need to migrate the chain within the trusting period, which is 2h. We also overlooked updating the Hermes build deps for Penumbra v0.76.0, so we had to rebuild.
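
To put a rough number on that concern: a sketch comparing the 2h trusting period against the 43m21s scripted upgrade run time recorded in the release notes in this thread. Treating the script run time as the whole outage is an assumption; the time spent rebuilding Hermes is not recorded, so the real headroom was smaller.

```python
from datetime import timedelta

TRUSTING_PERIOD = timedelta(hours=2)
# Run time of the scripted upgrade on the testnet, from the release notes.
scripted_upgrade_runtime = timedelta(minutes=43, seconds=21)

# Upper bound on remaining headroom; other downtime is not accounted for.
headroom = TRUSTING_PERIOD - scripted_upgrade_runtime
print(headroom)  # 1:16:39
```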

@conorsch
Copy link
Contributor Author

Notes from release process: used a fully-scripted approach to apply the chain upgrades this time, to reduce chance of operator error. The command I ran was:

cd deployments/
HELM_RELEASE=penumbra-testnet TO_VERSION=v0.76.0 ./scripts/k8s-perform-

That process worked well, but was pretty slow: the script is conservative, and spent most of its run time creating backups and tar-ing up post-migration state. Testing on devnets, with minimal chain state on the order of a few hundred blocks, the script's run time was ~5m. On the actual testnet with ~200k blocks, the script's run time was 43m21s. Notably, the script doesn't parallelize any of the upgrades, but intentionally serializes them and bails out if any fails. Not bothering to optimize that logic now, but recording these hot takes while the info is fresh in my mind.
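
The serialize-and-bail-out behavior described above looks roughly like this. It is a sketch of the control flow only, not the actual script: `upgrade_serially`, `fake_upgrade`, and the node names are all hypothetical.

```python
from typing import Callable, Iterable, List


def upgrade_serially(nodes: Iterable[str], upgrade_node: Callable[[str], None]) -> List[str]:
    """Upgrade nodes one at a time; the first failure stops the whole run."""
    done = []
    for node in nodes:
        upgrade_node(node)  # any exception propagates, so later nodes are never touched
        done.append(node)
    return done


attempted = []


def fake_upgrade(node: str) -> None:
    # Stand-in for the real per-node migration; fails on the second node.
    attempted.append(node)
    if node == "node-b":
        raise RuntimeError("migration failed")


try:
    upgrade_serially(["node-a", "node-b", "node-c"], fake_upgrade)
except RuntimeError:
    pass

print(attempted)  # ['node-a', 'node-b']
```

The upside of this shape is that a bad migration is caught on one node before it is applied everywhere; the cost is that total run time grows linearly with the node count.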

In the future, once we're sure the logic in the scripted approach is sound, parallelization alone would get us nearly a 10x speedup: we've got 2 vals, 3 nodes backing the RPC, 1 seed node, and 3 more solo fullnodes backing the various UI frontends (block-explorer, dex-explorer, and gov-dash, the latter unused).
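
Back-of-the-envelope version of that estimate, using the node counts listed above. It assumes every node takes the same time to upgrade and that nothing is shared or rate-limited between them, which is optimistic, so treat the result as a best case.

```python
# Node counts from the paragraph above.
node_counts = {"validators": 2, "rpc_nodes": 3, "seed_nodes": 1, "solo_fullnodes": 3}
total_nodes = sum(node_counts.values())

serial_seconds = 43 * 60 + 21  # 43m21s measured on the testnet

# Best case: all nodes upgrade concurrently and each takes an equal share.
ideal_parallel_seconds = serial_seconds / total_nodes

print(total_nodes)             # 9
print(ideal_parallel_seconds)  # 289.0
```

So the ceiling is a 9x speedup, or just under 5 minutes, if the per-node work really is independent.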

Also, it's worth circling back on the disk usage of the multiple backups and state archives. If left unaddressed, those will stick around until the next chain migration, at which point they'll be clobbered. Worth considering because it means the provisioned storage for each node is now consumed by a lot more data than just the live chain state.

@conorsch
Contributor Author

Bump grpcui version for v1 reflection compatibility

This is done, the new v1 reflection APIs are live on https://grpcui.testnet.penumbra.zone

@conorsch
Contributor Author

Leaving galileo off, since it's failing to send txs:

May 25 00:07:09.669 ERROR galileo::responder: Failed to send funds addr=penumbra1a3afgaqz86rmh4f7p6szaygwlskwv2zur4qt2kr2z834uvjdavu9fw3sc3wjkhkf7mzvnmjwxh56t8ga98der33zyyeeqqkev8ele08j65zrg0nqf0fqvvmxhvf0af4pnjlgvu e=status: Unavailable, message: "error getting app params: missing fmd_meta_params", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }

Will circle back to it.

conorsch added a commit that referenced this issue May 25, 2024
erwanor pushed a commit that referenced this issue May 25, 2024
## Describe your changes
Refs #4402. Documents new archive url for testnet 76, which is required
for any nodes joining the network after the upgrade boundary. Already
confirmed I could join, peer, and stream blocks from this via localhost.

## Issue ticket number and link

## Checklist before requesting a review

- [x] If this code contains consensus-breaking changes, I have added the
"consensus-breaking" label. Otherwise, I declare my belief that there
are not consensus-breaking changes, for the following reason:

  > docs-only

Co-authored-by: Conor Schaefer <[email protected]>
conorsch added a commit to penumbra-zone/hermes that referenced this issue May 29, 2024
We made this change ad-hoc during deployment of Testnet 76.
Submitting it via PR to the repo to make sure we don't regress builds.
Refs penumbra-zone/penumbra#4402
@conorsch
Contributor Author

Testnet 76 has shipped, via chain upgrade, and all the follow-up post-release tasks are complete.

@github-project-automation github-project-automation bot moved this from In progress to Done in Penumbra May 29, 2024
zbuc added a commit to penumbra-zone/hermes that referenced this issue May 29, 2024
* chore: bump penumbra deps 0.75.0 -> 0.76.0

We made this change ad-hoc during deployment of Testnet 76.
Submitting it via PR to the repo to make sure we don't regress builds.
Refs penumbra-zone/penumbra#4402

* Regen lockfile

---------

Co-authored-by: Conor Schaefer <[email protected]>
Co-authored-by: Chris Czub <[email protected]>
3 participants