pd: support state tarball for joining nodes #3841

Closed
4 of 5 tasks
Tracked by #1804
conorsch opened this issue Feb 16, 2024 · 8 comments · Fixed by #4097
Labels: A-upgrades (Area: Relates to chain upgrades), _P-high (High priority)

Comments

@conorsch
Contributor

conorsch commented Feb 16, 2024

When chain upgrades are performed (#1804), pd state may be collapsed by a migration, such that late-joining nodes (i.e. nodes that join the network after the upgrade boundary has passed) will not be able to verify historical state. To support late-joining nodes, we must provide the capability for pd testnet join to accept compressed archives of historical pd state, and use them during bootstrapping.

Proposal: add a new optional flag --archive-url=<URL> to pd testnet join. Late-joining nodes could then pull down a compressed archive from a remote URL and extract it as the starting state for pd.

Specifically, this requires:

  • Define the archive format and structure (e.g. "all files/directories should be extracted to ~/.penumbra/testnet_data/node0/pd"). My understanding is that we'll need at least 1) the RocksDB database and 2) the genesis file in all cases.
  • Provide hosting for future archives (ideally community validators will assist with this, but we still need somewhere to host the archives we create).
  • Write logic for pd testnet join --archive-url <url> (feat(pd): support archives for migrate and join #4055).
  • Write user-facing documentation for using the flag (docs: chain upgrade procedure, for operators #4097).
  • Write developer-facing documentation for storing and updating archives.
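
To make the extraction step concrete, here is a minimal sketch in Python (chosen for brevity; pd itself is written in Rust). It is not pd's implementation: the helper name `extract_archive` and the safety checks are assumptions for illustration, and it expects the archive to have already been downloaded to a local path.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path: Path, pd_home: Path) -> list[str]:
    """Unpack a compressed tarball into the pd home directory.

    Hypothetical helper for illustration, not pd's actual code.
    Returns the names of the archive members that were extracted.
    """
    pd_home.mkdir(parents=True, exist_ok=True)
    root = pd_home.resolve()
    # "r:*" lets tarfile detect the compression (gz, bz2, xz) by itself.
    with tarfile.open(archive_path, "r:*") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse links, and any entry that would land outside pd_home.
            if member.issym() or member.islnk():
                raise ValueError(f"link not allowed in archive: {member.name}")
            target = (root / member.name).resolve()
            if not target.is_relative_to(root):
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(root, members=members)
    return [m.name for m in members]
```

With the example layout above, a call might look like `extract_archive(Path("pd-state.tar.xz"), Path.home() / ".penumbra/testnet_data/node0/pd")`, where the archive filename is made up. Rejecting links and out-of-tree paths matters here because the archive comes from a remote, possibly third-party, host.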
@github-project-automation github-project-automation bot moved this to 🗄️ Backlog in Penumbra Feb 16, 2024
@hdevalence
Member

What's the advantage of doing this rather than providing a .tar.xz of the pd home directory?

@conorsch
Contributor Author

As I understand it, that's what the snapshot is: a compressed copy of the RocksDB data that pd uses. It must also include a genesis file, which is not part of the pd home directory but is easy enough to overwrite when generating new configs. This ticket essentially describes the need for, and the mechanism to, "provide a .tar.xz of the pd home directory."
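
For illustration, producing such a tarball could look roughly like the Python sketch below. The function name `create_archive`, the optional `genesis` parameter, and the choice to store the genesis file as `genesis.json` at the archive root are all assumptions, not an agreed format. It also assumes pd is stopped while the files are read and that the output path lies outside the pd home directory.

```python
import tarfile
from pathlib import Path
from typing import Optional


def create_archive(
    pd_home: Path, archive_path: Path, genesis: Optional[Path] = None
) -> None:
    """Pack the contents of pd_home into an xz-compressed tarball.

    Hypothetical helper for illustration, not part of pd.
    """
    with tarfile.open(archive_path, "w:xz") as tar:
        for entry in sorted(pd_home.iterdir()):
            # Store paths relative to pd_home so the archive can be
            # unpacked directly into another node's pd directory.
            tar.add(entry, arcname=entry.name)
        if genesis is not None:
            # Assumed layout: genesis file sits at the archive root.
            tar.add(genesis, arcname="genesis.json")
```

Keeping member paths relative to the pd home directory means a joining node can choose its own target directory at extraction time.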

@conorsch conorsch changed the title pd: support state snapshots for joining nodes pd: support state tarball for joining nodes Feb 16, 2024
@hdevalence
Member

Got it, I was confused by the term "snapshot" because CometBFT has a notion of p2p snapshot exchange, which we're not currently using.

@conorsch
Contributor Author

Thanks, edited for clarity, s/snapshot/archive/ throughout.

@aubrika aubrika added this to the Sprint 2 milestone Mar 6, 2024
@cratelyn cratelyn added the A-upgrades Area: Relates to chain upgrades label Mar 12, 2024
@cratelyn cratelyn modified the milestones: Sprint 2, Sprint 3 Mar 18, 2024
@erwanor erwanor added the _P-high High priority label Mar 18, 2024
@erwanor
Member

erwanor commented Mar 18, 2024

Setting this to P-high, since it is a requirement for performing a testnet upgrade (both for compaction and migrations). It must be assigned during sprint planning.

@cratelyn cratelyn modified the milestones: Sprint 3, Sprint 2 Mar 18, 2024
@conorsch conorsch self-assigned this Mar 18, 2024
@conorsch conorsch moved this from 🗄️ Backlog to In progress in Penumbra Mar 18, 2024
@conorsch
Contributor Author

@erwanor I'll grab this one and give it a shot, trying to parallelize the work with what you've already got in flight on the upgrades front.

@conorsch
Contributor Author

Resolved via #4055, also #4093.

@github-project-automation github-project-automation bot moved this from In progress to Done in Penumbra Mar 25, 2024
@conorsch
Contributor Author

Write user-facing documentation for using the flag.

Ah, still more to go. Working on this today.

@conorsch conorsch reopened this Mar 25, 2024
@github-project-automation github-project-automation bot moved this from Done to In progress in Penumbra Mar 25, 2024
conorsch added a commit that referenced this issue Mar 25, 2024
Adds documentation for the specific steps to be performed by a node
operator in order to participate in a chain upgrade. These docs are
largely based on the existing wiki notes [0], adapted for a generalized
Penumbra setup.

Similarly, sketched out some corresponding changes to the `pd testnet
join` docs, but left them commented out for now: those docs are only
relevant for joining a chain that has already been upgraded. If things
go well with #4087, we'll uncomment those docs and start using them.

Finishes and therefore closes #3841.

[0] https://github.com/penumbra-zone/penumbra/wiki/Performing-upgrades
@aubrika aubrika modified the milestones: Sprint 2, Sprint 3 Mar 25, 2024
conorsch added a commit that referenced this issue Mar 26, 2024
conorsch added a commit that referenced this issue Mar 27, 2024
@github-project-automation github-project-automation bot moved this from In progress to Done in Penumbra Mar 27, 2024
5 participants