Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chain halt during swap execution on post-upgraded devnet #4340

Closed
hdevalence opened this issue May 7, 2024 · 7 comments · Fixed by #4341
Closed

Chain halt during swap execution on post-upgraded devnet #4340

hdevalence opened this issue May 7, 2024 · 7 comments · Fixed by #4341
Assignees
Labels
A-upgrades Area: Relates to chain upgrades _P-high High priority
Milestone

Comments

@hdevalence
Copy link
Member

Describe the bug

On a local devnet on which I performed a chain upgrade (this may or may not be relevant), I saw a chain halt while swapping:

2024-05-07T03:02:26.618921Z  INFO abci:EndBlock{height=704}: penumbra_app::server::consensus: ending block height=704
thread 'tokio-runtime-worker' panicked at /Users/hdevalence/code/penumbra/penumbra/crates/core/component/dex/src/component/router/fill_route.rs:502:14:
writing to storage should not fail: could not decode proto from bytes

Caused by:
    failed to decode Protobuf message: invalid tag value: 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tokio-runtime-worker' panicked at /Users/hdevalence/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-abci-0.11.1/src/v037/server.rs:179:75:
called `Result::unwrap()` on an `Err` value: ActorHungUp
@github-project-automation github-project-automation bot moved this to Backlog in Penumbra May 7, 2024
@github-actions github-actions bot added the needs-refinement unclear, incomplete, or stub issue that needs work label May 7, 2024
@hdevalence
Copy link
Member Author

2024-05-07T03:17:20.862098Z DEBUG abci:EndBlock{height=704}:end_block{height=704}:dex:handle_batch_swaps:route_and_fill:fill_route: penumbra_dex::component::router::fill_route: completed fill iteration, updating frontier current_input=16546262 current_output=16463633215909090909 input=9983453738 output=16463633215909090909
2024-05-07T03:17:20.862181Z DEBUG abci:EndBlock{height=704}:end_block{height=704}:dex:handle_batch_swaps:route_and_fill:fill_route:replace_position{index=0}: penumbra_dex::component::router::fill_route: replacing position replaced_position_id=plpid14cwyz400agsagn7cdxygd4vjkyk5prghtp8nc79dptsxg7cd060srcch83
2024-05-07T03:17:20.871205Z DEBUG abci:EndBlock{height=704}:end_block{height=704}:dex:handle_batch_swaps:route_and_fill:fill_route:replace_position{index=0}:position_execution: penumbra_dex::component::position_manager: executing against position position_id=plpid14cwyz400agsagn7cdxygd4vjkyk5prghtp8nc79dptsxg7cd060srcch83
2024-05-07T03:17:20.871214Z DEBUG abci:EndBlock{height=704}:end_block{height=704}:dex:handle_batch_swaps:route_and_fill:fill_route:replace_position{index=0}:position_execution: jmt: hashed jmt key key=b"dex/position/plpid14cwyz400agsagn7cdxygd4vjkyk5prghtp8nc79dptsxg7cd060srcch83" key_hash=KeyHash("497645d2a88d71a13ae19c51c73a0ed6fc8ad45ee95d89c849bda53d401a4ecb")
2024-05-07T03:17:20.871332Z DEBUG abci:EndBlock{height=704}:end_block{height=704}:dex:handle_batch_swaps:route_and_fill:fill_route:replace_position{index=0}:position_execution:update_position{id=plpid14cwyz400agsagn7cdxygd4vjkyk5prghtp8nc79dptsxg7cd060srcch83}: penumbra_dex::component::position_manager: updating position state prev_state=Some(Position { state: Opened, reserves: Reserves { r1: 18155385, r2: 26987993 }, phi: TradingFunction { component: BareTradingFunction { fee: 25, p: 1000000, q: 1100000 }, pair: TradingPair { asset_1: passet1r4kcf2m4r92jqmdks5c9yt7v2tgnht4aj3fml4qln56x72nm8qrsm9d598, asset_2: passet1984fctenw8m2fpl8a9wzguzp7j34d7vravryuhft808nyt9fdggqxmanqm } }, nonce: "8bcb508badb058ba74b03ac496a54f10ffa5a4505807df562b9a2a6303ce62fe" }) new_state=Position { state: Opened, reserves: Reserves { r1: 0, r2: 43534255 }, phi: TradingFunction { component: BareTradingFunction { fee: 25, p: 1000000, q: 1100000 }, pair: TradingPair { asset_1: passet1r4kcf2m4r92jqmdks5c9yt7v2tgnht4aj3fml4qln56x72nm8qrsm9d598, asset_2: passet1984fctenw8m2fpl8a9wzguzp7j34d7vravryuhft808nyt9fdggqxmanqm } }, nonce: "8bcb508badb058ba74b03ac496a54f10ffa5a4505807df562b9a2a6303ce62fe" }
2024-05-07T03:17:20.871372Z DEBUG abci:EndBlock{height=704}:end_block{height=704}:dex:handle_batch_swaps:route_and_fill:fill_route:replace_position{index=0}:position_execution:update_position{id=plpid14cwyz400agsagn7cdxygd4vjkyk5prghtp8nc79dptsxg7cd060srcch83}: penumbra_dex::component::position_manager: evaluating state transition id=plpid14cwyz400agsagn7cdxygd4vjkyk5prghtp8nc79dptsxg7cd060srcch83 prev=Opened new=Opened
2024-05-07T03:17:20.871430Z DEBUG abci:EndBlock{height=704}:end_block{height=704}:dex:handle_batch_swaps:route_and_fill:fill_route:replace_position{index=0}:position_execution:update_position{id=plpid14cwyz400agsagn7cdxygd4vjkyk5prghtp8nc79dptsxg7cd060srcch83}: penumbra_dex::component::position_manager::price_index: deindexing position
2024-05-07T03:17:20.871441Z DEBUG abci:EndBlock{height=704}:end_block{height=704}:dex:handle_batch_swaps:route_and_fill:fill_route:replace_position{index=0}:position_execution:update_position{id=plpid14cwyz400agsagn7cdxygd4vjkyk5prghtp8nc79dptsxg7cd060srcch83}: penumbra_dex::component::position_manager::price_index: indexing position for 1=>2 trades
2024-05-07T03:17:20.871466Z DEBUG abci:EndBlock{height=704}:end_block{height=704}:dex:handle_batch_swaps:route_and_fill:fill_route:replace_position{index=0}:position_execution:update_position{id=plpid14cwyz400agsagn7cdxygd4vjkyk5prghtp8nc79dptsxg7cd060srcch83}: penumbra_dex::component::position_manager::inventory_index: indexing position by inventory
thread 'tokio-runtime-worker' panicked at /Users/hdevalence/code/penumbra/penumbra/crates/core/component/dex/src/component/router/fill_route.rs:502:14:
writing to storage should not fail: could not decode proto from bytes

Caused by:
    failed to decode Protobuf message: invalid tag value: 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@hdevalence
Copy link
Member Author

The bad value is:

2024-05-07T03:29:38.679697Z DEBUG abci:EndBlock{height=704}:end_block{height=704}:dex:handle_batch_swaps:route_and_fill:fill_route:replace_position{index=0}:position_execution:update_position{id=plpid14cwyz400agsagn7cdxygd4vjkyk5prghtp8nc79dptsxg7cd060srcch83}: penumbra_dex::component::position_manager::base_liquidity_index: k/v key_hex="6465782f61622f1d6d84ab751955206db68530522fcc52d13baebd9453bfd41f9d346f2a7b380729ea9c2f3371f6a487e7e95c247041f4a356f983eb064e5d2b3bcf322ca96a10" value_hex="0000000000000000000000003e1d3f3b"
thread 'tokio-runtime-worker' panicked at /Users/hdevalence/code/penumbra/penumbra/crates/core/component/dex/src/component/router/fill_route.rs:502:14:
writing to storage should not fail: could not decode proto from bytes

Caused by:
    failed to decode Protobuf message: invalid tag value: 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@hdevalence
Copy link
Member Author

That hex-encoded key is prefixed with dex/ab.

As of 0.73.1, that key is used here: https://github.com/penumbra-zone/penumbra/blob/v0.73.1/crates/core/component/dex/src/state_key.rs#L114-L123

As of main, I don't see that key in use in the migration: https://github.com/penumbra-zone/penumbra/blob/main/crates/bin/pd/src/migrate/testnet74.rs#L46-L77

@hdevalence
Copy link
Member Author

Steps to reproduce:

  1. Check out v0.73.1.
  2. Run cargo run --release --bin pd -- testnet generate --proposal-voting-blocks 60 --epoch-duration 50.
  3. Run cargo run --release --bin pd -- start and cometbft start --home ... to launch an 0.73 network.
  4. Delegate to the validator so you have governance power (do this first to wait for the epoch boundary):
cargo run --release --bin pcli -- q validator list
cargo run --release --bin pcli -- tx delegate 100000penumbra --to penumbravalid1...
  1. Create some liquidity positions:
cargo run --release --bin pcli -- tx lp replicate xyk penumbra:gn 1000penumbra -f 25 -c 1.3
cargo run --release --bin pcli -- tx lp replicate xyk test_usd:gm 1000test_usd -f 25 -c 1.1
cargo run --release --bin pcli -- tx lp order sell 2000gm@1gn/50bps
cargo run --release --bin pcli -- tx lp order sell 2000gn@1gm/50bps
cargo run --release --bin pcli -- tx lp replicate xyk test_usd:gn 1000test_usd -f 25 -c 1.1
cargo run --release --bin pcli -- tx lp replicate xyk penumbra:gm 1000penumbra -f 25 -c 1.0
cargo run --release --bin pcli -- tx lp replicate xyk penumbra:test_usd 1000penumbra -f 25 -c 2.0
cargo run --release --bin pcli -- tx lp replicate xyk penumbra:gn 1000penumbra -f 25 -c 1.0
  1. Create a governance proposal to halt the chain in 100 blocks from whatever the current height is:
cargo run --release --bin pcli -- tx proposal template --file tmp-upgrade-proposal upgrade-plan
vim tmp-upgrade-proposal # edit to set upgrade height
cargo run --release --bin pcli -- tx proposal submit --file tmp-upgrade-proposal --deposit-amount 10000000
cargo run --release --bin pcli -- tx vote yes --on 0
  1. Once the chain halts, check out fix(dex): migration for dex trait proto encodings #4341 (which includes the fixes that make upgrades easy), then run the upgrade
cargo run --release --bin pd -- migrate
  1. Restart pd and cometbft, then make a swap transaction.

@hdevalence hdevalence moved this from Backlog to In progress in Penumbra May 7, 2024
@hdevalence hdevalence added the _P-high High priority label May 7, 2024
@erwanor
Copy link
Member

erwanor commented May 7, 2024

I am fairly sure that there are no issue with the migration, but that the bug was introduced from my PR #4188 where I replaced:

       // Write the new lookup index storing `new_a_from_b` for this trading pair.
         self.nonverifiable_put_raw(
             engine::routable_assets::a_from_b(&pair).to_vec(),
             new_a_from_b.to_be_bytes().to_vec(),
         );

With a proto schema wrapper:

         let auxiliary_key = engine::routable_assets::lookup_base_liquidity_by_pair(&pair).to_vec();
         self.nonverifiable_put(auxiliary_key, new_tally);
         tracing::debug!(
             ?pair,
             "base liquidity heuristic marked directed pair as routable"
         );

This was an oversight on my part, I double checked that the main index was proto encoded, but must have missed that the other one wasn't.

@erwanor
Copy link
Member

erwanor commented May 7, 2024

@zbuc correct me if i'm wrong, but the reason why the state key dex/ab is not part of the migration is that it is supposed to be a lookup index that stores the aggregate base liquidity (or routable liquidity) and we compute the BE complement of that value in the state key accessor.

@zbuc
Copy link
Member

zbuc commented May 7, 2024

@erwanor the reason the dex/ab keys weren't touched is because they store aggregate liquidity per trading pair (the keys encode a trading pair and are not sortable -- the stored value is the aggregate liquidity).

The bug I was fixing related to the dex/ra keys which store routable liquidity from a specific starting asset, and are sortable because the amount of A->B liquidity is the last part of the key, and B is the stored value. The issue was that the ordering of routable liquidity was reversed from what we wanted.

conorsch added a commit that referenced this issue May 7, 2024
Building on the smoke-test rewrite to use process-compose,
let's script the migration process, so that we can test
current HEAD of the monorepo against a prior tagged version,
and validate that necessary migrations are in place.

One possible approach is to fetch prebuilt binaries from uploaded
artifacts on Github. That's fine for `pd`, but doesn't work for
running the smoke tests, due to client/server incompatibility.
Therefore we'll clone the entire repo in a git-ignored subdir,
and build the old binaries there. Heavy, but reliable.

Updated to use the concise `pd migrate` UX from #4339.
Previously, there were missing AuctionParams, resolved by #4338.
Still seeing some proto incompat post-migration, via the test runs,
which appears to match the report in #4340.
conorsch added a commit that referenced this issue May 7, 2024
Building on the smoke-test rewrite to use process-compose,
let's script the migration process, so that we can test
current HEAD of the monorepo against a prior tagged version,
and validate that necessary migrations are in place.

One possible approach is to fetch prebuilt binaries from uploaded
artifacts on Github. That's fine for `pd`, but doesn't work for
running the smoke tests, due to client/server incompatibility.
Therefore we'll clone the entire repo in a git-ignored subdir,
and build the old binaries there. Heavy, but reliable.

Updated to use the concise `pd migrate` UX from #4339.
Previously, there were missing AuctionParams, resolved by #4338.
Still seeing some proto incompat post-migration, via the test runs,
which appears to match the report in #4340.
@aubrika aubrika added this to the Sprint 6 milestone May 7, 2024
@aubrika aubrika removed the needs-refinement unclear, incomplete, or stub issue that needs work label May 7, 2024
conorsch added a commit that referenced this issue May 7, 2024
Building on the smoke-test rewrite to use process-compose,
let's script the migration process, so that we can test
current HEAD of the monorepo against a prior tagged version,
and validate that necessary migrations are in place.

One possible approach is to fetch prebuilt binaries from uploaded
artifacts on Github. That's fine for `pd`, but doesn't work for
running the smoke tests, due to client/server incompatibility.
Therefore we'll clone the entire repo in a git-ignored subdir,
and build the old binaries there. Heavy, but reliable.

Updated to use the concise `pd migrate` UX from #4339.
Previously, there were missing AuctionParams, resolved by #4338.
Still seeing some proto incompat post-migration, via the test runs,
which appears to match the report in #4340.
@conorsch conorsch added the A-upgrades Area: Relates to chain upgrades label May 7, 2024
conorsch added a commit that referenced this issue May 7, 2024
Building on the smoke-test rewrite to use process-compose,
let's script the migration process, so that we can test
current HEAD of the monorepo against a prior tagged version,
and validate that necessary migrations are in place.

One possible approach is to fetch prebuilt binaries from uploaded
artifacts on Github. That's fine for `pd`, but doesn't work for
running the smoke tests, due to client/server incompatibility.
Therefore we'll clone the entire repo in a git-ignored subdir,
and build the old binaries there. Heavy, but reliable.

Updated to use the concise `pd migrate` UX from #4339.
Previously, there were missing AuctionParams, resolved by #4338.
Still seeing some proto incompat post-migration, via the test runs,
which appears to match the report in #4340.
conorsch added a commit that referenced this issue May 7, 2024
Building on the smoke-test rewrite to use process-compose,
let's script the migration process, so that we can test
current HEAD of the monorepo against a prior tagged version,
and validate that necessary migrations are in place.

One possible approach is to fetch prebuilt binaries from uploaded
artifacts on Github. That's fine for `pd`, but doesn't work for
running the smoke tests, due to client/server incompatibility.
Therefore we'll clone the entire repo in a git-ignored subdir,
and build the old binaries there. Heavy, but reliable.

Updated to use the concise `pd migrate` UX from #4339.
Previously, there were missing AuctionParams, resolved by #4338.
Still seeing some proto incompat post-migration, via the test runs,
which appears to match the report in #4340.
conorsch added a commit that referenced this issue May 7, 2024
Building on the smoke-test rewrite to use process-compose,
let's script the migration process, so that we can test
current HEAD of the monorepo against a prior tagged version,
and validate that necessary migrations are in place.

One possible approach is to fetch prebuilt binaries from uploaded
artifacts on Github. That's fine for `pd`, but doesn't work for
running the smoke tests, due to client/server incompatibility.
Therefore we'll clone the entire repo in a git-ignored subdir,
and build the old binaries there. Heavy, but reliable.

Updated to use the concise `pd migrate` UX from #4339.
Previously, there were missing AuctionParams, resolved by #4338.
Still seeing some proto incompat post-migration, via the test runs,
which appears to match the report in #4340.
@github-project-automation github-project-automation bot moved this from In progress to Done in Penumbra May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-upgrades Area: Relates to chain upgrades _P-high High priority
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants