Warp sync crash (final_outcome.encoded_size() <= MAX_WARP_SYNC_PROOF_SIZE) #6957

Closed
2 tasks done
RomarQ opened this issue Dec 18, 2024 · 3 comments · Fixed by #6963
Labels
I2-bug The node fails to follow expected behavior. I10-unconfirmed Issue might be valid, but it's not yet known.

Comments

@RomarQ
Contributor

RomarQ commented Dec 18, 2024

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

A user reported the following crash in the moonbeam repository. It happened while using warp sync, after the service had been running for about a month.

let final_outcome = WarpSyncProof { proofs, is_finished };
debug_assert!(final_outcome.encoded_size() <= MAX_WARP_SYNC_PROOF_SIZE);
Ok(final_outcome)

Note that the panic has not recurred since the user restarted the node.


Dec 09 22:31:48  moonbeam[1810]: 2024-12-09 22:31:48 [🌗] 🔮 Skipping candidate production because we are not eligible for slot 23775533
Dec 09 22:31:48  moonbeam[1810]: 2024-12-09 22:31:48 [🌗] Pinned block cache limit reached. Evicting value. hash = 0x8c73…9940
Dec 09 22:31:48  moonbeam[1810]: 2024-12-09 22:31:48 [🌗] 🆕 Imported #8684548 (0xd692…d374 → 0xf99e…cfd0)
Dec 09 22:31:49  moonbeam[1810]: ====================
Dec 09 22:31:49  moonbeam[1810]: Version: 0.41.1-eb89f949d5a
Dec 09 22:31:49  moonbeam[1810]:    0: sp_panic_handler::set::{{closure}}
Dec 09 22:31:49  moonbeam[1810]:    1: std::panicking::rust_panic_with_hook
Dec 09 22:31:49  moonbeam[1810]:    2: std::panicking::begin_panic_handler::{{closure}}
Dec 09 22:31:49  moonbeam[1810]:    3: std::sys_common::backtrace::__rust_end_short_backtrace
Dec 09 22:31:49  moonbeam[1810]:    4: rust_begin_unwind
Dec 09 22:31:49  moonbeam[1810]:    5: core::panicking::panic_fmt
Dec 09 22:31:49  moonbeam[1810]:    6: core::panicking::panic
Dec 09 22:31:49  moonbeam[1810]:    7: <sc_consensus_grandpa::warp_proof::NetworkProvider<Block,Backend> as sc_network_sync::strategy::warp::WarpSyncProvider<Block>>::generate
Dec 09 22:31:49  moonbeam[1810]:    8: sc_network_sync::warp_request_handler::RequestHandler<TBlock>::run::{{closure}}
Dec 09 22:31:49  moonbeam[1810]:    9: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
Dec 09 22:31:49  moonbeam[1810]:   10: tokio::runtime::task::raw::poll
Dec 09 22:31:49  moonbeam[1810]:   11: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
Dec 09 22:31:49  moonbeam[1810]:   12: tokio::runtime::scheduler::multi_thread::worker::run
Dec 09 22:31:49  moonbeam[1810]:   13: tokio::runtime::task::raw::poll
Dec 09 22:31:49  moonbeam[1810]:   14: std::sys_common::backtrace::__rust_begin_short_backtrace
Dec 09 22:31:49  moonbeam[1810]:   15: core::ops::function::FnOnce::call_once{{vtable.shim}}
Dec 09 22:31:49  moonbeam[1810]:   16: std::sys::pal::unix::thread::Thread::new::thread_start
Dec 09 22:31:49  moonbeam[1810]:   17: <unknown>
Dec 09 22:31:49  moonbeam[1810]:   18: <unknown>
Dec 09 22:31:49  moonbeam[1810]: Thread 'tokio-runtime-worker' panicked at 'assertion failed: final_outcome.encoded_size() <= MAX_WARP_SYNC_PROOF_SIZE', /root/.cargo/git/checkouts/polkadot-sdk-38b703c7469a7d1e/86b704d/substrate/client/consensus/grandpa/src/warp_proof.rs:184
...
Dec 09 22:31:53  systemd[1]: moonbeam.service: Main process exited, code=exited, status=1/FAILURE
Dec 09 22:31:53  systemd[1]: moonbeam.service: Failed with result 'exit-code'.
Dec 09 22:31:53  systemd[1]: moonbeam.service: Consumed 1month 4d 4h 24min 58.389s CPU time.
Dec 09 22:31:58  systemd[1]: moonbeam.service: Scheduled restart job, restart counter is at 1.
Dec 09 22:31:58  systemd[1]: Stopped moonbeam service.
Dec 09 22:31:58  systemd[1]: moonbeam.service: Consumed 1month 4d 4h 24min 58.389s CPU time.

Steps to reproduce

Does not seem easy to reproduce.

@RomarQ RomarQ added I10-unconfirmed Issue might be valid, but it's not yet known. I2-bug The node fails to follow expected behavior. labels Dec 18, 2024
@skunert skunert added this to SDK Node Dec 18, 2024
@github-project-automation github-project-automation bot moved this to backlog in SDK Node Dec 18, 2024
@bkchr
Member

bkchr commented Dec 18, 2024

@RomarQ ty for reporting, why is the operator running a debug build? :D

@RomarQ
Contributor Author

RomarQ commented Dec 18, 2024

Thanks for having a look at it.

We currently have debug-assertions enabled in the production profile. I was going to disable them, but then noticed that the build-linux-stable workflow in polkadot-sdk also enables them, so I decided to leave them enabled.

Will probably disable it for production.

RUSTFLAGS: "-Cdebug-assertions=y -Dwarnings"
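
For readers unfamiliar with why this flag matters: `debug_assert!` is only checked when debug assertions are compiled in, which `-Cdebug-assertions=y` forces even for optimized builds. The following is a minimal sketch of that behaviour, not the code from `warp_proof.rs`; `MAX_PROOF_SIZE`, `finish`, and `debug_assertions_enabled` are hypothetical names, and the limit value is made up for illustration.

```rust
/// Hypothetical limit for illustration only; not the value of the real
/// MAX_WARP_SYNC_PROOF_SIZE constant.
const MAX_PROOF_SIZE: usize = 16;

/// Returns the proof bytes after a debug-only size check.
fn finish(proof: Vec<u8>) -> Vec<u8> {
    // Checked only when debug assertions are enabled; otherwise the
    // condition is not evaluated and an oversized proof passes through.
    debug_assert!(proof.len() <= MAX_PROOF_SIZE);
    proof
}

/// Reports whether this build was compiled with debug assertions.
fn debug_assertions_enabled() -> bool {
    cfg!(debug_assertions)
}
```

With debug assertions enabled, calling `finish` with more than 16 bytes panics; without them, the same call returns the oversized vector unchanged.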

@bkchr
Member

bkchr commented Dec 19, 2024

# Enable debug assertions since we are running optimized builds for testing

These images are only used for testing and not for production.

github-merge-queue bot pushed a commit that referenced this issue Dec 20, 2024
There was a chance that a `WarpProof` was bigger than the maximum warp
sync proof size. This could have happened when inserting the last
justification, which may then have pushed the total proof size above the
maximum. The solution is simply to ensure that the last justification
also fits into the limits.

Close: #6957

---------

Co-authored-by: command-bot <>
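
The idea described in the commit message can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual change in #6963: fragments are modelled as plain byte vectors, and `Proof`, `build_proof`, `total_size`, and the `MAX_PROOF_SIZE` value are all hypothetical. The point it illustrates is that the final justification is subject to the same size check as every earlier fragment.

```rust
/// Hypothetical size budget in bytes, for illustration only.
const MAX_PROOF_SIZE: usize = 10;

/// Simplified stand-in for a warp sync proof.
struct Proof {
    fragments: Vec<Vec<u8>>,
    is_finished: bool,
}

/// Sums the sizes of all fragments in the proof.
fn total_size(proof: &Proof) -> usize {
    proof.fragments.iter().map(Vec::len).sum()
}

/// Adds fragments while they fit, then adds the latest justification
/// only if it also fits within the budget.
fn build_proof(changes: &[Vec<u8>], latest: Option<&Vec<u8>>) -> Proof {
    let mut fragments = Vec::new();
    let mut size = 0usize;
    let mut limit_reached = false;

    for fragment in changes {
        if size + fragment.len() > MAX_PROOF_SIZE {
            limit_reached = true;
            break;
        }
        size += fragment.len();
        fragments.push(fragment.clone());
    }

    let mut is_finished = !limit_reached;
    if !limit_reached {
        if let Some(last) = latest {
            // Apply the same check to the last justification instead of
            // appending it unconditionally.
            if size + last.len() <= MAX_PROOF_SIZE {
                fragments.push(last.clone());
            } else {
                // The proof is incomplete; the requester has to ask again.
                is_finished = false;
            }
        }
    }

    Proof { fragments, is_finished }
}
```

In this model, a proof whose last justification does not fit is returned as unfinished rather than oversized, so a size assertion on the result holds in every case.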
@github-project-automation github-project-automation bot moved this from backlog to done in SDK Node Dec 20, 2024