This repository has been archived by the owner on Dec 4, 2024. It is now read-only.

Internal Validator - sequencer and stake manager error #2208

Open
BlockgenStudio opened this issue Nov 5, 2024 · 3 comments

@BlockgenStudio

Description:

For the past three weeks, we have been experiencing recurring errors on one of our validators, validator-002, in our production environment. We identified three specific errors in the logs, detailed below:

Error Logs:

  1. Panic Error
    panic: runtime error: slice bounds out of range [:32] with capacity 0

    • Frequency: ~10 occurrences per week

  2. Failed to Run Sequence Error
    failed to run sequence - validator manager init: height=17657995 error="getting voting power failed - backend is not initialized for height 17657995, fsm height 17657994"

    • Frequency: ~20 occurrences per week

  3. Post Block in Stake Manager Error
    polygon.server.polybft.consensus_runtime: failed to post block in stake manager: err="not found"

    • Frequency: Appears on every block sequence

These logs are from validator-002 for the time period from October 28th to November 4th.

How to Reproduce the Issue:
Below are the infrastructure and resource details of our Polygon Supernet deployment:

Infrastructure Setup:

  • Total Nodes: 7 Validators, 3 Non-validators on an internal network
  • Validator Configuration:
    • 5 Validators in a private subnet (genesis validators)
    • 1 Validator in a public subnet
    • 1 External Validator hosted outside the VPC (connected via an RPC from a publicly exposed RPC node to an internal genesis validator)
  • Non-Validator Configuration:
    • 2 Non-validators connected to a load balancer, used as RPC nodes
    • 1 Non-validator connected to a block explorer

Resource Details:

  • Validator Instance Type: c6i.large
  • Non-validator Instance Type (RPC nodes): c6i.xlarge
  • Non-validator Instance Type (Block explorer): c6i.2xlarge
  • Operating System: Ubuntu 22.04 LTS
  • Polygon Edge Version: v1.0.0

Impact and Urgency:
These errors are impacting our production environment. The "failed to run sequence" error appears multiple times a week and has potential implications for validator stability. Additionally, the "failed to post block in stake manager" error appears on every block sequence, which is a significant operational concern.

Request for Assistance:
Could you provide any guidance on troubleshooting or potential fixes for these issues? If additional logs or specific configurations are needed, please let us know.
Thank you for your assistance!

@R-Santev

@BlockgenStudio About point 2: This happens when a block is synced from another validator, and latestHeader.Number is updated just before a new sequence is initiated here:

sequenceCh, stopSequence = p.ibft.runSequence(latestHeader.Number + 1)

This occurs because latestHeader is a pointer. While there may be more elegant solutions to handle this case, I believe the Edge team intentionally left it to throw an error since it avoids adding additional logic (and computation) and works well in practice.
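
To illustrate the effect described above, here is a minimal, deterministic Go sketch. It is not Polygon Edge code: the `Header`, `backend`, `votingPower`, `startSequence`, and `simulate` names are hypothetical stand-ins, and the concurrent sync is replaced by a plain increment so the outcome is reproducible. It only shows how reading the height through a shared pointer after the header has advanced can produce the mismatch from error 2.

```go
package main

import "fmt"

// Header is a hypothetical stand-in for the block header; only Number matters here.
type Header struct {
	Number uint64
}

// backend is a hypothetical stand-in for the consensus backend, which is
// prepared for exactly one height at a time.
type backend struct {
	initializedFor uint64
}

// votingPower mimics the check behind the logged error: it fails when the
// requested height differs from the height the backend was prepared for.
func (b *backend) votingPower(height uint64) error {
	if height != b.initializedFor {
		return fmt.Errorf("backend is not initialized for height %d, fsm height %d",
			height, b.initializedFor)
	}
	return nil
}

// startSequence reads the height through the header pointer at call time.
func startSequence(b *backend, latest *Header) error {
	return b.votingPower(latest.Number + 1)
}

// simulate prepares the backend for the next height, then lets a "sync"
// advance the shared header before the sequence starts. With snapshot set,
// the sequence uses a copy of the header taken before the sync.
func simulate(snapshot bool) error {
	latest := &Header{Number: 17657993}
	b := &backend{initializedFor: latest.Number + 1}

	pinned := *latest // value copy, unaffected by later writes through latest

	latest.Number++ // a block synced from another validator advances the header

	if snapshot {
		return startSequence(b, &pinned)
	}
	return startSequence(b, latest)
}

func main() {
	fmt.Println("shared pointer:", simulate(false))
	fmt.Println("snapshot copy: ", simulate(true))
}
```

With the shared pointer, the call returns the height mismatch; with the value copy it does not. The copy is shown only to make the pointer semantics visible, not as a suggested fix, since the height it pins has already been synced by the time the sequence would start.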

@Stefan-Ethernal, feel free to correct me if I’m wrong.

@BlockgenStudio
Author

Thanks @R-Santev. What about errors 1 and 3?

@BlockgenStudio
Author

Especially error 3.
