Node mistakenly enters sync state, delaying block production #244

Open
rink1969 opened this issue Jul 12, 2023 · 0 comments
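While performing update validators operations on a raft chain: first reduce the set to a single consensus node (node0), then set it back to 3 consensus nodes.
The first operation completed successfully and was packed at height 160.
The second operation was packed at height 161, but block 161 stalled for a long time before being produced.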

node0 did not produce block 161 in time because the consensus service's get proposal request failed: NodeInSyncMode

Relevant logs (newest first):

2023-07-12 04:11:37.745	
2023-07-11T20:11:37.745758701Z  WARN get_proposal: controller::grpc_server::consensus_server: rpc get proposal failed: NodeInSyncMode
2023-07-12 04:11:34.744	
2023-07-11T20:11:34.744455185Z  WARN get_proposal: controller::grpc_server::consensus_server: rpc get proposal failed: NodeInSyncMode
2023-07-12 04:11:31.742	
2023-07-11T20:11:31.742036647Z  WARN get_proposal: controller::grpc_server::consensus_server: rpc get proposal failed: NodeInSyncMode
2023-07-12 04:11:29.503	
2023-07-11T20:11:29.50307139Z  WARN get_transaction_index: controller::grpc_client::storage: load tx height failed: NotFound. hash: 0x58d96947f26d763a884f705e5bc92073d9bd218fa0cb279a89b9644d3a7b5a4f
2023-07-12 04:11:29.502	
2023-07-11T20:11:29.50273526Z  WARN get_transaction_block_number: controller::grpc_client::storage: load tx height failed: NotFound. hash: 0x58d96947f26d763a884f705e5bc92073d9bd218fa0cb279a89b9644d3a7b5a4f
2023-07-12 04:11:29.501	
2023-07-11T20:11:29.501514511Z  WARN get_transaction: controller::grpc_client::storage: db get tx failed: NotFound. hash: 0x58d96947f26d763a884f705e5bc92073d9bd218fa0cb279a89b9644d3a7b5a4f
2023-07-12 04:11:28.739	
2023-07-11T20:11:28.739614803Z  WARN get_proposal: controller::grpc_server::consensus_server: rpc get proposal failed: NodeInSyncMode
2023-07-12 04:11:25.737	
2023-07-11T20:11:25.737564336Z  WARN get_proposal: controller::grpc_server::consensus_server: rpc get proposal failed: NodeInSyncMode
2023-07-12 04:11:22.735	
2023-07-11T20:11:22.735529696Z  WARN get_proposal: controller::grpc_server::consensus_server: rpc get proposal failed: NodeInSyncMode
2023-07-12 04:11:19.733	
2023-07-11T20:11:19.731859981Z  INFO cloud_util::metrics: register histogram: consensus_to_GetProposal success
2023-07-12 04:11:19.733	
2023-07-11T20:11:19.731727069Z  WARN get_proposal: controller::grpc_server::consensus_server: rpc get proposal failed: NodeInSyncMode
2023-07-12 04:11:19.728	
2023-07-11T20:11:19.72797781Z  INFO process_network_msg: controller::node_manager: update node status: origin: 2153c0d48f1daa83, height: 160, hash: 0x85a2f765f4ca7a5936a7ad0e32cd67eff0569e3ae1890ac85b8c5d5620a99671
2023-07-12 04:11:19.723	
2023-07-11T20:11:19.72324423Z  INFO commit_block:chain_commit_block:commit_block:finalize_block: controller::chain: finalize block(160) success: pool len: 0, pool quota: 0. hash: 0x85a2f765f4ca7a5936a7ad0e32cd67eff0569e3ae1890ac85b8c5d5620a99671
2023-07-12 04:11:19.723	
2023-07-11T20:11:19.723225826Z  INFO commit_block:chain_commit_block:commit_block:finalize_block: controller::chain: update auth and pool, tx_hash_list len 1

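Checking the other nodes' logs shows that all nodes committed block 160 at almost the same time, so this is not a case of another node being at a greater height and pushing node0 into sync state.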

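After analyzing the code, the likely cause is:
node0 had just run finalize block(160) but had not yet updated its own status (own status was still at height 159)
when it received update node status: origin: 2153c0d48f1daa83, height: 160.
At that point it mistakenly concluded that the remote node was ahead of the local node, and therefore entered the sync step.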
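
The suspected race can be reproduced in a small standalone model. This is only a sketch under stated assumptions: `Node`, `Mode`, `own_status_height`, and both `on_remote_status_*` methods are hypothetical names invented for illustration, not the controller's real types or functions, and the guarded variant is one possible mitigation rather than a confirmed fix.

```rust
/// Hypothetical node mode; the real controller's state machine is richer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Online,
    Sync,
}

/// Hypothetical, simplified view of a node. `chain_height` is the last
/// finalized block; `own_status_height` is the separately stored own status,
/// which (per the analysis above) is refreshed only after finalization.
#[derive(Debug)]
struct Node {
    chain_height: u64,
    own_status_height: u64,
    mode: Mode,
}

impl Node {
    fn new(height: u64) -> Self {
        Node { chain_height: height, own_status_height: height, mode: Mode::Online }
    }

    /// Step 1: the block is finalized, but own status is not yet refreshed.
    fn finalize_block(&mut self, height: u64) {
        self.chain_height = height;
    }

    /// Step 2 (runs later): own status catches up with the chain.
    fn refresh_own_status(&mut self) {
        self.own_status_height = self.chain_height;
    }

    /// Suspected behaviour: compare the remote height with the (possibly
    /// stale) own status only.
    fn on_remote_status_stale(&mut self, remote_height: u64) {
        if remote_height > self.own_status_height {
            self.mode = Mode::Sync;
        }
    }

    /// Assumed mitigation: compare against the highest height known locally.
    fn on_remote_status_guarded(&mut self, remote_height: u64) {
        let local = self.chain_height.max(self.own_status_height);
        if remote_height > local {
            self.mode = Mode::Sync;
        }
    }

    /// Mirrors the symptom in the logs: no proposal while in sync mode.
    fn get_proposal(&self) -> Result<u64, &'static str> {
        match self.mode {
            Mode::Sync => Err("NodeInSyncMode"),
            Mode::Online => Ok(self.chain_height + 1),
        }
    }
}
```

Starting from `Node::new(159)`, calling `finalize_block(160)` and then `on_remote_status_stale(160)` before `refresh_own_status()` leaves the model in `Mode::Sync`, so `get_proposal()` returns `Err("NodeInSyncMode")`. The same sequence with `on_remote_status_guarded(160)` stays `Online` and proposes height 161.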

In the end, it was the timeout-triggered inner healthy check that cleared this state:
2023-07-11T20:12:40.307465089Z INFO controller: inner healthy check: broadcast csi: height: 160, 1th time
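
To put a number on the stall, the gap between the finalize block(160) line and the inner healthy check line can be computed from their timestamps. The helpers below are a hypothetical sketch (`nanos_of_day` and `elapsed_secs` are illustrative names, not part of the project); they only handle two same-day time-of-day values and allow for the fact that some log lines carry fewer than nine fractional digits.

```rust
/// Parses "HH:MM:SS.fraction" into nanoseconds since midnight.
/// The fraction may have 1 to 9 digits (e.g. "20:11:19.72324423" has 8),
/// so it is right-padded with zeros to nanosecond precision.
fn nanos_of_day(ts: &str) -> Option<u64> {
    let (hms, frac) = ts.split_once('.')?;
    let mut parts = hms.split(':');
    let h: u64 = parts.next()?.parse().ok()?;
    let m: u64 = parts.next()?.parse().ok()?;
    let s: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three colon-separated fields
    }
    if frac.is_empty() || frac.len() > 9 || !frac.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // "{:0<9}" left-aligns the digits and fills the right side with '0'.
    let nanos: u64 = format!("{:0<9}", frac).parse().ok()?;
    Some((h * 3600 + m * 60 + s) * 1_000_000_000 + nanos)
}

/// Elapsed seconds between two time-of-day stamps on the same day.
/// Returns None if either stamp is malformed or `end` is before `start`.
fn elapsed_secs(start: &str, end: &str) -> Option<f64> {
    let a = nanos_of_day(start)?;
    let b = nanos_of_day(end)?;
    Some(b.checked_sub(a)? as f64 / 1e9)
}
```

Calling `elapsed_secs("20:11:19.72324423", "20:12:40.307465089")` with the two timestamps from the logs gives roughly 80.6 seconds between finalizing block 160 and the health check broadcast.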
