Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local local ledger storage misses blocks on 1.16.17 #33832

Closed
McSim85 opened this issue Oct 24, 2023 · 6 comments
Closed

Local local ledger storage misses blocks on 1.16.17 #33832

McSim85 opened this issue Oct 24, 2023 · 6 comments
Labels
community Community contribution

Comments

@McSim85
Copy link

McSim85 commented Oct 24, 2023

Problem

We've noticed on several occasions that blocks goes missing from RPC.

How to reproduce:

$ curl http://127.0.0.1:8899/ \                                                                                               
    --header 'Content-Type: application/json' \                                                                             
    --data '{"jsonrpc": "2.0", "id": 1, "method": "getBlock", "params": [225337448, {"encoding": "json", "maxSupportedTransactionVersion": 0, "transactionDetails": "none", "rewards": false}]}' 
{"jsonrpc":"2.0","error":{"code":-32007,"message":"Slot 225337448 was skipped, or missing due to ledger jump to recent snapshot"},"id":1}

but if request the public RPC, the block does exist
https://explorer.solana.com/block/225337448

Also, there is a warning in the logs:

/home/solana/log2/solana-validator.log.2:[2023-10-22T18:57:13.210496152Z WARN  solana_rpc::rpc] Bank with Finalized not found at slot: 225337347
/home/solana/log2/solana-validator.log.2:[2023-10-22T18:57:13.237666747Z WARN  solana_rpc::rpc] Bank with Finalized not found at slot: 225337347
/home/solana/log2/solana-validator.log.2:[2023-10-22T18:57:13.384356247Z WARN  solana_rpc::rpc] Bank with Finalized not found at slot: 225337347
...
/home/solana/log2/solana-validator.log.2:[2023-10-22T18:58:03.209122258Z WARN  solana_rpc::rpc] Bank with Finalized not found at slot: 225337448
/home/solana/log2/solana-validator.log.2:[2023-10-22T18:58:03.362056385Z WARN  solana_rpc::rpc] Bank with Finalized not found at slot: 225337448
/home/solana/log2/solana-validator.log.2:[2023-10-22T18:58:03.519177750Z WARN  solana_rpc::rpc] Bank with Finalized not found at slot: 225337448

Also, getBlocksWithLimit will exclude missed blocks from the results.

curl 127.0.0.1:8899 \
-X POST \
-H "Content-Type: application/json" \
-d '{
    "jsonrpc": "2.0",
    "id":1,
    "method":"getBlocksWithLimit",
    "params":[225331036, 8678]
  }' 

Proposed Solution

A temporary fix would be if the RPC node could reach out to BigTable to fetch the blocks that are missing in case it cannot find the block in the current ledger.

But in the long term, there could be a bug in the validator code.

@McSim85 McSim85 added the community Community contribution label Oct 24, 2023
@McSim85
Copy link
Author

McSim85 commented Oct 24, 2023

rel #17000

@McSim85
Copy link
Author

McSim85 commented Oct 24, 2023

Here is the existing server with the non-cleaned ledger.
Block range in the Ledger

solana@sol-mainnet-full-validator-55:~$ solana-ledger-tool -l ./ledger/ bounds
[2023-10-24T13:17:59.086465951Z INFO  solana_ledger_tool] solana-ledger-tool 1.16.17 (src:667bc163; feat:4033350765, client:SolanaLabs)
[2023-10-24T13:17:59.086549043Z INFO  solana_ledger::blockstore] Maximum open file descriptors: 1048576
[2023-10-24T13:17:59.086554843Z INFO  solana_ledger::blockstore] Opening database at "/home/solana/ledger/rocksdb"
[2023-10-24T13:17:59.089152218Z INFO  solana_ledger::blockstore_db] Opening Rocks with secondary (read only) access at: "/home/solana/ledger/rocksdb/solana-secondary"
[2023-10-24T13:17:59.089161628Z INFO  solana_ledger::blockstore_db] This secondary access could temporarily degrade other accesses, such as by solana-validator
[2023-10-24T13:18:40.120745331Z INFO  solana_ledger::blockstore_db] Rocks's automatic compactions are disabled due to Secondary access
[2023-10-24T13:18:40.121044699Z INFO  solana_ledger::blockstore] "/home/solana/ledger/rocksdb" open took 41.0s
Ledger has data for 547347 slots 225139892 to 225699358
  with 536583 rooted slots from 225139892 to 225699320
  and 36 slots past the last root

[2023-10-24T13:18:41.828054678Z INFO  solana_ledger_tool] ledger tool took 42.7s
solana@sol-mainnet-full-validator-55:~$

Request the block from the local range returns error.

min               225139892
requesting   225536032
max              225699358

request:

solana@sol-mainnet-full-validator-55:~$ curl localhost:8899  --header 'Content-Type: application/json' --data '{"jsonrpc": "2.0", "id": 1, "method": "getBlock", "params": [225536032, {"encoding": "json", "maxSupportedTransactionVersion": 0, "transactionDetails": "none", "rewards": false}]}'
{"jsonrpc":"2.0","error":{"code":-32007,"message":"Slot 225536032 was skipped, or missing due to ledger jump to recent snapshot"},"id":1}
solana@sol-mainnet-full-validator-55:~$

But block does exist:
https://explorer.solana.com/block/225536032

solana@sol-mainnet-full-validator-55::~$ curl https://api.mainnet-beta.solana.com --header 'Content-Type: application/json' --data '{"jsonrpc": "2.0", "id": 1, "method": "getBlock", "params": [225536032, {"encoding": "json", "maxSupportedTransactionVersion": 0, "transactionDetails": "none", "rewards": false}]}'
{"jsonrpc":"2.0","result":{"blockHeight":207574599,"blockTime":1698084242,"blockhash":"B2AggLy2yEHunpMs5BDmV6zBwW6rreuoxDfAoKXg4R1T","parentSlot":225536031,"previousBlockhash":"FU5wbf6w3BLiDCJpAvkzK4nZU6GgUHYspdLSUvWEwgpb"},"id":1}

@McSim85
Copy link
Author

McSim85 commented Oct 24, 2023

ledger-tool output:

$ solana-ledger-tool -l ./ledger/ slot 225536032
[2023-10-24T13:29:30.771443685Z INFO  solana_ledger_tool] solana-ledger-tool 1.16.17 (src:667bc163; feat:4033350765, client:SolanaLabs)
[2023-10-24T13:29:30.771508566Z INFO  solana_ledger::blockstore] Maximum open file descriptors: 1048576
[2023-10-24T13:29:30.771513447Z INFO  solana_ledger::blockstore] Opening database at "/home/solana/ledger/rocksdb"
[2023-10-24T13:29:30.773959707Z INFO  solana_ledger::blockstore_db] Opening Rocks with secondary (read only) access at: "/home/solana/ledger/rocksdb/solana-secondary"
[2023-10-24T13:29:30.773969087Z INFO  solana_ledger::blockstore_db] This secondary access could temporarily degrade other accesses, such as by solana-validator
[2023-10-24T13:29:50.545583452Z INFO  solana_ledger::blockstore_db] Rocks's automatic compactions are disabled due to Secondary access
[2023-10-24T13:29:50.546006804Z INFO  solana_ledger::blockstore] "/home/solana/ledger/rocksdb" open took 19.8s
Slot 225536032
[2023-10-24T13:29:51.298403994Z INFO  solana_ledger_tool] ledger tool took 20.5s

@steviez
Copy link
Contributor

steviez commented Oct 24, 2023

@McSim85 - Just to confirm, was the node that exhibited this error restarted without --no-snapshot-fetch OR was a snapshot manually copied onto this node ?

{"jsonrpc":"2.0","error":{"code":-32007,"message":"Slot 225536032 was skipped, or missing due to ledger jump to recent snapshot"},"id":1}

@McSim85
Copy link
Author

McSim85 commented Oct 24, 2023

Hey @steviez
Thanks for joining troubleshooting.

was the node that exhibited this error restarted without --no-snapshot-fetch.

We don't have --no-snapshot-fetch parameters added. So, the node restarted without --no-snapshot-fetch.

was a snapshot manually copied onto this node ?

We copy snapshots from known validators and from dedicated snapshot nodes before starting the validator process by pre-start script.
But I have all the cases.
I mean I see Bank with Finalized not found at slot: warnings on nodes, that use pre-start snapshot download script and also on nodes, that do not have pre-start script and download snapshots by validator logic.

Also, I see Bank with Finalized not found at slot: errors on Testnet and Devnet nodes that have been upgraded to 1.16.17.

@CriesofCarrots
Copy link
Contributor

the node restarted without --no-snapshot-fetch.

If you restarted without --no-snapshot-fetch, your node almost certainly downloaded a fresh, recent snapshot from one of your known validators, resulting in a gap in your local ledger. Which is indeed the message that your RPC error is reporting. You need to add that flag to restart with your local snapshot to be sure to not miss anything.

Regarding the bounds data: if a gap is within the range of solana-ledger-tool bounds, that command won't really show the gap, except in smaller slot and root counts, which are hard to spot. However, if you were to run getBlocksWithLimit against a public rpc and sum the number of rooted blocks over that range, I think you will find the value larger than that reported by bounds on your node by the size of your gap(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Community contribution
Projects
None yet
Development

No branches or pull requests

3 participants