Check the error handling next to postdata_metadata.json #182

Closed
pigmej opened this issue Jul 17, 2023 · 7 comments
Assignees
Labels
area/post bug Something isn't working

Comments

@pigmej
Member

pigmej commented Jul 17, 2023

Rationale

There are reports from the community that postdata_metadata.json was corrupted or empty after initialization.
Some say it happened because the disk ran out of space, but other Users had no disk space problems at all. So it seems that if PoS initialization fails for any reason, it breaks everything.

That means:

  • The User loses the nonce found during initialization.
  • They need to find a new nonce, which is "not that easy".
  • The User cannot start the Node (it crashes).

We need to

  1. Properly handle failures during PoS initialization: do not leave files in an inconsistent state if possible.
  2. Since I/O problems can still occur, the Node should handle that case correctly.
    If postdata_metadata.json is corrupted/empty and there is no postdata_N.bin, it can just recreate everything.
    If some PoS data has already been generated, the case is more complicated and we most likely need the User's attention to decide what to do with it.
    For example, say we cannot write valid JSON or remove an inconsistent file because the User unplugged their external hard drive: the Node should not crash if no PoS data has been generated yet. It should just recreate everything.
@pigmej pigmej changed the title Chcek the error handling next to postdata_metadata.json Check the error handling next to postdata_metadata.json Jul 17, 2023
@brusherru brusherru added the bug Something isn't working label Jul 18, 2023
@brusherru
Member

Went through one more case reported in Discord and updated the issue.

@fasmat fasmat self-assigned this Jul 19, 2023
@fasmat fasmat moved this from 📋 Backlog to 🔖 Next in Dev team kanban Jul 19, 2023
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Dev team kanban Jul 19, 2023
@fasmat fasmat moved this from 🔖 Next to 🏗 Doing in Dev team kanban Aug 3, 2023
@fasmat
Member

fasmat commented Aug 17, 2023

  • user lost nonce found during initialization
  • he needs to find new nonce which is "not that easy" case.

Both of these have been addressed already with

A corrupted postdata_metadata.json file will be prevented in the future by making updates to the file atomic (part of #211).

@poszu
Collaborator

poszu commented Sep 1, 2023

@fasmat, perhaps the change to atomic updates could be extracted from #211 into a separate PR. It is a trivial change and there is no point in holding it back behind other unrelated changes.

@lrettig
Member

lrettig commented Sep 8, 2023

This happened to me once; for the record, #193 is not a satisfactory workaround because it takes a very, very, very, very long time for large data sizes.

@fasmat
Member

fasmat commented Sep 8, 2023

@lrettig we recently merged #231, which will land in the node with the next version. It will prevent postdata_metadata.json from being deleted or corrupted if the node crashes at the wrong moment.

If however it is already missing, I don't see a better / faster way to regenerate the file. #193 should already be significantly faster than a re-init because it only needs to do one pass over the data to find the nonce again. This will take at most as long as generating a proof (so at most 12 hours if your node is set up to be able to generate a proof within the cycle gap).

@poszu
Collaborator

poszu commented Sep 13, 2023

@lrettig, it shouldn't take that long to find the lost VRF nonce. It's basically limited by disk read speed only.

@fasmat fasmat moved this from 🏗 Doing to On Hold in Dev team kanban Sep 19, 2023
@fasmat
Member

fasmat commented Sep 19, 2023

Now that the atomic update of the postdata_metadata.json file is integrated in v1.1.6 of the node, I will close this issue.

@fasmat fasmat closed this as completed Sep 19, 2023
@dshulyak dshulyak moved this from On Hold to ✅ Done in Dev team kanban Sep 22, 2023
5 participants