Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't delete redundant postdata files #239

Closed
wants to merge 6 commits into from
Closed

Don't delete redundant postdata files #239

wants to merge 6 commits into from

Conversation

lrettig
Copy link
Member

@lrettig lrettig commented Sep 15, 2023

Closes #236

Don't remove "redundant" files. It's too easy to accidentally remove already-initialized postdata files, and I don't see any value in doing so.

Remove the debug output for unrecognized files too (spacemeshos/go-spacemesh#4789).

@lrettig lrettig requested review from poszu and fasmat September 15, 2023 18:07
@PengIoT
Copy link

PengIoT commented Sep 17, 2023

I lost a terabyte of data, and it took a graphics card many days to generate it. I put read-only attributes on the data, and it was still deleted, how much do the developers hate the data?

@fasmat
Copy link
Member

fasmat commented Sep 18, 2023

With that change we should also update the Initializer::verifyMetadata method to return an error if NumUnits doesn't match the expected value (instead of only if it is larger than the expected value), update tests accordingly and probably add a test case for the changed behavior.

Otherwise larger PoSTs than expected aren't handled correctly, i.e. the node thinks that it has x numUnits but generates proofs for x+n numUnits that will fail to verify after generating.

@fasmat
Copy link
Member

fasmat commented Sep 18, 2023

TestInitialize_RedundantFiles also needs to be updated: you can just add a t.Skip similar to TestInitialize_NumUnits_Increase with the message that resizing PoST isn't supported at the moment.

Copy link
Member

@fasmat fasmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my other comments

@lrettig
Copy link
Member Author

lrettig commented Sep 19, 2023

@fasmat could you please explain this part?

the node thinks that it has x numUnits but generates proofs for x+n numUnits that will fail to verify after generating.

@fasmat
Copy link
Member

fasmat commented Sep 19, 2023

Imagine the following situation:

  1. The node calls proving.Generate on a directory containing 16 space units (1 TB) of data.
  2. After ~ 3 hours the proof is finished generating
  3. The node configured for 10 space units verifies the proof.
  4. Proof verification fails, the node waits 5 minutes and starts generating a new proof (that will again fail to verify).

Before such a setup would have silently deleted PoST data, now it will result in an infinite loop.

Preventing the node from deleting data solves one issue but introduces others. Right now the change in this PR also makes it impossible for a user to willingly reduce their PoST size, a use case that is currently covered from end to end:

  1. user changes their post size (via postcli or node config)
  2. smaller post size will cause the node to publish an ATX in the next epoch with the new correct size.

Ensure that numunits matches config
Allow less files than actually found on disk (in test)
@lrettig
Copy link
Member Author

lrettig commented Sep 19, 2023

Thanks, that helps.

The node calls proving.Generate on a directory containing 16 space units (1 TB) of data.

Why isn't the node passing the numunits into proving.Generate? And/or why isn't numunits being read from the metadata? In other words, the number of files on disk should never be taken as an authoritative source of information.

Proof verification fails

Why would a proof of a greater number of units fail a check for a lesser numunits? Can't we say that a proof of numunits >= n is always valid for numunits=n?

this PR also makes it impossible for a user to willingly reduce their PoST size

In this case the user is responsible for manually deleting the extra files. We should print a warning if we find them, but otherwise everything should work fine.

@fasmat
Copy link
Member

fasmat commented Sep 19, 2023

And/or why isn't numunits being read from the metadata?

It is, proof generation reads the value from postdata_metadata.json. The node however reads it from the node configuration. Afaik these values are only checked to match in the initializer and the node config atm has precedence (as do parameters passed to postcli).

In other words, the number of files on disk should never be taken as an authoritative source of information.

That's why if they don't match we delete files 🙂, the authoritative source is the node config or the parameters passed to postcli. If a wrong value is passed files will be deleted.

Can't we say that a proof of numunits >= n is always valid for numunits=n?

We could, I can't think of a reason why this shouldn't be allowed - I would need to think on that. However such a proof isn't considered valid at the moment, so additional changes are necessary.

In this case the user is responsible for manually deleting the extra files.

Then this should be made explicit and documented in the README.md and manual. Right now if the user is manually deleting files they will just be regenerated, also after this PR is merged.

@fasmat
Copy link
Member

fasmat commented Sep 19, 2023

Regarding allowing proofs for bigger PoST sizes: if we allow that we essentially have to finish implementing this epic: spacemeshos/pm#209

If we don't it would be unfair that users only receive rewards for x Space Units if they were successfully able to publish a proof for x+n Space Units.

@codecov
Copy link

codecov bot commented Sep 19, 2023

Codecov Report

Merging #239 (2517665) into develop (6fc593a) will increase coverage by 0.3%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##           develop    #239     +/-   ##
=========================================
+ Coverage     68.6%   69.0%   +0.3%     
=========================================
  Files           28      28             
  Lines         1862    1840     -22     
=========================================
- Hits          1279    1270      -9     
+ Misses         435     426      -9     
+ Partials       148     144      -4     
Files Changed Coverage Δ
initialization/initialization.go 77.7% <100.0%> (+1.6%) ⬆️

if more than the expected number of units is found
@lrettig
Copy link
Member Author

lrettig commented Sep 19, 2023

@fasmat I made the requested changes and updated the tests, please take another look.

If we don't it would be unfair that users only receive rewards for x Space Units if they were successfully able to publish a proof for x+n Space Units.

I think it's fine if a proof of x+n SU gets created and the user gets credit for it, even if NumUnits says something smaller. The main issue that this PR is trying to address is that the system should never automatically delete files regardless of what the metadata says.

@fasmat
Copy link
Member

fasmat commented Sep 19, 2023

I think it's fine if a proof of x+n SU gets created and the user gets credit for it

The user only gets credit for x SUs then but not x+n.

I made the requested changes and updated the tests, please take another look.

Could you add a test that initialises with x SUs, then initialises again with x-n space units, then generates a proof and then tries to verify the proof with x-n space units? I would assume such a test would currently fail in post-rs

@fasmat
Copy link
Member

fasmat commented Feb 27, 2024

With #270 users now have to confirm if the provide flags to postcli that will cause the tool to delete existing post data. Additionally I'm currently working on removing initialization from the node itself -smapp and CLI users both will have to use postcli in the future.

@fasmat fasmat closed this Feb 27, 2024
@fasmat fasmat deleted the nodelete branch February 27, 2024 11:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

postcli should never delete existing postdata files
3 participants