
Use S3 API to access snapshot #339

Closed

voron opened this issue Feb 12, 2024 · 1 comment


voron commented Feb 12, 2024

It's a feature request (kind of an addition to #260) to make bootstrapping from a snapshot a lot easier for users. The idea is the following:

  • use the S3 API instead of HTTP
    • Cloudflare R2 allows creating a read-only user
  • expose the unarchived datadir content (basically the geth dir contents) instead of a single archive file, as sketched below
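A minimal sketch of the consumer side, assuming the snapshot is published unarchived under a hypothetical R2 bucket/prefix and the read-only keys are handed out publicly (the bucket, endpoint, key values and paths below are placeholders, not anything BSC actually publishes):

```sh
# Hypothetical public read-only R2 credentials (placeholders).
export AWS_ACCESS_KEY_ID=<public-read-only-key>
export AWS_SECRET_ACCESS_KEY=<public-read-only-secret>

# Pull the unarchived datadir straight into geth's data directory.
# No tar step, so no double-space requirement, and s5cmd issues
# many requests concurrently.
s5cmd --endpoint-url https://<account-id>.r2.cloudflarestorage.com \
  sync 's3://bsc-snapshots/geth/*' /data/bsc/geth/
```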

Pros:

  • no archive - no double-space requirement
    • wget | tar is a non-starter with a 2.4TB archive: any reconnect and you have to start from scratch
    • this matters on bare-metal servers, where it's a bit tricky to get double the space for a single-use task
  • use of S3-optimized tools such as s5cmd to boost performance
    • aria2c is good, but it needs a single file to work on
    • s5cmd may also be used to boost upload performance, with or without multipart uploads
    • on-the-fly checksum verification to ensure integrity
  • no archive - incremental sync-up becomes possible: download only the changed objects, not the whole datadir (see the sketch after this list)
    • a quick way for node ops to catch up a dated node, or to continue a download against a fresh snapshot source
    • it's tricky to do the same for uploads, since the well-known/exposed directory has to be in a consistent state at all times, so no benefit there
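The incremental case is just a re-run of the same command; s5cmd's sync skips objects that look unchanged and only fetches the delta. Same placeholder bucket and endpoint as above; the --delete flag is optional and removes local files that no longer exist in the snapshot:

```sh
# Catch up a dated node: only objects that changed since the last
# sync are downloaded, not the whole 2.4TB datadir.
s5cmd --endpoint-url https://<account-id>.r2.cloudflarestorage.com \
  sync --delete 's3://bsc-snapshots/geth/*' /data/bsc/geth/
```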

Cons:

  • increased billing
    • one full S3 sync is estimated at 1 Class A op + ~0.1M Class B ops with a PBSS datadir (~50k files), making every full sync roughly $0.036 after the free tier (see the back-of-envelope below)
    • R2 data store increase
      • the snapshot compression ratio is low, so it's roughly an extra 200GB per snapshot, ~$3/month
  • access key and secret key exposed to the public
    • it's read-only, though
    • keys may be rotated every couple of months in case of abuse
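For reference, the $0.036 figure follows from the estimate above combined with R2's published Class B rate (assumed here to be about $0.36 per million operations; worth re-checking against current pricing):

```sh
# ~0.1M Class B ops per full sync at ~$0.36 per million ops:
echo 'scale=3; 100000 * 0.36 / 1000000' | bc   # prints .036
```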

PS: I'm not talking about the hash-based schema with its 500k+ files; it's going to be deprecated anyway. The testnet snapshot may also be small enough for wget|tar to work in most cases.

zzzckck (Collaborator) commented Feb 21, 2024

Thanks for your feedback. We will probably not use the S3 API approach, because:
1. Cost increase, as you mentioned. There would be lots of files to upload/download, and the cost could be much higher than for one single large file.
2. Performance may not be good, although the "s3-optimized tools" could have good performance.

Maybe we can provide a tool to improve the UX, e.g. around the "double-space issue" (see the sketch below).
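For context on that double-space issue: the common workaround today is to stream the archive straight into tar, which avoids the intermediate file but is exactly the pipeline the issue calls a non-starter at 2.4TB, since a dropped connection restarts the transfer from zero (the URL and compressor below are placeholders; the real snapshot may use a different format):

```sh
# Stream-download and extract in one pass: no archive is written to
# disk, but any disconnect means starting over from byte zero.
wget -qO- https://example.com/bsc-snapshot.tar.zst \
  | zstd -dc \
  | tar -xf - -C /data/bsc/geth/
```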

zzzckck closed this as completed Aug 8, 2024