bnb-chain / bsc-snapshots


Use S3 API to access snapshot

voron opened this issue · comments

It's a feature request (kinda addition to #260) to make bootstrap from snapshot a lot easier for users. The idea is the following

  • use the S3 API instead of HTTP
    • Cloudflare R2 allows creating a read-only user
  • access unarchived datadir content (the geth dir content, basically) instead of a single archive file
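As a sketch of what that access pattern could look like, an S3-capable client such as s5cmd could pull the datadir straight from R2 (the endpoint, bucket name, and prefix here are all hypothetical; the published read-only keys would go in the usual AWS_* env vars):

```shell
# Download the unarchived datadir directly via the S3 API — no archive,
# no double-space requirement. All names below are illustrative.
export AWS_ACCESS_KEY_ID=<read-only-key>
export AWS_SECRET_ACCESS_KEY=<read-only-secret>
s5cmd --endpoint-url https://<account-id>.r2.cloudflarestorage.com \
  cp 's3://bsc-snapshots/geth/*' /data/bsc/geth/
```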

Pros:

  • no archive - no double-space requirement
    • wget | tar is a non-starter with a 2.4TB archive: any reconnect and you have to start from scratch
    • this matters with bare metal servers, where it's a bit tricky to get double the space for a single-use task
  • use of S3-optimized tools like s5cmd to boost performance
    • aria2c is good, but it requires a single file to proceed
    • s5cmd may be used to boost upload performance, with or without multipart uploads
    • on-the-fly checksum verification to ensure integrity
  • no archive - incremental sync-up is possible, download changed objects only, not the whole datadir
    • a quick way for node ops to catch up a dated node or continue the download using a fresh snapshot source
    • it's tricky to do the same with uploads, as the well-known/exposed directory has to be in a consistent state at all times, so no benefit there
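The incremental catch-up described above could be a plain object sync; s5cmd's sync subcommand transfers only objects whose size or modification time differ, so most of an existing datadir is skipped (endpoint and paths again hypothetical):

```shell
# Catch up a dated datadir from a fresh snapshot source: only changed
# objects are downloaded, not the whole datadir. Names are illustrative.
s5cmd --endpoint-url https://<account-id>.r2.cloudflarestorage.com \
  sync 's3://bsc-snapshots/geth/*' /data/bsc/geth/
```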

Cons:

  • increased billing
    • one S3 sync estimate is 1 class A op + 0.1M class B ops with a PBSS datadir (~50k files), making every full sync roughly $0.036 after the free tier
    • R2 data store increase
      • snapshot compression ratio is low, it's like 200GB per snapshot, ~$3/month
  • the access key and secret key are exposed to the public
    • they're read-only though
    • they can be rotated every couple of months in case of abuse
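For reference, the per-sync figure above can be reproduced from Cloudflare R2's list prices (assumed here: $4.50 per million class A ops, $0.36 per million class B ops):

```shell
# Rough cost of one full sync of a PBSS datadir (~50k files):
# about 1 class A (list) op and ~0.1M class B (read) ops.
class_a_ops=1
class_b_ops=100000
awk -v a="$class_a_ops" -v b="$class_b_ops" \
  'BEGIN { printf "per-sync cost: $%.3f\n", a * 4.50 / 1e6 + b * 0.36 / 1e6 }'
# prints: per-sync cost: $0.036
```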

PS: I'm not talking about the hash-based schema with 500k+ files; it's going to be deprecated anyway. The testnet snapshot may also be small enough for wget|tar to work in most cases.

Thanks for your feedback. We may not use the S3 API approach, as:
1. Cost increase, as you mentioned. There would be lots of files to upload/download, and the cost could be much higher than for one single large file.
2. Performance may not be good, even though "s3-optimized tools" could perform well.

Maybe we can provide a tool to improve the UX around issues like the "double-space" problem.
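A minimal version of such a tool already exists in shell form: streaming the archive straight into the extractor avoids the double-space requirement entirely, though, as noted earlier in the thread, a dropped connection still means starting over. A hedged sketch, assuming an lz4-compressed tar and an illustrative URL:

```shell
# Stream-extract: the archive never touches disk, so no double space needed.
# Caveat: tar cannot resume mid-stream after a disconnect.
wget -qO- 'https://example.com/bsc-snapshot.tar.lz4' \
  | lz4 -d \
  | tar -x -C /data/bsc
```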