allada / bsc-archive-snapshot

Free public Binance Smart Chain (BSC) Archive Snapshot

Script finishes with no errors at 568GB.

Breisoft opened this issue

The script finishes and launches with no errors at 568GB. I noticed another person opened a similar issue, but you said changes had been made since then. I'm using the most up-to-date version of the repo and am having the same issue as far as I can tell. When I re-run the bash script, it starts deleting the files. Thank you!

Yeah, I have seen this too; it appears to be related to the way the multi-process download works. I'll look into it when I have the time.

Thanks allada. I tried again with half as many processes and still ran into the same issue, if that helps. I'd try to fix it myself and make a pull request, but I don't know shell.

You can try a larger instance with more cores; it may help. The issue is likely that the parallel downloads are too aggressive.

You can also download the mdbx file manually.

I was using im4gn.4xlarge, but I can try a larger one. The mdbx file? Is that in the S3 bucket?

Yes, you should be able to set up your own server quite easily; just install the needed software and run:

# Downloads the mdbx file.
aws s3 cp --request-payer=requester s3://public-blockchain-snapshots/bsc/erigon/archive/latest/v1/chaindata/mdbx.dat.zstd - \
    | pv \
    | zstd -q -d -o /erigon/data/bsc/chaindata/mdbx.dat

# Download the snapshot files.
aws s3 sync --request-payer=requester s3://public-blockchain-snapshots/bsc/erigon/archive/latest/v1/snapshots/ /erigon/data/bsc/snapshots/

Then start erigon with:
erigon --chain bsc --snapshots=true --db.pagesize=16k --datadir=/erigon/data/bsc --txpool.disable

You'll need to set up the disk drives and install everything manually, though.
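
For reference, a rough sketch of that manual setup on Ubuntu could look like the following. The device name /dev/nvme1n1, the pool name, and the compression setting are assumptions; adjust them for your instance.

# Install the tools used by the commands above (assumes Ubuntu / apt).
sudo apt-get update
sudo apt-get install -y zfsutils-linux awscli zstd pv

# Create a ZFS pool on the instance's NVMe drive and mount it at /erigon.
# The device /dev/nvme1n1 and the pool name "erigon" are just examples.
sudo zpool create -o ashift=12 -O compression=lz4 -O mountpoint=/erigon erigon /dev/nvme1n1

# Create the directories the download commands expect.
sudo mkdir -p /erigon/data/bsc/chaindata /erigon/data/bsc/snapshots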

Do those commands give the full archive history of BSC? I only see data going back 2 months.

Yes, they should.
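
One way to sanity-check that the full history is really there (a sketch; it assumes you start erigon with --http so the JSON-RPC endpoint is listening on the default port 8545) is to request a very early block:

# Ask for block 1 over JSON-RPC; an archive node with full history should return it.
curl -s -X POST -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["0x1", false],"id":1}' \
    http://localhost:8545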

I believe I've discovered the issue with the script. Everything appears to be working fine now; here's a detailed explanation of the problem:

  1. I SSH in right after creating an EC2 instance, clone this repository, and run the script directly.
  2. The script installs all of the required dependencies but fails before it starts downloading, because aws configure hasn't been run yet on the fresh instance.
  3. Although the script fails before any downloading starts, it has already created the ZFS file system by that point.
  4. The snapshots folder downloaded fine even before this, which is why I got 568GB and the script exited successfully: 568GB is the correct size of the snapshots folder alone.
  5. The chaindata file did NOT download properly, which I believe is due to the download_database_file() function.

Unlike the download functions for snapshots, nodes, parlia, etc., this function returns early if the chaindata folder already exists. Since the script had already created that folder on the earlier failed run, before download_database_file was executed, the download never started. A sketch of the pattern follows below.
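
In shell terms, the behavior described above looks roughly like this (a sketch, not the repo's exact code; the path comes from the commands earlier in this thread):

# Sketch of the early-return guard described above (not the actual repo code).
download_database_file() {
  # The earlier failed run had already created this directory, so this guard
  # caused the function to return without ever downloading mdbx.dat.
  if [ -d /erigon/data/bsc/chaindata ]; then
    return 0
  fi
  # ... download and decompress mdbx.dat here ...
}

# Checking for the data file itself instead of the folder avoids the problem:
if [ ! -f /erigon/data/bsc/chaindata/mdbx.dat ]; then
  echo "chaindata missing, downloading..."
fi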

Hope this helps! I would write a PR myself, but I'm not familiar with bash or ZFS commands. Appreciate your help as well.

This should be done now in the latest code.