allada / bsc-archive-snapshot

Free public Binance Smart Chain (BSC) Archive Snapshot

Script finishes with no errors at 568GB.

Breisoft opened this issue

The script finishes and launches with no errors at 568GB. I noticed another person opened a similar issue, but you said changes had been made since then. I'm using the most up-to-date version of the repo and am having the same issue as far as I can tell. When I re-run the bash script, it starts deleting the files. Thank you!

Yeah, I have seen this too; it appears to be related to the way the multi-process download works. I'll look into it when I have the time.

Thanks allada. I tried again with half as many processes and still ran into the same issue, if that helps. I'd try to fix it myself and make a pull request, but I don't know shell.

You can try a larger instance with more cores; it may help. The issue is likely that the parallel downloads are too aggressive.

You can also download the mdbx file manually.

I was using im4gn.4xlarge, but I can try a larger one. The mdbx file? Is that in the S3 bucket?

Yes, you should be able to set up your own server quite easily; just install the needed software and run:

# Downloads the mdbx file.
aws s3 cp --request-payer=requester s3://public-blockchain-snapshots/bsc/erigon/archive/latest/v1/chaindata/mdbx.dat.zstd - \
    | pv \
    | zstd -q -d -o /erigon/data/bsc/chaindata/mdbx.dat

# Download the snapshot files.
aws s3 sync --request-payer=requester s3://public-blockchain-snapshots/bsc/erigon/archive/latest/v1/snapshots/ /erigon/data/bsc/snapshots/

Then start erigon with:
erigon --chain bsc --snapshots=true --db.pagesize=16k --datadir=/erigon/data/bsc --txpool.disable

You'll need to set up the disk drives and install everything manually, though.
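
For reference, a rough sketch of that manual setup on Ubuntu could look like the following. The device name /dev/nvme1n1, the pool name, and the compression setting are assumptions; adjust them for your instance.

# Install the tools used by the commands above (assumes Ubuntu / apt).
sudo apt-get update
sudo apt-get install -y zfsutils-linux awscli zstd pv

# Create a ZFS pool on the instance's NVMe drive and mount it at /erigon.
# The device /dev/nvme1n1 and the pool name "erigon" are just examples.
sudo zpool create -o ashift=12 -O compression=lz4 -O mountpoint=/erigon erigon /dev/nvme1n1

# Create the directories the download commands expect.
sudo mkdir -p /erigon/data/bsc/chaindata /erigon/data/bsc/snapshots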

Do those commands give the full archive history of BSC? I only see data going back 2 months.

Yes, they should.
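
One way to sanity-check that the full history is really there (a sketch; it assumes you start erigon with --http so the JSON-RPC endpoint is listening on the default port 8545) is to request a very early block:

# Ask for block 1 over JSON-RPC; an archive node with full history should return it.
curl -s -X POST -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["0x1", false],"id":1}' \
    http://localhost:8545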

I believe I've discovered the issue with the script. Everything appears to be working fine now; here's a detailed explanation of the problem:

  1. I SSH in right after creating an EC2 instance, clone this repository, and run the script directly.
  2. The script installs all of the required dependencies but fails before it starts downloading, because aws configure hasn't been run yet on the fresh instance.
  3. Although the script fails before any downloading starts, it has already created the ZFS file system by that point.
  4. The snapshots folder downloaded fine even before this, which is why I got 568GB and the script exited successfully: 568GB is the correct size of the snapshots folder alone.
  5. The chaindata file did NOT download properly, which I believe is due to the download_database_file() function.

Unlike the download functions for snapshots, nodes, parlia, etc., this function returns early if the chaindata folder already exists. Since the script had already created that folder on the earlier failed run, before download_database_file was executed, the download never started. A sketch of the pattern follows below.
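
In shell terms, the behavior described above looks roughly like this (a sketch, not the repo's exact code; the path comes from the commands earlier in this thread):

# Sketch of the early-return guard described above (not the actual repo code).
download_database_file() {
  # The earlier failed run had already created this directory, so this guard
  # caused the function to return without ever downloading mdbx.dat.
  if [ -d /erigon/data/bsc/chaindata ]; then
    return 0
  fi
  # ... download and decompress mdbx.dat here ...
}

# Checking for the data file itself instead of the folder avoids the problem:
if [ ! -f /erigon/data/bsc/chaindata/mdbx.dat ]; then
  echo "chaindata missing, downloading..."
fi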

Hope this helps! I would write a PR myself, but I'm not familiar with bash or ZFS commands. Appreciate your help as well.

This should be done now in the latest code.