Can't unwind headers - stuck on same block

Question

mariel-sendblocks opened this issue 8 months ago

mariel-sendblocks commented 8 months ago

It seems my Erigon BSC node's database was corrupted after a forced restart of my machine.
I tried to unwind all stages, but the Headers stage stays stuck on the same block.

print_stages shows the following:

d316f7be99be:/go/bsc-erigon/build/bin$ ./integration print_stages --datadir /var/lib/erigon/ --chain bsc
INFO[01-23|19:05:35.790] logging to file system                   log dir=/var/lib/erigon/logs file prefix=erigon log level=info json=false
INFO[01-23|19:05:37.904] [snapshots] Blocks Stat                  blocks=28500k indices=28500k alloc=2.2GB sys=2.5GB
Note: prune_at doesn't mean 'all data before were deleted' - it just mean stage.Prune function were run to this block. Because 1 stage may prune multiple data types to different prune distance.

 				 stage_at 	 prune_at
Snapshots 			 35489804 	 0
Headers 			 35490443 	 0
BlockHashes 			 35490443 	 0
Bodies 				 35489802 	 0
Senders 			 35489802 	 0
Execution 			 35489802 	 35490443
Translation 			 0 		 0
HashState 			 35489802 	 0
IntermediateHashes 		 35489802 	 35489804
AccountHistoryIndex 		 35489802 	 0
StorageHistoryIndex 		 35489802 	 0
LogIndex 			 35489802 	 0
CallTraces 			 35489802 	 35489804
TxLookup 			 35489802 	 32756000
Finish 				 35489802 	 0
--
prune distance:

blocks.v2: blocks=28499999, segments=28499999, indices=28499999

history.v3: false, idx steps: 0.00, lastMaxTxNum=0->0, lastBlockInSnap=0

sequence: EthTx=5347560463, NonCanonicalTx=242771

in db: first header 32756000, last header 35502063, first body 32756000, last body 35490443
--
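
For readers comparing the stage positions above, here is a minimal Python sketch that parses the stage table and reports how far each stage trails Headers. The helpers `parse_stages` and `lag_behind` are hypothetical names written for this illustration and are not part of erigon; the sample rows are copied from the output above.

```python
# Hypothetical helpers, not part of erigon: parse the stage table printed by
# `integration print_stages` and report how far each stage trails Headers.

def parse_stages(text):
    """Return {stage_name: (stage_at, prune_at)} for rows shaped like 'Name 123 456'."""
    stages = {}
    for line in text.splitlines():
        parts = line.split()
        # A table row is exactly a name followed by two integers;
        # log lines and the column header do not match this shape.
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            stages[parts[0]] = (int(parts[1]), int(parts[2]))
    return stages


def lag_behind(stages, reference="Headers"):
    """Return {stage_name: blocks behind the reference stage}, skipping stages at 0."""
    ref_at = stages[reference][0]
    return {name: ref_at - at for name, (at, _) in stages.items() if at > 0}


# Sample rows copied from the print_stages output in this issue.
sample = """
 stage_at  prune_at
Snapshots    35489804  0
Headers      35490443  0
Bodies       35489802  0
Execution    35489802  35490443
Translation  0         0
"""

stages = parse_stages(sample)
lags = lag_behind(stages)
for name, lag in lags.items():
    print(f"{name}: {lag} blocks behind Headers")
```

With the numbers above, Bodies and Execution are 641 blocks behind Headers, which matches the gap between 35490443 and 35489802 in the table.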

I tried to unwind and even reset the headers, but I get the same output:

d316f7be99be:/go/bsc-erigon/build/bin$ ./integration stage_headers --reset --datadir /var/lib/erigon/ --chain bsc
INFO[01-23|19:07:38.057] logging to file system                   log dir=/var/lib/erigon/logs file prefix=erigon log level=info json=false
INFO[01-23|19:07:40.130] [snapshots] Blocks Stat                  blocks=28500k indices=28500k alloc=2.2GB sys=2.3GB
INFO[01-23|19:07:40.565] TruncateBlocks                           block=35502063
INFO[01-23|19:08:00.565] TruncateBlocks                           block=35490443
INFO[01-23|19:08:20.566] TruncateBlocks                           block=35490443
INFO[01-23|19:08:40.566] TruncateBlocks                           block=35490443
INFO[01-23|19:09:00.566] TruncateBlocks                           block=35490443
INFO[01-23|19:09:20.565] TruncateBlocks                           block=35490443
INFO[01-23|19:09:40.566] TruncateBlocks                           block=35490443
INFO[01-23|19:10:00.567] TruncateBlocks                           block=35490443
INFO[01-23|19:10:20.567] TruncateBlocks                           block=35490443
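
The repeated `TruncateBlocks block=35490443` lines are the symptom here: the reported block stops changing. As a rough sketch, a stalled loop like this can be detected from the log text. The function names and the `repeats` threshold are assumptions made for this illustration, not erigon behavior or tooling.

```python
import re

# Hypothetical helpers, not part of erigon: detect a TruncateBlocks loop
# that keeps reporting the same block number.
PATTERN = re.compile(r"TruncateBlocks\s+block=(\d+)")


def truncate_progress(log_text):
    """Return the block numbers reported by TruncateBlocks lines, in order."""
    return [int(m.group(1)) for m in PATTERN.finditer(log_text)]


def is_stuck(blocks, repeats=3):
    """True if the last `repeats` reports all show the same block number."""
    return len(blocks) >= repeats and len(set(blocks[-repeats:])) == 1


# Lines copied from the stage_headers --reset output in this issue.
log = """
INFO[01-23|19:07:40.565] TruncateBlocks                           block=35502063
INFO[01-23|19:08:00.565] TruncateBlocks                           block=35490443
INFO[01-23|19:08:20.566] TruncateBlocks                           block=35490443
INFO[01-23|19:08:40.566] TruncateBlocks                           block=35490443
"""

blocks = truncate_progress(log)
print("reported blocks:", blocks)
print("stuck:", is_stuck(blocks))
```

The threshold of three identical reports is arbitrary; with a 20-second reporting interval it corresponds to roughly a minute without progress.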

Deepanshu Gupta · Answer 1 · Wed Jan 24 2024 04:22:03 GMT+0800 (China Standard Time)

Hi there, which version of Erigon are you using?

And what startup command did you last use before the data got corrupted? Can you share the node logs as well?

blxdyx · Answer 2 · Wed Jan 24 2024 10:44:45 GMT+0800 (China Standard Time)

It seems stage_headers --reset doesn't work for you.
I think the best option is to download a new snapshot: https://github.com/bnb-chain/bsc-snapshots?tab=readme-ov-file#endpointmainnet-update-bi-weekly

mariel-sendblocks · Answer 3 · Wed Jan 24 2024 16:40:24 GMT+0800 (China Standard Time)

@deepcrazy The Erigon version we use is the latest one, v1.1.12.
We didn't have a problem at startup. The corruption happened after our EC2 machine stopped working and we had to force-stop and restart it.

The startup command we use:

      - erigon
      - --private.api.addr
      - 0.0.0.0:9090
      - --datadir
      - /var/lib/erigon
      - --port
      - 30303
      - --p2p.allowed-ports
      - 30303,30304,30305
      - --torrent.port
      - 42069
      - --nat
      - any
      - --chain
      - bsc
      - --networkid=56
      - info
      - --metrics
      - --metrics.addr
      - 0.0.0.0
      - --pprof.port=6061
      - --http
      - --http.addr
      - 0.0.0.0
      - --http.port
      - 8545
      - --http.vhosts=*
      - --http.corsdomain=*
      - --http.api
      - web3,eth,net,engine,debug,trace
      - --ws
      - --authrpc.addr
      - 0.0.0.0
      - --authrpc.port
      - 8551
      - --maxpeers
      - 100
      - --authrpc.jwtsecret
      - /var/lib/erigon/ee-secret/jwtsecret
      - --db.pagesize=16k
      - --sentry.drop-useless-peers=true
      - --p2p.protocol=66

blxdyx · Answer 4 · Thu Jan 25 2024 15:12:55 GMT+0800 (China Standard Time)

The data was probably corrupted at that time, so I suggest downloading a new snapshot.

mariel-sendblocks · Answer 5 · Thu Jan 25 2024 22:56:07 GMT+0800 (China Standard Time)

Yep, we used a new snapshot instead of continuing to try to fix the corruption.

blxdyx · Answer 6 · Mon Jan 29 2024 10:22:40 GMT+0800 (China Standard Time)

Reopen if you still have problems.