stellar / quickstart

Home of the stellar/quickstart docker image for development and testing

Is the public archive / quickstart image for v20 good now for Horizon ingestion?!

jun0tpyrc opened this issue

What version are you using?

quickstart: 85a2c8b

What did you do?

used the stellar/quickstart image and tuned only two things:

export HISTORY_RETENTION_COUNT=1500000
export PER_HOUR_RATE_LIMIT="0"
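
For reference, a rough sketch of how those two overrides can be passed into the quickstart container; the volume path and port mapping below are illustrative assumptions (not taken from this report), and it assumes the image passes these environment variables through to Horizon, as the report implies:

# hedged sketch: pass the two tuned settings via -e; volume and ports are examples only
# network selection flag depends on the quickstart version (e.g. --pubnet or --network pubnet)
docker run -d \
  --name stellar \
  -e HISTORY_RETENTION_COUNT=1500000 \
  -e PER_HOUR_RATE_LIMIT=0 \
  -v /data/stellar:/opt/stellar \
  -p 8000:8000 \
  stellar/quickstart --pubnet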

What did you expect to see?

nodes can get in sync

What did you see instead?

  • ingestion fails and keeps looping; even after pruning the bucket / bucket-cache directories for Horizon's captive-core, we get the same result (multiple times)
time="2024-01-29T23:42:29.889Z" level=info msg="Processing ledger entry changes" pid=86 processed_entries=36500000 progress="83.95%" sequence=50141695 service=ingest source=historyArchive
time="2024-01-29T23:45:38.843Z" level=info msg="History: Catching up to ledger 50141759: Download & apply checkpoints: num checkpoints left to apply:0 (100% done)" pid=86 service=ingest s
ubservice=stellar-core
time="2024-01-29T23:45:43.082Z" level=info msg="History: Catching up to ledger 50142591: downloading ledger files 13/13 (100%)" pid=86 service=ingest subservice=stellar-core
time="2024-01-29T23:45:46.321Z" level=info msg="History: Catching up to ledger 50142591: Download & apply checkpoints: num checkpoints left to apply:13 (0% done)" pid=86 service=ingest su
bservice=stellar-core
time="2024-01-29T23:45:46.321Z" level=info msg="History: Catching up to ledger 50142591: Download & apply checkpoints: num checkpoints left to apply:13 (0% done)" pid=86 service=ingest su
bservice=stellar-core
time="2024-01-29T23:46:22.160Z" level=info msg="History: Catching up to ledger 50142591: Download & apply checkpoints: num checkpoints left to apply:12 (7% done)" pid=86 service=ingest su
bservice=stellar-core
...
(the catch-up breaks at a different checkpoint boundary each time and processed_entries restarts from the beginning, for example:)
...
time="2024-01-29T23:53:21.015Z" level=info msg="Processing ledger entry changes" pid=86 processed_entries=100000 progress="0.35%" sequence=50142591 service=ingest source=historyArchive

Can you upload an unabridged chunk of the Horizon and Stellar Core logs, at least where the restart is occurring?

stellar-core is in sync with the network head, but Horizon can't ingest:

  "horizon_version": "horizon-v2.28.0-(built-from-source)",
  "core_version": "v20.1.0",
  "ingest_latest_ledger": 0,
  "history_latest_ledger": 0,
  "history_latest_ledger_closed_at": "0001-01-01T00:00:00Z",
  "history_elder_ledger": 0,
  "core_latest_ledger": 50158275,
  "network_passphrase": "Public Global Stellar Network ; September 2015",
  "current_protocol_version": 19,
  "supported_protocol_version": 20,
  "core_supported_protocol_version": 20

Truncated logs, excluding HTTP API request logs (grep -v method=GET), are in the attachment below; these lines highlight the ingestion loop restarting:

time="2024-01-30T14:37:05.751Z" level=info msg="Processing ledger entry changes" pid=218 processed_entries=2700000 progress="11.36%" sequence=50151423
...
time="2024-01-30T14:38:41.316Z" level=info msg="Processing ledger entry changes" pid=218 processed_entries=50000 progress="0.34%" sequence=50151551

example-logs-fail-sync-loop.txt
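
For context, the trimmed view above was produced with roughly this kind of filtering (the log file name here is hypothetical):

# drop HTTP request noise, then follow the ingestion progress lines to see processed_entries reset
grep -v 'method=GET' horizon.log | grep 'Processing ledger entry changes'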

Hmm... it's hard to debug because it starts off in an error state, but the logs suggest something might be up with the cache. Can you try the following? Stop Horizon, remove the bucket cache, and start it again:

supervisorctl stop horizon
rm -rf /opt/stellar/horizon/captive-core/bucket-cache/
supervisorctl start horizon

If this fixes it, it may be a bug with how caching works. To be more certain, could you provide another chunk of logs but leave more entries prior to the restart? Before trying the above, ideally.

We have tried this multiple times, pruning not only /horizon/captive-core/bucket-cache/ but the whole set of

bucket-cache  captive-core/buckets  stellar.db  stellar.db-shm  stellar.db-wal

folders (buckets, bucket-cache, and the Horizon DB), and over several days it never got in sync.
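
Roughly, that fuller reset looks like the following; the exact paths under the captive-core directory are assumptions pieced together from this thread, not verified against the image layout:

# hedged sketch of the more aggressive reset described above; adjust paths to your layout
supervisorctl stop horizon
rm -rf /opt/stellar/horizon/captive-core/bucket-cache/ \
       /opt/stellar/horizon/captive-core/buckets/ \
       /opt/stellar/horizon/captive-core/stellar.db*
supervisorctl start horizon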

FYI, the impact may not be limited to a fresh v20 sync: a teammate also reported that a node upgraded from v19 to v20 "is stuck in a bucket download + ledger process loop", which might be a similar issue.

@Shaptic, can this be closed now that stellar/go#5197 is merged? It looks like the fix is headed into the upcoming Horizon 2.28.2.

I think we can only close it once a new quickstart is released 👍 @jun0tpyrc can then reopen if the issue persists after upgrading.

This should be closed by #565, please reopen if not!

Confirmed the latest quickstart image is working for a quick sync, thanks.