algorand / conduit

Algorand's data pipeline framework.

Conduit will not reconnect/continue if algod has been restarted

PSjoe opened this issue

We have noticed that if we restart our algod instances, conduit gets stuck and will not ingest new blocks once algod has finished restarting and is back online.

Your environment

Software Versions:

  • algod 3.16.2

  • conduit 1.2.0

Pipeline configuration:

algod config.json:

{
  "Archival": false,
  "CatchpointInterval": 0,
  "CatchupParallelBlocks": 32,
  "DNSBootstrapID": "betanet.algodev.network",
  "EnableDeveloperAPI": true,
  "EnableFollowMode": true,
  "EndpointAddress": "0.0.0.0:48081",
  "ForceFetchTransactions": false,
  "IsIndexerActive": false,
  "MaxAcctLookback": 256
}

conduit.yaml:

exporter:
  config:
    connection-string: <redacted>
  
    delete-task:
      interval: 1000
      rounds: 10000
  name: postgresql

hide-banner: true
importer:
  config:
    catchup-config:
      admin-token:  <redacted but confirmed correct>
    mode: follower
    netaddr: http://vira-betanet-blue-algod4conduit:48081
    token: <redacted but confirmed correct>
  name: algod
log-level: "INFO"
metrics:
  addr: ":9999"
  mode: "ON"
  prefix: "conduit"
retry-count: 0

Operating System details:

We are running these in an AWS Kubernetes cluster. Both Conduit and algod are running your provided containers available on Docker Hub.

There is one algod instance, fronted by a Kubernetes Service object. The network path looks like:

Conduit ---> k8s service ---> algod

When the algod pod is killed, it takes about 30-60s before it is re-registered in the k8s service and
available. The IP addresses do not change.

The algod pod has a PVC (disk) attached, and this same disk is re-attached to the recreated algod pod every time, so algod starts back up with exactly the same state it had when it was shut down.
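
For reference, the Service object in front of algod looks roughly like the sketch below. This is an illustrative manifest rather than our exact one: the selector label is an assumption, and the port matches algod's EndpointAddress from config.json above.

apiVersion: v1
kind: Service
metadata:
  name: vira-betanet-blue-algod4conduit
spec:
  selector:
    app: algod          # assumed label on the algod StatefulSet pods
  ports:
    - name: algod-api
      port: 48081       # matches EndpointAddress "0.0.0.0:48081"
      targetPort: 48081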

We have seen this using two ledgers now, Betanet and MainNet.

Steps to reproduce

  1. Set up Conduit/algod in k8s as above
  2. Kill the algod pod with kubectl delete pod <algod_pod_name>
  3. Wait for the k8s StatefulSet to automatically re-create the pod after it has been deleted.
  4. Watch the logs for Conduit. It does not advance, even after waiting 20 minutes.

Expected behaviour

Conduit should fetch the next block from its dedicated algod instance as soon as the node has restarted.

Actual behaviour

Conduit gets stuck trying to retrieve the next block. Restarting conduit or restarting algod a second time
does not help.

We see the following logs in conduit:

{"__type":"Conduit","_name":"main","level":"info","msg":"Pipeline round: 28060661","time":"2023-07-19T19:47:23Z"}
{"__type":"importer","_name":"algod","level":"error","msg":"error getting status for round 28060661 (attempt 0): error getting status for round: Get \"http://vira-betanet-blue-algod4conduit:48081/v2/status/wait-for-block-after/28060660\": dial tcp 172.20.14.60:48081: connect: connection refused","time":"2023-07-19T19:47:23Z"}
{"__type":"importer","_name":"algod","level":"error","msg":"error getting status for round 28060661 (attempt 1): error getting status for round: Get \"http://vira-betanet-blue-algod4conduit:48081/v2/status/wait-for-block-after/28060660\": dial tcp 172.20.14.60:48081: connect: connection refused","time":"2023-07-19T19:47:24Z"}
{"__type":"importer","_name":"algod","level":"error","msg":"error getting status for round 28060661 (attempt 2): error getting status for round: Get \"http://vira-betanet-blue-algod4conduit:48081/v2/status/wait-for-block-after/28060660\": dial tcp 172.20.14.60:48081: connect: connection refused","time":"2023-07-19T19:47:25Z"}
{"__type":"importer","_name":"algod","level":"error","msg":"error getting status for round 28060661 (attempt 3): error getting status for round: Get \"http://vira-betanet-blue-algod4conduit:48081/v2/status/wait-for-block-after/28060660\": dial tcp 172.20.14.60:48081: connect: connection refused","time":"2023-07-19T19:47:26Z"}
{"__type":"importer","_name":"algod","level":"error","msg":"error getting block for round 28060661 (attempt 4): HTTP 404: {\"message\":\"failed to retrieve information from the ledger\"}\n","time":"2023-07-19T19:48:27Z"}
{"__type":"importer","_name":"algod","level":"error","msg":"failed to get block for round 28060661 after 5 attempts, check node configuration: HTTP 404: {\"message\":\"failed to retrieve information from the ledger\"}\n","time":"2023-07-19T19:48:27Z"}
{"__type":"Conduit","_name":"main","level":"error","msg":"failed to get block for round 28060661 after 5 attempts, check node configuration: HTTP 404: {\"message\":\"failed to retrieve information from the ledger\"}\n","time":"2023-07-19T19:48:27Z"}
{"__type":"Conduit","_name":"main","level":"info","msg":"Retry number 27 resuming after a 1s retry delay.","time":"2023-07-19T19:48:27Z"}
{"__type":"Conduit","_name":"main","level":"info","msg":"Pipeline round: 28060661","time":"2023-07-19T19:48:28Z"}
{"__type":"importer","_name":"algod","level":"error","msg":"error getting block for round 28060661 (attempt 0): HTTP 404: {\"message\":\"failed to retrieve information from the ledger\"}\n","time":"2023-07-19T19:49:28Z"}

{"__type":"importer","_name":"algod","level":"error","msg":"error getting block for round 28060661 (attempt 1): HTTP 404: {\"message\":\"failed to retrieve information from the ledger\"}\n","time":"2023-07-19T19:50:28Z"}
{"__type":"importer","_name":"algod","level":"error","msg":"error getting block for round 28060661 (attempt 2): HTTP 404: {\"message\":\"failed to retrieve information from the ledger\"}\n","time":"2023-07-19T19:51:28Z"}
{"__type":"importer","_name":"algod","level":"error","msg":"error getting block for round 28060661 (attempt 3): HTTP 404: {\"message\":\"failed to retrieve information from the ledger\"}\n","time":"2023-07-19T19:52:28Z"}

In the same time frame, we see the following logs from algod:

{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).periodicSync","level":"info","line":616,"msg":"It's been too long since our ledger advanced; resyncing","name":"","time":"2023-07-19T19:51:22.328185Z"}
{"Context":"sync","details":{"StartRound":28060660},"file":"telemetry.go","function":"github.com/algorand/go-algorand/logging.(*telemetryState).logTelemetry","instanceName":"aWLoG60wMexN2Akp","level":"info","line":255,"msg":"/ApplicationState/CatchupStart","name":"","session":"","time":"2023-07-19T19:51:22.328292Z","v":""}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060661): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.329278Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060663): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.330118Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060662): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.330821Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060664): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.331494Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060665): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332124Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060666): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332721Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060667): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332760Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060668): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332790Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060670): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332789Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060669): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332819Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060671): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332822Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060672): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332843Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060673): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332856Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060674): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332875Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060675): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332887Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060676): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332899Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060678): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332926Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060677): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332916Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060679): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332952Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060683): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332954Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060680): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332976Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060684): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332981Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060681): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.332998Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060682): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.333018Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060685): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.333029Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060687): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.333058Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060686): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.333058Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060688): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.333083Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060690): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.333084Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060689): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.333108Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060691): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.333109Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060692): did not fetch or write the block","name":"","time":"2023-07-19T19:51:22.333138Z"}
{"Context":"sync","details":{"StartRound":28060660,"EndRound":28060660,"Time":4898458,"InitSync":false},"file":"telemetry.go","function":"github.com/algorand/go-algorand/logging.(*telemetryState).logTelemetry","instanceName":"aWLoG60wMexN2Akp","level":"info","line":255,"msg":"/ApplicationState/CatchupStop","name":"","session":"","time":"2023-07-19T19:51:22.333176Z","v":""}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).sync","level":"info","line":688,"msg":"Catchup Service: finished catching up, now at round 28060660 (previously 28060660). Total time catching up 4.898458ms.","name":"","time":"2023-07-19T19:51:22.333218Z"}
{"file":"logger.go","function":"github.com/algorand/go-algorand/daemon/algod/api/server/lib/middlewares.(*LoggerMiddleware).handler.func1","level":"info","line":56,"msg":"10.20.4.144:48484 - - [2023-07-19 19:51:23.451617927 +0000 UTC m=+325.403445294] \"GET /metrics HTTP/1.1\" 200 0 \"Prometheus/2.34.0\" 270.898µs","time":"2023-07-19T19:51:23.451920Z"}
{"file":"logger.go","function":"github.com/algorand/go-algorand/daemon/algod/api/server/lib/middlewares.(*LoggerMiddleware).handler.func1","level":"info","line":56,"msg":"10.20.4.134:45858 - - [2023-07-19 19:51:26.258681648 +0000 UTC m=+328.210509005] \"GET /health HTTP/1.1\" 200 0 \"kube-probe/1.24+\" 34.082µs","time":"2023-07-19T19:51:26.258754Z"}
{"file":"logger.go","function":"github.com/algorand/go-algorand/daemon/algod/api/server/lib/middlewares.(*LoggerMiddleware).handler.func1","level":"info","line":56,"msg":"10.20.3.176:41880 - - [2023-07-19 19:50:28.340640642 +0000 UTC m=+270.292467999] \"GET /v2/status/wait-for-block-after/28060660 HTTP/1.1\" 200 653 \"Go-http-client/1.1\" 1m0.001869246s","time":"2023-07-19T19:51:28.342549Z"}
{"file":"utils.go","function":"github.com/algorand/go-algorand/daemon/algod/api/server/v2.returnError","level":"info","line":41,"msg":"ledger does not have entry 28060661 (latest 28060660, committed 28060660)","time":"2023-07-19T19:51:28.343474Z"}
{"file":"logger.go","function":"github.com/algorand/go-algorand/daemon/algod/api/server/lib/middlewares.(*LoggerMiddleware).handler.func1","level":"info","line":56,"msg":"10.20.3.176:41880 - - [2023-07-19 19:51:28.343426358 +0000 UTC m=+330.295253715] \"GET /v2/blocks/28060661?format=msgpack HTTP/1.1\" 404 61 \"Go-http-client/1.1\" 78.945µs","time":"2023-07-19T19:51:28.343564Z"}

These same log stanzas just repeat over and over.

It looks like you already have retry-count set to 0, which was my first thought.

I was able to reproduce something similar with the following:

A follower node running with a mounted volume on one terminal:

docker run --rm -it \
  -p 4190:8080 \
  --name algod-test-run \
  -e TOKEN=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa \
  -e PROFILE=conduit \
  -e NETWORK=betanet \
  -v $(pwd)/docker_data_dir:/algod/data \
  algorand/algod:nightly

Conduit running in another terminal with a basic config and retries set to 0:

# Log verbosity: PANIC, FATAL, ERROR, WARN, INFO, DEBUG, TRACE
log-level: INFO

# If no log file is provided logs are written to stdout.
#log-file:

# Number of retries to perform after a pipeline plugin error.
# Set to 0 to retry forever.
retry-count: 0

# Time duration to wait between retry attempts.
retry-delay: "1s"

# Optional filepath to use for pidfile.
#pid-filepath: /path/to/pidfile

# Whether or not to print the conduit banner on startup.
hide-banner: false

# When enabled prometheus metrics are available on '/metrics'
metrics:
    mode: OFF
    addr: ":9999"
    prefix: "conduit"

# The importer is typically an algod follower node.
importer:
    name: algod
    config:
        # The mode of operation, either "archival" or "follower".
        # * follower mode allows you to use a lightweight non-archival node as the
        #   data source. In addition, it will provide ledger state delta objects to
        #   the processors and exporter.
        # * archival mode allows you to start processing on any round but does not
        #   contain the ledger state delta objects required for the postgres writer.
        mode: "follower"

        # Algod API address.
        netaddr: "http://localhost:4190"

        # Algod API token. Found in the algod.token file.
        token: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

        # Algod catchpoint catchup arguments
        catchup-config:
            # Algod Admin API Token. Used for running fast catchup during startup
            # if the node needs to be initialized. Found in algod.admin.token file.
            admin-token: ""


# Zero or more processors may be defined to manipulate what data
# reaches the exporter.
processors:

# An exporter is defined to do something with the data.
exporter:
    name: "file_writer"
    config:
        # BlocksDir is the path to a directory where block data should be stored.
        # The directory is created if it doesn't exist. If no directory is provided
        # blocks are written to the Conduit data directory.
        #block-dir: "/path/to/block/files"

        # FilenamePattern is the format used to write block files. It uses go
        # string formatting and should accept one number for the round.
        # If the file has a '.gz' extension, blocks will be gzipped.
        # Default: "%[1]d_block.json"
        filename-pattern: "%[1]d_block.json"

        # DropCertificate is used to remove the vote certificate from the block data before writing files.
        drop-certificate: true

With this environment running I can shut down and restart algod. Sometimes it works and sometimes it gets stuck. There seems to be a race condition where the node forgets, or never receives, the latest sync round from Conduit, and therefore can't advance to the required round.

When it's stuck and algod is running, restarting Conduit gets algod started again.

I think what happens here is that Conduit successfully sets the sync round to N+1, but algod gets killed before it can fetch that round and save it to disk. When it starts back up, it automatically sets the sync round back to N.
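
If that's what's happening, it might be possible to unstick the node by hand by re-posting the sync round. A rough sketch using the algod REST API's sync-round endpoints (I'm assuming these require the admin token; the host/port and round number are taken from the repro environment and the logs above):

# Inspect the sync round the follower node currently holds.
curl -s -H "X-Algo-API-Token: $ADMIN_TOKEN" \
  http://localhost:4190/v2/ledger/sync

# Re-post the round Conduit is stuck on so the node can advance past it.
curl -s -X POST -H "X-Algo-API-Token: $ADMIN_TOKEN" \
  http://localhost:4190/v2/ledger/sync/28060661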

Unfortunately for me, with algod still running, restarting conduit does not get things unstuck. Not sure if it's made worse by the fact that these are running on two different machines, with slightly more network latency (vs. your experiment, which had them both on the same box).

I left things running overnight and it's still in the same state, so it's not something that resolves itself given time.

@PSjoe when you restart conduit, do you see the sync round being posted in the algod logs?

It would look something like this:

{"file":"logger.go","function":"github.com/algorand/go-algorand/daemon/algod/api/server/lib/middlewares.(*LoggerMiddleware).handler.func1","level":"info","line":56,"msg":"172.17.0.1:34258 - - [2023-07-20 15:18:29.763479994 +0000 UTC m=+6976.363070708] "POST /v2/ledger/sync/605600 HTTP/1.1" 200 0 "Go-http-client/1.1" 26.391µs","time":"2023-07-20T15:18:29.763514Z"}

Doesn't seem so. Here are the algod logs that show up when I restart conduit:

{"file":"logger.go","function":"github.com/algorand/go-algorand/daemon/algod/api/server/lib/middlewares.(*LoggerMiddleware).handler.func1","level":"info","line":56,"msg":"10.20.3.61:49444 - - [2023-07-20 15:22:32.525925274 +0000 UTC m=+70594.477752641] \"GET /genesis HTTP/1.1\" 200 0 \"Go-http-client/1.1\" 85.085µs","time":"2023-07-20T15:22:32.526049Z"}
{"file":"utils.go","function":"github.com/algorand/go-algorand/daemon/algod/api/server/v2.returnError","level":"info","line":41,"msg":"round 28060661 too high: dbRound 28060404, deltas 256","time":"2023-07-20T15:22:32.526963Z"}
{"file":"logger.go","function":"github.com/algorand/go-algorand/daemon/algod/api/server/lib/middlewares.(*LoggerMiddleware).handler.func1","level":"info","line":56,"msg":"10.20.3.61:49444 - - [2023-07-20 15:22:32.526904418 +0000 UTC m=+70594.478731775] \"GET /v2/deltas/28060661?format=msgp HTTP/1.1\" 404 99 \"Go-http-client/1.1\" 102.176µs","time":"2023-07-20T15:22:32.527035Z"}
{"file":"logger.go","function":"github.com/algorand/go-algorand/daemon/algod/api/server/lib/middlewares.(*LoggerMiddleware).handler.func1","level":"info","line":56,"msg":"10.20.3.61:49444 - - [2023-07-20 15:22:32.638449353 +0000 UTC m=+70594.590276720] \"GET /v2/status HTTP/1.1\" 200 653 \"Go-http-client/1.1\" 77.055µs","time":"2023-07-20T15:22:32.638558Z"}

{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).periodicSync","level":"info","line":616,"msg":"It's been too long since our ledger advanced; resyncing","name":"","time":"2023-07-20T15:22:36.476675Z"}
{"Context":"sync","details":{"StartRound":28060660},"file":"telemetry.go","function":"github.com/algorand/go-algorand/logging.(*telemetryState).logTelemetry","instanceName":"aWLoG60wMexN2Akp","level":"info","line":255,"msg":"/ApplicationState/CatchupStart","name":"","session":"","time":"2023-07-20T15:22:36.476750Z","v":""}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060661): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478430Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060662): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478461Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060663): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478482Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060664): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478497Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060665): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478510Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060666): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478526Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060667): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478539Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060668): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478550Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060669): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478562Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060670): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478576Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060671): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478590Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060672): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478603Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060673): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478617Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060674): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478628Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060675): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478645Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060676): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478660Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060677): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478674Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060678): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478687Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060679): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478703Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060680): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478715Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060681): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478777Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060682): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478885Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060683): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.478991Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060684): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.479049Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060685): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.479194Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060686): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.479299Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060687): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.479400Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060688): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.479501Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060689): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.479604Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060690): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.479638Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060691): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.479753Z"}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).pipelineCallback.func1","level":"info","line":451,"msg":"pipelineCallback(28060692): did not fetch or write the block","name":"","time":"2023-07-20T15:22:36.479843Z"}
{"Context":"sync","details":{"StartRound":28060660,"EndRound":28060660,"Time":3141223,"InitSync":false},"file":"telemetry.go","function":"github.com/algorand/go-algorand/logging.(*telemetryState).logTelemetry","instanceName":"aWLoG60wMexN2Akp","level":"info","line":255,"msg":"/ApplicationState/CatchupStop","name":"","session":"","time":"2023-07-20T15:22:36.479877Z","v":""}
{"Context":"sync","file":"service.go","function":"github.com/algorand/go-algorand/catchup.(*Service).sync","level":"info","line":688,"msg":"Catchup Service: finished catching up, now at round 28060660 (previously 28060660). Total time catching up 3.141223ms.","name":"","time":"2023-07-20T15:22:36.479897Z"}

I've confirmed that in our Elasticsearch cluster: there are no POST messages once the system gets into this state, even if we restart the conduit instance.

Conduit 1.3.0 appears to have solved this. My stuck instances started moving again. I can restart algod instances and conduit picks back up as soon as it's able to reconnect to the algod instance.