ipni / storetheindex

A directory of CIDs

Failure to connect to dhstore due to i/o timeout

gammazero opened this issue

The indexer intermittently fails to connect to a dhstore node, with an i/o timeout error, when it tries to delete metadata.

{"level":"error","ts":"2023-10-03T01:01:29.390Z","logger":"indexer/ingest","caller":"ingest/ingest.go:1210","msg":"Error while ingesting ad. Bailing early, not ingesting later ads.","publisher":"12D3KooWKa1NQKxveFmppQAjkHG8RGpjoQ2ivubeNuEUgZGgc2T6","adCid":"baguqeeragw6jbdqznqphihw264orde743jmy4hhesrma7p54eq6j3s3ebypq","err":"indexerErr: internal error: failed to remove provider context: Delete \"http://dhstore.internal.prod.cid.contact/metadata/92RXJkNDA2iXjxcAqD82Esxoe5y4Fy1jtaqAM49Z4d2p\": dial tcp 20.10.6.204:80: i/o timeout","adsLeftToProcess":22346}

The failure only appears to happen:

  • after the dhstore node has been running for days
  • on dhstore nodes that are only handling reads and deletes, not the current node being written to

This looks like it might be a Kubernetes networking issue, since the problem does not appear for a long time after a pod is restarted.

This may be related to the bug described here:
https://kubernetes.io/blog/2019/03/29/kube-proxy-subtleties-debugging-an-intermittent-connection-reset/

We should upgrade Kubernetes.