Failure to connect to dhstore due to i/o timeout

Question

gammazero opened this issue 8 months ago.

The indexer occasionally fails to connect to a dhstore node with an i/o timeout error when it tries to delete metadata:

{"level":"error","ts":"2023-10-03T01:01:29.390Z","logger":"indexer/ingest","caller":"ingest/ingest.go:1210","msg":"Error while ingesting ad. Bailing early, not ingesting later ads.","publisher":"12D3KooWKa1NQKxveFmppQAjkHG8RGpjoQ2ivubeNuEUgZGgc2T6","adCid":"baguqeeragw6jbdqznqphihw264orde743jmy4hhesrma7p54eq6j3s3ebypq","err":"indexerErr: internal error: failed to remove provider context: Delete \"http://dhstore.internal.prod.cid.contact/metadata/92RXJkNDA2iXjxcAqD82Esxoe5y4Fy1jtaqAM49Z4d2p\": dial tcp 20.10.6.204:80: i/o timeout","adsLeftToProcess":22346}

It only appears to happen:

- after the dhstore node has been running for days
- on dhstore nodes that are handling only reads and deletes, not on the node currently being written to

This may be a Kubernetes networking issue, since the problem does not reappear for a long time after a pod is restarted.

Andrew Gillis · Answer 1 · Thu Nov 16 2023 02:33:43 GMT+0800 (China Standard Time)

This may be related to the bug described here:
https://kubernetes.io/blog/2019/03/29/kube-proxy-subtleties-debugging-an-intermittent-connection-reset/

We should upgrade Kubernetes.