bitauth / chaingraph

A multi-node blockchain indexer and GraphQL API

Home Page: https://chaingraph.cash/

Better DB connection error handling, attempt to reconnect with backoff

bitjson opened this issue

The agent already does some recovery when nodes disconnect, but DB errors cause the agent to shut down immediately.

It would be valuable to add some error handling and retry (with backoff) logic around both DB connection issues and DB errors when nodes are being initialized.

Because Hasura applies schema migrations, the agent sometimes attempts to initialize nodes before tables have been created in the DB (e.g. table "node" does not exist). If this happens, the agent should just wait a few seconds and try again (using the same backoff strategy as node disconnections).
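For discussion, here is a minimal sketch of what such a retry wrapper might look like. It is not taken from the Chaingraph codebase: `retryWithBackoff`, `backoffDelayMs`, `isRetryableDbError`, and every default value are hypothetical, and it assumes the DB client exposes a PostgreSQL SQLSTATE or Node.js system error code as `code` on the thrown error. The agent's existing node-disconnection backoff may use different parameters.

```typescript
// Hypothetical helpers for illustration; names and defaults are assumptions.

interface BackoffOptions {
  initialDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
  factor: number; // multiplier applied after each failed attempt
  maxAttempts: number; // total attempts before giving up
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultOptions: BackoffOptions = {
  initialDelayMs: 1_000,
  maxDelayMs: 30_000,
  factor: 2,
  maxAttempts: 10,
};

/** Delay before retry number `attempt` (0-based), capped at `maxDelayMs`. */
const backoffDelayMs = (attempt: number, options: BackoffOptions): number =>
  Math.min(options.initialDelayMs * options.factor ** attempt, options.maxDelayMs);

/**
 * Assumption: transient failures can be recognized by the `code` property
 * on the thrown error. Anything else is treated as fatal.
 */
const isRetryableDbError = (error: unknown): boolean => {
  const code = (error as { code?: unknown } | null)?.code;
  return (
    code === '42P01' || // SQLSTATE undefined_table: migrations not applied yet
    code === '57P03' || // SQLSTATE cannot_connect_now: database still starting
    code === 'ECONNREFUSED' ||
    code === 'ECONNRESET'
  );
};

/** Run `operation`, retrying retryable DB errors with exponential backoff. */
const retryWithBackoff = async <T>(
  operation: () => Promise<T>,
  overrides: Partial<BackoffOptions> = {}
): Promise<T> => {
  const options: BackoffOptions = { ...defaultOptions, ...overrides };
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const outOfAttempts = attempt + 1 >= options.maxAttempts;
      // Rethrow fatal errors, or the last error once attempts are exhausted.
      if (outOfAttempts || !isRetryableDbError(error)) throw error;
      await sleep(backoffDelayMs(attempt, options));
    }
  }
};
```

Node initialization would then be wrapped in a call such as `retryWithBackoff(() => initializeNodes())`, where `initializeNodes` stands in for whatever the agent currently runs at startup. Only errors recognized as transient are retried, so a genuine misconfiguration still fails fast instead of looping.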

This would prevent the burst of agent restarts that typically happens when a new Chaingraph cluster is created, making logs a little less messy and letting the sync begin sooner.