ip-reconciler fails with "context deadline exceeded" while listing IPPools
xagent003 opened this issue · comments
The ip-reconciler is failing repeatedly (this is not a rare, one-off Job failure). The pod is in CrashLoopBackOff state, and the container logs show:
2022-01-05T07:25:24Z [debug] NewReconcileLooper - Kubernetes config file located at: /host/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig
2022-01-05T07:25:25Z [debug] successfully read the kubernetes configuration file located at: /host/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig
2022-01-05T07:25:34Z [debug] listing IP pools
2022-01-05T07:25:35Z [error] failed to retrieve all IP pools: context deadline exceeded
2022-01-05T07:25:35Z [error] failed to create the reconcile looper: failed to retrieve all IP pools: context deadline exceeded
The API server is up and no other workloads show this type of error. The resources can be fetched via kubectl get ippools.whereabouts.cni.cncf.io, either to list all pools or to show a specific one in detail.
Does the context timeout need to be reset, or is the initial timeout value too small? Should we pass in a dummy context.TODO() before making the client call to get the IPPool?
The ListPods call may be taking a long time, 9 seconds in this case. Should we increase the timeout, or should we generate a new context for each client API call that takes a context?
@maiqueb ?
I was sloppy enough to use context.TODO when listing the pods. I honestly think that is the single reason why we haven't seen this issue before, and at a larger scale.
Would you check #186? Odds are it fixes this issue.