Shutdown manager timeout on startup
owenhaynes opened this issue
Description:
Not sure if this is a bug or a config problem, but we are just using the very basic helm deployment. (Repost from Slack.)
On startup, the shutdown manager sometimes fails its health checks and k8s kills the container. I assume the shutdown manager then tells envoy to stop processing requests. The shutdown manager then starts back up, but the pod is left in limbo: unhealthy, because the readiness probe is failing.
I'm going to increase the retries for this health check, but maybe the shutdown manager needs to be a little smarter.
Repro steps:
Redeploy the envoy pod many times. We run on pre-emptible nodes, so we see it a lot, but it has also happened on normal nodes.
Environment:
GKE 1.29/1.28
Envoy Gateway 1.0.1
K8s Event Logs:
2024-04-19 07:24:39.994 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{envoy} Unhealthy Readiness probe failed: HTTP probe failed with statuscode: 503
2024-04-19 07:20:16.272 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{shutdown-manager} Pulled Container image "envoyproxy/gateway-dev:62ff3e7" already present on machine
2024-04-19 07:20:11.042 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{shutdown-manager} Killing Container shutdown-manager failed liveness probe, will be restarted
2024-04-19 07:20:01.015 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{shutdown-manager} Unhealthy Readiness probe failed: Get "http://10.8.0.2:19002/healthz": dial tcp 10.8.0.2:19002: i/o timeout (Client.Timeout exceeded while awaiting headers)
2024-04-19 07:20:01.008 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{shutdown-manager} Unhealthy Liveness probe failed: Get "http://10.8.0.2:19002/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024-04-19 07:19:51.042 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{shutdown-manager} Unhealthy Liveness probe failed: Get "http://10.8.0.2:19002/healthz": dial tcp 10.8.0.2:19002: i/o timeout (Client.Timeout exceeded while awaiting headers)
2024-04-19 07:19:42.940 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{envoy} Unhealthy Readiness probe failed: Get "http://10.8.0.2:19001/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024-04-19 07:19:42.933 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{shutdown-manager} Unhealthy Readiness probe failed: Get "http://10.8.0.2:19002/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024-04-19 07:19:41.671 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{shutdown-manager} Started Started container shutdown-manager
2024-04-19 07:19:41.567 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{shutdown-manager} Created Created container shutdown-manager
2024-04-19 07:19:41.524 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{shutdown-manager} Pulled Successfully pulled image "envoyproxy/gateway-dev:62ff3e7" in 134ms (4.874s including waiting)
2024-04-19 07:19:36.658 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{shutdown-manager} Pulling Pulling image "envoyproxy/gateway-dev:62ff3e7"
2024-04-19 07:19:36.645 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{envoy} Started Started container envoy
2024-04-19 07:19:36.539 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{envoy} Created Created container envoy
2024-04-19 07:19:36.485 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{envoy} Pulled Successfully pulled image "envoyproxy/envoy:distroless-v1.29.3" in 114ms (3.691s including waiting)
2024-04-19 07:19:32.792 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 spec.containers{envoy} Pulling Pulling image "envoyproxy/envoy:distroless-v1.29.3"
2024-04-19 07:19:24.004 | envoy-public-ea09f8eb-5fc985cc87-kp2l7 TaintManagerEviction Cancelling deletion of Pod envoy-gateway-system/envoy-public-ea09f8eb-5fc985cc87-kp2l7
This issue has been automatically marked as stale because it has not had activity in the last 30 days.