envoyproxy / gateway

Manages Envoy Proxy as a Standalone or Kubernetes-based Application Gateway

Home Page: https://gateway.envoyproxy.io

Shutdown manager timeout on startup

owenhaynes opened this issue

Description:
Not sure if this is a bug in the config, but I am just using the very basic Helm deployment. (Reposted from Slack.)

On startup, the shutdown manager sometimes fails its health checks and k8s kills the container. I assume the shutdown manager then tells Envoy to stop processing requests. The shutdown manager then starts back up, but the pod is now stuck in limbo, unhealthy, because the readiness probe is failing.

I'm going to increase the retries for this health check, but maybe the shutdown manager needs to be a little smarter.
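
For a rough sense of how much slack extra retries would buy, here is a minimal sketch of the arithmetic. It assumes the probe uses the Kubernetes defaults (`periodSeconds: 10`, `timeoutSeconds: 1`, `failureThreshold: 3`); I have not confirmed what the Envoy Gateway chart actually sets, although the event timestamps below look consistent with those values. `Probe` and `failure_window_seconds` are hypothetical names for illustration, and the estimate ignores kubelet jitter and any initial delay.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Subset of Kubernetes probe settings that affect restart timing.

    Defaults are the Kubernetes defaults (an assumption about this
    deployment, not values read from the Envoy Gateway chart).
    """
    period_seconds: int = 10
    timeout_seconds: int = 1
    failure_threshold: int = 3


def failure_window_seconds(probe: Probe) -> int:
    """Approximate time from the start of the first failing probe until the
    last failing probe times out and the kubelet restarts the container.

    The container must fail `failure_threshold` probes in a row, spaced
    `period_seconds` apart, and the last one still needs `timeout_seconds`
    to time out.
    """
    return (probe.failure_threshold - 1) * probe.period_seconds + probe.timeout_seconds


if __name__ == "__main__":
    print(failure_window_seconds(Probe()))                     # defaults
    print(failure_window_seconds(Probe(failure_threshold=6)))  # more retries
```

Under these assumptions, an unresponsive shutdown manager is restarted after roughly 21 seconds, and doubling the retries to 6 stretches that to roughly 51 seconds. That only helps if the slow start is transient.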

Repro steps:

Redeploy the Envoy pod a lot. We run this on pre-emptible nodes, so we see it often, but it has also happened on normal nodes.

Environment:
GKE 1.29/1.28
Envoy Gateway 1.0.1

K8s Event Logs:

2024-04-19 07:24:39.994 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{envoy} Unhealthy Readiness probe failed: HTTP probe failed with statuscode: 503 
2024-04-19 07:20:16.272 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{shutdown-manager} Pulled Container image "envoyproxy/gateway-dev:62ff3e7" already present on machine 
2024-04-19 07:20:11.042 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{shutdown-manager} Killing Container shutdown-manager failed liveness probe, will be restarted 
2024-04-19 07:20:01.015 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{shutdown-manager} Unhealthy Readiness probe failed: Get "http://10.8.0.2:19002/healthz": dial tcp 10.8.0.2:19002: i/o timeout (Client.Timeout exceeded while awaiting headers) 
2024-04-19 07:20:01.008 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{shutdown-manager} Unhealthy Liveness probe failed: Get "http://10.8.0.2:19002/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024-04-19 07:19:51.042 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{shutdown-manager} Unhealthy Liveness probe failed: Get "http://10.8.0.2:19002/healthz": dial tcp 10.8.0.2:19002: i/o timeout (Client.Timeout exceeded while awaiting headers) 
2024-04-19 07:19:42.940 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{envoy} Unhealthy Readiness probe failed: Get "http://10.8.0.2:19001/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)  
2024-04-19 07:19:42.933 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{shutdown-manager} Unhealthy Readiness probe failed: Get "http://10.8.0.2:19002/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers) 
2024-04-19 07:19:41.671 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{shutdown-manager} Started Started container shutdown-manager 
2024-04-19 07:19:41.567 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{shutdown-manager} Created Created container shutdown-manager 
2024-04-19 07:19:41.524 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{shutdown-manager} Pulled Successfully pulled image "envoyproxy/gateway-dev:62ff3e7" in 134ms (4.874s including waiting) 
2024-04-19 07:19:36.658 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{shutdown-manager} Pulling Pulling image "envoyproxy/gateway-dev:62ff3e7"  
2024-04-19 07:19:36.645 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{envoy} Started Started container envoy 
2024-04-19 07:19:36.539 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{envoy} Created Created container envoy 
2024-04-19 07:19:36.485 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{envoy} Pulled Successfully pulled image "envoyproxy/envoy:distroless-v1.29.3" in 114ms (3.691s including waiting) 
2024-04-19 07:19:32.792 | envoy-public-ea09f8eb-5fc985cc87-kp2l7  spec.containers{envoy} Pulling Pulling image "envoyproxy/envoy:distroless-v1.29.3"  
2024-04-19 07:19:24.004 | envoy-public-ea09f8eb-5fc985cc87-kp2l7   TaintManagerEviction Cancelling deletion of Pod envoy-gateway-system/envoy-public-ea09f8eb-5fc985cc87-kp2l7

This issue has been automatically marked as stale because it has not had activity in the last 30 days.