Consistently Seeing Reflector Watch Errors on Controller Shutdown
jonathan-innis opened this issue · comments
During controller shutdown, we consistently see errors that look like
logger.go:146: 2024-03-22T21:35:31.707Z INFO cache/reflector.go:462 pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
This occurs extremely consistently during shutdown, and I wouldn't expect something that looks like an error to come through an INFO/WARN path. From looking at the reflector code, it seems like this "error" is coming from this line. Is there a way to ensure that the runnable shutdown doesn't fire this error every time we shut down?
As an example, these "errors" are showing up in our Karpenter E2E testing here: https://github.com/aws/karpenter-provider-aws/actions/runs/8396432349/job/22997824261
![Screenshot 2024-03-22 at 10 52 50 PM](https://private-user-images.githubusercontent.com/26334334/316177416-962461c8-85b6-4674-a099-d18c18490f79.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTI4MTAzMDgsIm5iZiI6MTcxMjgxMDAwOCwicGF0aCI6Ii8yNjMzNDMzNC8zMTYxNzc0MTYtOTYyNDYxYzgtODViNi00Njc0LWEwOTktZDE4YzE4NDkwZjc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA0MTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNDExVDA0MzMyOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQ3YWNhYjM1ZDBiYjAxNDNiOGQzMmFlYmRlMWQ3NGI0ODg0OGVlYmRjMWQxMTI1MzQxZjlmZTU3MGQ4NzNmYmYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.KQmFMNOqHdCwhSuRcVeheTcDdxN8PDyINRLMPOf3Sgw)
/kind support
@jonathan-innis I'm hitting the same problem.
Will it cause memory leaks?
If the controller is shutting down, I don't think it's going to cause memory leaks. From looking through the code, it just looks like spurious error logging from the reflector as all the context cancels are happening, but I'm imagining there's a more graceful way to shut the reflector down so we don't see this.
@laihezhao Your error also looks quite different from mine. Yours appears to be caused by 500s occurring somewhere on the apiserver.
@troy0820 Got any thoughts here on how this can be improved? Ideally, we wouldn't be seeing errors for what appears to be a graceful shutdown for controller-runtime.
@jonathan-innis I am going to investigate this. It looks like it could be a bug, so I will label the issue accordingly so we can triage it a little better.
/kind bug