openkruise / kruise

Automated management of large-scale applications on Kubernetes (incubating project under CNCF)

Home Page: https://openkruise.io

[BUG] kruise-daemon fails to start after upgrading from 1.5.0 to 1.6.2

ibizdevops opened this issue

What happened:
After upgrading Kruise from 1.5.0 to 1.6.2, kruise-daemon fails to start. The log below shows the failure.

kruise-daemon log

I0412 09:38:47.251925       1 feature_gate.go:249] feature gates: &{map[ImagePullJobGate:true]}
I0412 09:38:47.548131       1 daemon.go:101] Starting daemon on kube-node8 ...
I0412 09:38:47.748266       1 request.go:622] Waited for 97.784798ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/argoproj.io/v1alpha1?timeout=32s
I0412 09:38:47.750178       1 request.go:622] Waited for 99.708262ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/monitoring.coreos.com/v1alpha1?timeout=32s
I0412 09:38:47.848137       1 request.go:622] Waited for 197.611035ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/admissionregistration.k8s.io/v1?timeout=32s
I0412 09:38:47.849563       1 request.go:622] Waited for 199.035025ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/apiextensions.k8s.io/v1?timeout=32s
I0412 09:38:47.948204       1 request.go:622] Waited for 297.662336ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/storage.k8s.io/v1?timeout=32s
I0412 09:38:47.950038       1 request.go:622] Waited for 299.518701ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/scheduling.k8s.io/v1?timeout=32s
I0412 09:38:48.048176       1 request.go:622] Waited for 397.590839ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/storage.k8s.io/v1beta1?timeout=32s
I0412 09:38:48.366550       1 remote_runtime.go:72] "Connecting to runtime service" endpoint="unix:///hostvarrun/dockershim.sock"
I0412 09:38:48.367647       1 remote_runtime.go:117] "Validating the CRI v1 API runtime version"
W0412 09:38:48.551939       1 factory.go:121] Failed to new runtime service for docker (unix:///hostvarrun/docker.sock, unix:///hostvarrun/dockershim.sock): validate service connection: CRI v1 runtime API is not implemented for endpoint "unix:///hostvarrun/dockershim.sock": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService
I0412 09:38:48.748144       1 cri.go:44] "Connecting to image service" endpoint="/hostvarrun/containerd/containerd.sock"
I0412 09:38:48.748264       1 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/hostvarrun/containerd/containerd.sock" URL="unix:///hostvarrun/containerd/containerd.sock"
I0412 09:38:48.749376       1 helpers.go:227] "Finding the CRI API image version"
E0412 09:38:48.752420       1 cri.go:61] "Failed to determine CRI image API version" err="unable to determine image API version: rpc error: code = Unimplemented desc = unknown service runtime.v1.ImageService"
W0412 09:38:48.848232       1 factory.go:103] Failed to new image service for containerd (, unix:///hostvarrun/containerd/containerd.sock): unable to determine image API version: rpc error: code = Unimplemented desc = unknown service runtime.v1.ImageService
F0412 09:38:48.848407       1 main.go:69] Failed to new daemon: failed to new runtime factory: validate service connection: CRI v1 runtime API is not implemented for endpoint "unix:///hostvarrun/dockershim.sock": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService

Environment:

  • Kruise version: v1.5.0 -> v1.6.2
  • Kubernetes version: 1.22.17
  • Install details: helm upgrade kruise -f values.yaml .
  • Runtime: Docker 19.03.15

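I went through the affected code: the error after upgrading to v1.6.2 is related to the initialization of the CRI client.
One of the changes in this release upgrades the Kubernetes API dependencies to v1.26, which removes the v1alpha2 CRI interface. As a result, kruise-daemon is no longer compatible with runtimes that only support the v1alpha2 CRI (dockershim).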
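
To make the symptom easier to spot, here is a minimal Go sketch that scans kruise-daemon log text for the "unknown service" errors seen in the log above. The helper name `missingCRIv1Services` is hypothetical and not part of Kruise; the two service names are copied from the log. It only matches strings, so treat it as a triage aid rather than a real CRI probe.

```go
package main

import (
	"fmt"
	"strings"
)

// criV1Services lists the gRPC service names of the CRI v1 API.
// Both names appear verbatim in the kruise-daemon log in this issue.
var criV1Services = []string{
	"runtime.v1.RuntimeService",
	"runtime.v1.ImageService",
}

// missingCRIv1Services (hypothetical helper) returns the CRI v1 services
// that the given log text reports as "unknown service ...", which is how
// an endpoint without CRI v1 support answers in the log above.
func missingCRIv1Services(logText string) []string {
	var missing []string
	for _, svc := range criV1Services {
		if strings.Contains(logText, "unknown service "+svc) {
			missing = append(missing, svc)
		}
	}
	return missing
}

func main() {
	// Tail of the fatal line from the log in this issue.
	line := `rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService`
	fmt.Println(missingCRIv1Services(line))
}
```

A non-empty result suggests the runtime endpoint on that node only speaks an older CRI version, which matches the dockershim case described here.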

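The compatibility note on the official website overlooks this case and should spell out the risk.

Starting from v1.6.0 (alpha/beta), OpenKruise must be installed and used on Kubernetes clusters of version 1.18 or later. If you disable the Kruise-Daemon component (featureGates="KruiseDaemon=false"), you can still install and use it on Kubernetes 1.16 and 1.17 clusters.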

=>

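Starting from v1.6.0 (alpha/beta), OpenKruise must be installed and used on Kubernetes clusters of version 1.18 or later. If you disable the Kruise-Daemon component (featureGates="KruiseDaemon=false"), you can still install and use it on Kubernetes 1.16 and 1.17 clusters.

Starting from v1.6.0 (alpha/beta), Kruise-Daemon only supports container runtimes that implement the CRI v1 API. If you disable the Kruise-Daemon component (featureGates="KruiseDaemon=false"), you can still install and use OpenKruise on clusters whose nodes run a runtime without CRI v1 support.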
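
The proposed wording can be read as a small decision rule. The Go sketch below encodes it under stated assumptions: the `node` type and `checkKruise16` function are hypothetical names made up for illustration, and the rule is taken only from the note above, not from Kruise source code.

```go
package main

import (
	"errors"
	"fmt"
)

// node holds the facts the compatibility note depends on (hypothetical type).
type node struct {
	K8sMinor      int  // minor version of a Kubernetes 1.x cluster
	CRIv1         bool // whether the container runtime implements CRI v1
	DaemonEnabled bool // false when featureGates="KruiseDaemon=false"
}

// checkKruise16 applies the proposed note for Kruise v1.6.0 and later:
// Kubernetes >= 1.18 is required, or >= 1.16 with Kruise-Daemon disabled,
// and Kruise-Daemon additionally needs a CRI v1 runtime.
func checkKruise16(n node) error {
	minMinor := 18
	if !n.DaemonEnabled {
		minMinor = 16
	}
	if n.K8sMinor < minMinor {
		return fmt.Errorf("cluster version 1.%d is below the required 1.%d", n.K8sMinor, minMinor)
	}
	if n.DaemonEnabled && !n.CRIv1 {
		return errors.New("kruise-daemon needs a CRI v1 runtime; disable it with featureGates=\"KruiseDaemon=false\"")
	}
	return nil
}

func main() {
	// The environment from this issue: Kubernetes 1.22, dockershim without CRI v1.
	reporter := node{K8sMinor: 22, CRIv1: false, DaemonEnabled: true}
	fmt.Println(checkKruise16(reporter))

	reporter.DaemonEnabled = false
	fmt.Println(checkKruise16(reporter))
}
```

For the environment in this issue the first check returns an error and the second returns nil, which mirrors the workaround the note suggests.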