Error: failed to start container "scv"
IT-YUNMENGZE opened this issue
Zeyuan Wang commented
Apply the manifest:
[root@Master ~]# kubectl apply -f deploy.yaml
namespace/scv created
clusterrole.rbac.authorization.k8s.io/scv-cr created
serviceaccount/scv-sa created
clusterrolebinding.rbac.authorization.k8s.io/scv-crb created
daemonset.apps/scv-2 created
Check the Pods; they are stuck in CrashLoopBackOff:
[root@Master ~]# kubectl get pods -o wide -n scv
NAME          READY   STATUS             RESTARTS   AGE     IP           NODE    NOMINATED NODE   READINESS GATES
scv-2-mmc78   0/1     CrashLoopBackOff   4          2m53s   10.244.1.3   node1   <none>           <none>
scv-2-tvxw6   0/1     CrashLoopBackOff   4          2m53s   10.244.2.3   node2   <none>           <none>
Inspect the Pod's details and events:
[root@Master ~]# kubectl describe pod scv-2-mmc78 -n scv
Name:         scv-2-mmc78
Namespace:    scv
Priority:     0
Node:         node1/192.168.108.129
Start Time:   Fri, 08 Oct 2021 17:03:04 +0800
Labels:       app=scv
              controller-revision-hash=6bb8c64d4f
              pod-template-generation=1
Annotations:  <none>
Status:       Running
IP:           10.244.1.3
IPs:
  IP:           10.244.1.3
Controlled By:  DaemonSet/scv-2
Containers:
  scv:
    Container ID:   docker://63674a568e806db75497f9559f8a8e6ad08104b8de68fa72e212298bd0ad8e50
    Image:          registry.cn-hangzhou.aliyuncs.com/njupt-isl/scv:2.0
    Image ID:       docker-pullable://registry.cn-hangzhou.aliyuncs.com/njupt-isl/scv@sha256:90cf73758ff07175d00953ec510ba4af5c96bb3b9c985c3dd55cbee079357329
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       ContainerCannotRun
      Message:      OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown
      Exit Code:    128
      Started:      Fri, 08 Oct 2021 17:06:06 +0800
      Finished:     Fri, 08 Oct 2021 17:06:06 +0800
    Ready:          False
    Restart Count:  5
    Limits:
      memory:  200Mi
    Requests:
      cpu:     100m
      memory:  200Mi
    Environment:
      NODE_NAME:                (v1:spec.nodeName)
      NVIDIA_VISIBLE_DEVICES:   all
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from scv-sa-token-6zzls (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  scv-sa-token-6zzls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  scv-sa-token-6zzls
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  4m17s                  default-scheduler  Successfully assigned scv/scv-2-mmc78 to node1
  Normal   Pulled     4m16s                  kubelet            Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/njupt-isl/scv:2.0" in 960.991432ms
  Normal   Pulled     4m14s                  kubelet            Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/njupt-isl/scv:2.0" in 871.025969ms
  Normal   Pulled     3m59s                  kubelet            Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/njupt-isl/scv:2.0" in 880.149293ms
  Warning  Failed     3m32s (x4 over 4m16s)  kubelet            Error: failed to start container "scv": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown
  Normal   Pulled     3m32s                  kubelet            Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/njupt-isl/scv:2.0" in 886.933766ms
  Warning  BackOff    2m56s (x6 over 3m56s)  kubelet            Back-off restarting failed container
  Normal   Pulling    2m45s (x5 over 4m17s)  kubelet            Pulling image "registry.cn-hangzhou.aliyuncs.com/njupt-isl/scv:2.0"
  Normal   Created    2m44s (x5 over 4m16s)  kubelet            Created container scv
  Normal   Pulled     2m44s                  kubelet            Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/njupt-isl/scv:2.0" in 909.780206ms
Error message:
Error: failed to start container "scv": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown
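The useful part of this long message is the stderr of the NVIDIA prestart hook at the very end. As a minimal sketch, the hook's stderr can be separated from the surrounding OCI runtime wrapper text like this. The helper name `extract_hook_stderr` is hypothetical and not part of kubectl, Docker, or the NVIDIA tooling; it only assumes the message has the `stderr: ...` tail shown above.

```python
import re
from typing import Optional

# The runtime error exactly as reported in the Pod events above.
MESSAGE = (
    'Error: failed to start container "scv": Error response from daemon: '
    "OCI runtime create failed: container_linux.go:380: starting container "
    "process caused: process_linux.go:545: container init caused: Running "
    "hook #0:: error running hook: exit status 1, stdout: , stderr: "
    "nvidia-container-cli: initialization error: driver error: failed to "
    "process request: unknown"
)


def extract_hook_stderr(message: str) -> Optional[str]:
    """Return the text after 'stderr:' in a hook failure message, or None.

    Hypothetical helper for illustration only.
    """
    match = re.search(r"stderr:\s*(.+)$", message.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_hook_stderr(MESSAGE))
```

The extracted text points at `nvidia-container-cli` failing to talk to the driver on the node, rather than at the image or the DaemonSet manifest.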
Environment:
[root@Master ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0", GitCommit:"af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38", GitTreeState:"clean", BuildDate:"2020-12-08T17:59:43Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0", GitCommit:"af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38", GitTreeState:"clean", BuildDate:"2020-12-08T17:51:19Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Zeyuan Wang commented
Resolved. The problem was with the virtual machine's graphics card.
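The thread does not say how the graphics card problem was confirmed. One possible check, sketched here under the assumption that a loaded NVIDIA kernel driver exposes `/dev/nvidiactl` and `/proc/driver/nvidia/version` on the node, is to look for those paths directly. The function name `nvidia_driver_status` is hypothetical, and the script would have to be run on the worker node itself (for example node1), not inside the failing container.

```python
import os
from typing import Dict

# Assumed locations exposed by a loaded NVIDIA kernel driver on Linux.
DEFAULT_DEVICE_NODE = "/dev/nvidiactl"
DEFAULT_PROC_VERSION = "/proc/driver/nvidia/version"


def nvidia_driver_status(
    device_node: str = DEFAULT_DEVICE_NODE,
    proc_version: str = DEFAULT_PROC_VERSION,
) -> Dict[str, bool]:
    """Report whether the driver's control device and proc entry exist.

    Hypothetical helper for illustration; it only tests for file presence
    and does not prove that the GPU itself is usable.
    """
    return {
        "device_node_present": os.path.exists(device_node),
        "driver_version_present": os.path.exists(proc_version),
    }


if __name__ == "__main__":
    for name, present in nvidia_driver_status().items():
        print(f"{name}: {present}")
```

If both values are `False` on a node, the NVIDIA runtime hook has no driver to initialize against, which would be consistent with a virtual machine that has no working GPU passed through to it.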