Network for newly created pods fails
felskrone opened this issue
I am trying to set up my k8s cluster with Canal, but it fails: the kubelet logs errors and the pod network is never set up properly.
Canal manifest from https://github.com/projectcalico/canal/blob/master/k8s-install/1.7/canal.yaml
RBAC manifest from https://github.com/projectcalico/canal/blob/master/k8s-install/1.7/rbac.yaml
Installation of canal looks good.
master01: # kubectl create -f rbac.yaml
clusterrole "calico" created
clusterrole "flannel" created
clusterrolebinding "canal-flannel" created
clusterrolebinding "canal-calico" created
master01: # kubectl create -f canal.yaml
configmap "canal-config" created
daemonset "canal" created
customresourcedefinition "globalfelixconfigs.crd.projectcalico.org" created
customresourcedefinition "globalbgpconfigs.crd.projectcalico.org" created
customresourcedefinition "ippools.crd.projectcalico.org" created
customresourcedefinition "globalnetworkpolicies.crd.projectcalico.org" created
serviceaccount "canal" created
All pods seem to come up fine.
master01: # kubectl get pods --all-namespaces
NAMESPACE     NAME          READY     STATUS    RESTARTS   AGE
kube-system   canal-48b2q   3/3       Running   1          1m
kube-system   canal-55l5s   3/3       Running   0          1m
kube-system   canal-85h8c   3/3       Running   1          1m
kube-system   canal-9mkl5   3/3       Running   1          1m
kube-system   canal-gfzsf   3/3       Running   0          1m
kube-system   canal-jklmk   3/3       Running   0          1m
kube-system   canal-k5l5d   3/3       Running   1          1m
kube-system   canal-r13bp   3/3       Running   0          1m
kube-system   canal-s768v   3/3       Running   0          1m
kube-system   canal-x3b57   3/3       Running   1          1m
After that I create a simple busybox pod:
apiVersion: v1
kind: Pod
metadata:
  name: busybox-w2
  namespace: default
spec:
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always
The busybox pod never receives a network configuration and stays in state 'ContainerCreating'.
master01: # kubectl get pods --all-namespaces
NAMESPACE   NAME         READY     STATUS              RESTARTS   AGE
default     busybox-w2   0/1       ContainerCreating   0          31s
Expected Behavior
The pod is created successfully and its networking is set up properly.
Current Behavior
The pod is never created, and the kubelet reports that an SSL certificate file is missing:
Nov 03 16:29:49 worker02 kubelet[12112]: I1103 16:29:49.095395 12112 kuberuntime_manager.go:557] SyncPod received new pod "busybox-w2_default(27c6d382-c0aa-11e7-bc63-0022195f6b5b)", will create a new sandbox for it
Nov 03 16:29:49 worker02 kubelet[12112]: I1103 16:29:49.095413 12112 kuberuntime_manager.go:566] Stopping PodSandbox for "busybox-w2_default(27c6d382-c0aa-11e7-bc63-0022195f6b5b)", will start new one
Nov 03 16:29:49 worker02 kubelet[12112]: I1103 16:29:49.095449 12112 kuberuntime_manager.go:612] Creating sandbox for pod "busybox-w2_default(27c6d382-c0aa-11e7-bc63-0022195f6b5b)"
Nov 03 16:29:49 worker02 kubelet[12112]: E1103 16:29:49.525442 12112 remote_runtime.go:91] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = failed to create network for container k8s_infra_busybox-w2_default_27c6d382-c0aa-11e7-bc63-0022195f6b5b_0 in sandbox e846f3ac1bffad9bf76eccac8130104e3f62a2456dc82454a6af7d54ceba705e: open /etc/cni/net.d/calico-tls/etcd-cert: no such file or directory
Nov 03 16:29:49 worker02 kubelet[12112]: E1103 16:29:49.525511 12112 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "busybox-w2_default(27c6d382-c0aa-11e7-bc63-0022195f6b5b)" failed: rpc error: code = 2 desc = failed to create network for container k8s_infra_busybox-w2_default_27c6d382-c0aa-11e7-bc63-0022195f6b5b_0 in sandbox e846f3ac1bffad9bf76eccac8130104e3f62a2456dc82454a6af7d54ceba705e: open /etc/cni/net.d/calico-tls/etcd-cert: no such file or directory
Nov 03 16:29:49 worker02 kubelet[12112]: E1103 16:29:49.525537 12112 kuberuntime_manager.go:618] createPodSandbox for pod "busybox-w2_default(27c6d382-c0aa-11e7-bc63-0022195f6b5b)" failed: rpc error: code = 2 desc = failed to create network for container k8s_infra_busybox-w2_default_27c6d382-c0aa-11e7-bc63-0022195f6b5b_0 in sandbox e846f3ac1bffad9bf76eccac8130104e3f62a2456dc82454a6af7d54ceba705e: open /etc/cni/net.d/calico-tls/etcd-cert: no such file or directory
Nov 03 16:29:49 worker02 kubelet[12112]: E1103 16:29:49.525595 12112 pod_workers.go:182] Error syncing pod 27c6d382-c0aa-11e7-bc63-0022195f6b5b ("busybox-w2_default(27c6d382-c0aa-11e7-bc63-0022195f6b5b)"), skipping: failed to "CreatePodSandbox" for "busybox-w2_default(27c6d382-c0aa-11e7-bc63-0022195f6b5b)" with CreatePodSandboxError: "CreatePodSandbox for pod \"busybox-w2_default(27c6d382-c0aa-11e7-bc63-0022195f6b5b)\" failed: rpc error: code = 2 desc = failed to create network for container k8s_infra_busybox-w2_default_27c6d382-c0aa-11e7-bc63-0022195f6b5b_0 in sandbox e846f3ac1bffad9bf76eccac8130104e3f62a2456dc82454a6af7d54ceba705e: open /etc/cni/net.d/calico-tls/etcd-cert: no such file or directory"
I have not altered rbac.yaml or canal.yaml in any way, and neither of them contains any SSL configuration.
The canal pod's 10-calico.conf has no SSL settings either:
worker02:/etc/cni/net.d# cat 10-calico.conf
{
  "name": "k8s-pod-network",
  "cniVersion": "0.1.0",
  "type": "calico",
  "log_level": "info",
  "datastore_type": "kubernetes",
  "nodename": "worker02",
  "mtu": 1500,
  "ipam": {
    "type": "host-local",
    "subnet": "usePodCidr"
  },
  "policy": {
    "type": "k8s",
    "k8s_auth_token": "eyJhbGciOiJSUzI1NiIsInR5c....."
  },
  "kubernetes": {
    "k8s_api_root": "https://10.x.x.x:443",
    "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
  }
}
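To double-check that this file (or any other file in /etc/cni/net.d) really carries no etcd or TLS settings, it can be parsed and searched for the keys Calico's CNI plugin uses when it talks to etcd directly. This is a minimal sketch; the key names `etcd_endpoints`, `etcd_key_file`, `etcd_cert_file` and `etcd_ca_cert_file` are assumptions taken from Calico's etcd-based manifests and do not appear in the config above, and `etcd_settings` is a hypothetical helper.

```python
import json

# Keys used by Calico's CNI plugin when it talks to etcd directly.
# Assumption: names taken from Calico's etcd-mode manifests; they do not
# appear in the kubernetes-datastore config shown above.
ETCD_KEYS = ("etcd_endpoints", "etcd_key_file", "etcd_cert_file", "etcd_ca_cert_file")


def etcd_settings(conf_text):
    """Return every etcd-related key/value found in a CNI config string.

    Handles both a single-plugin .conf and a .conflist, whose plugin
    configs are wrapped in a "plugins" list.
    """
    conf = json.loads(conf_text)
    # A .conf is one plugin config; a .conflist holds several under "plugins".
    plugins = conf.get("plugins", [conf])
    found = {}
    for plugin in plugins:
        for key in ETCD_KEYS:
            if key in plugin:
                found[key] = plugin[key]
    return found
```

For the 10-calico.conf above this returns an empty dict; a non-empty result for some other file in the directory would show where a path like calico-tls/etcd-cert could be coming from.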
Where does the path '/etc/cni/net.d/calico-tls/etcd-cert' in the kubelet logs come from?
How can I debug what is going wrong?
Since the canal.yaml above uses the Kubernetes datastore, why is etcd involved at all?
Your Environment
- Calico version: quay.io/calico/node:v2.5.1 (see linked canal.yaml above)
- Flannel version: quay.io/coreos/flannel:v0.8.0 (see linked canal.yaml above)
- Orchestrator version: Kubernetes 1.7.6 with RBAC
- Operating System and version: Debian Stretch
@felskrone Interesting. Agreed: when using the Kubernetes API datastore, etcd certs shouldn't be involved at all.
Are there any other config files in /etc/cni/net.d?
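Why other files in that directory could matter: as far as we understand the kubelet's CNI support, it loads files ending in .conf, .conflist or .json from /etc/cni/net.d, sorts them by name and uses the first valid one. If a config left over from an earlier etcd-based install sorted ahead of 10-calico.conf, the kubelet would follow that file's etcd TLS paths instead. The sketch below lists the candidates in that order; the ordering rule is an assumption, not something confirmed in this thread, and `cni_config_files` is a hypothetical helper.

```python
import os

# File extensions assumed to be picked up by the kubelet's CNI loader.
CNI_EXTENSIONS = (".conf", ".conflist", ".json")


def cni_config_files(conf_dir="/etc/cni/net.d"):
    """List CNI config files in conf_dir in lexical order.

    Assumption: the kubelet uses the first valid file in this order, so a
    leftover file sorting before 10-calico.conf would take precedence.
    Directories (e.g. calico-tls) and extension-less files
    (e.g. calico-kubeconfig) are skipped.
    """
    if not os.path.isdir(conf_dir):
        return []
    names = [
        name
        for name in os.listdir(conf_dir)
        if name.endswith(CNI_EXTENSIONS)
        and os.path.isfile(os.path.join(conf_dir, name))
    ]
    # Plain string sort, i.e. byte order for ASCII file names.
    return sorted(names)
```

Running `cni_config_files()` on worker02 and checking whether anything sorts ahead of 10-calico.conf would answer the question above.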
I haven't figured out what was wrong, but I restarted from scratch and that seems to have resolved it.
I have another question, but it's not related to this error.