feiskyer / kubernetes-handbook

Kubernetes Handbook (Kubernetes指南) https://kubernetes.feisky.xyz

Deploying a k8s cluster on OpenStack VMs: adding a Service crashes the node

Morride opened this issue

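1. The k8s cluster was installed with kubeasz (Ansible); the Kubernetes version is 1.15.7.
2. The network plugin is Calico, with its MTU changed to 1400. The eth0 NIC's MTU was 1500 and is 1450 after the change. (I have installed clusters on VMs in another OpenStack private cloud before and ran into network problems there too. The eth0 MTU there was 1450, and setting Calico's MTU to 1400 fixed it.)
3. The problem: after the cluster was set up, its status was healthy, and deploying a MySQL cluster also worked (the MySQL cluster does not use a Service). When I added a Service for an application, node node-0 went NotReady. The Service is of type LoadBalancer, with its external IP bound to node-0. The configuration is as follows: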

spec:
  type: LoadBalancer
  externalIPs:
  - 192.168.1.233
  ports:
  - port: 8024
    protocol: TCP
    targetPort: 8024
    nodePort: 30000
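Item 2 relies on the rule that Calico's MTU must leave room for the encapsulation header inside the NIC's MTU. A minimal Python sketch of that arithmetic, assuming Calico runs in IP-in-IP mode (the per-mode IPv4 overheads follow the Calico MTU documentation; the function names are hypothetical):

```python
# Per-packet encapsulation overhead in bytes for IPv4, as listed in the
# Calico MTU documentation (assumed values; check against your Calico version).
ENCAP_OVERHEAD = {
    "none": 0,        # plain BGP routing, no tunnel
    "ipip": 20,       # IP-in-IP: one extra outer IPv4 header
    "vxlan": 50,      # VXLAN: outer IPv4 + UDP + VXLAN header + inner Ethernet
    "wireguard": 60,  # WireGuard over IPv4
}


def max_pod_mtu(underlay_mtu: int, encap: str = "ipip") -> int:
    """Largest Calico (pod/veth) MTU that still fits in one underlay packet."""
    if encap not in ENCAP_OVERHEAD:
        raise ValueError(f"unknown encapsulation: {encap!r}")
    return underlay_mtu - ENCAP_OVERHEAD[encap]


def mtu_fits(calico_mtu: int, underlay_mtu: int, encap: str = "ipip") -> bool:
    """True if the configured Calico MTU does not exceed the safe maximum."""
    return calico_mtu <= max_pod_mtu(underlay_mtu, encap)


if __name__ == "__main__":
    # Values from the issue: eth0 was 1500, then 1450; Calico is set to 1400.
    for underlay in (1500, 1450):
        print(underlay, max_pod_mtu(underlay), mtu_fits(1400, underlay))
```

With the values from the issue, 1400 fits under a 1450 eth0 (the IP-in-IP maximum would be 1430), so on this arithmetic alone the MTU setting would not appear to explain the crash.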

Then I checked the logs:

(base) [root@node-0 ~]# journalctl -f -u kubelet.service
-- Logs begin at Thu 2020-10-22 13:31:03 CST. --
Oct 22 14:14:42 node-0 kubelet[28641]: W1022 14:14:42.146793   28641 status_manager.go:485] Failed to get status for pod "calico-node-mmtpm_kube-system(9937bcd4-3cf1-4d3b-b4f9-76b603ff38f7)": Get https://10.84.155.33:6443/api/v1/namespaces/kube-system/pods/calico-node-mmtpm: read tcp 10.84.155.33:56470->10.84.155.33:6443: use of closed network connection
Oct 22 14:14:42 node-0 kubelet[28641]: E1022 14:14:42.146998   28641 reflector.go:125] object-"kube-system"/"calico-config": Failed to list *v1.ConfigMap: Get https://10.84.155.33:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dcalico-config&limit=500&resourceVersion=0: read tcp 10.84.155.33:56470->10.84.155.33:6443: use of closed network connection
Oct 22 14:14:42 node-0 kubelet[28641]: I1022 14:14:42.613399   28641 prober.go:112] Readiness probe for "calico-node-mmtpm_kube-system(9937bcd4-3cf1-4d3b-b4f9-76b603ff38f7):calico-node" failed (failure): calico/node is not ready: BIRD is not ready: BGP not established with 10.84.155.35,10.84.155.36,10.84.155.592020-10-22 06:14:42.589 [INFO][7606] health.go 156: Number of node(s) with BGP peering established = 0
Oct 22 14:14:52 node-0 kubelet[28641]: E1022 14:14:52.145553   28641 kubelet_node_status.go:388] Error updating node status, will retry: error getting node "10.84.155.33": Get https://10.84.155.33:6443/api/v1/nodes/10.84.155.33?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Oct 22 14:14:52 node-0 kubelet[28641]: E1022 14:14:52.145574   28641 kubelet_node_status.go:375] Unable to update node status: update node status exceeds retry count
Oct 22 14:14:52 node-0 kubelet[28641]: E1022 14:14:52.146179   28641 controller.go:125] failed to ensure node lease exists, will retry in 7s, error: Get https://10.84.155.33:6443/apis/coordination.k8s.io/v1beta1/namespaces/kube-node-lease/leases/10.84.155.33?timeout=10s: read tcp 10.84.155.33:57166->10.84.155.33:6443: use of closed network connection
Oct 22 14:14:52 node-0 kubelet[28641]: E1022 14:14:52.146257   28641 reflector.go:125] object-"kube-system"/"calico-config": Failed to list *v1.ConfigMap: Get https://10.84.155.33:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dcalico-config&limit=500&resourceVersion=0: read tcp 10.84.155.33:57166->10.84.155.33:6443: use of closed network connection

I found that Calico on node-0 had a problem, so I immediately checked its logs:
Error from server: Get https://10.84.155.33:10250/containerLogs/kube-system/calico-node-mmtpm/calico-node?follow=true: dial tcp 10.84.155.33:10250: connect: connection refused
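It could not connect.
Since I don't really understand how the OpenStack network environment differs from VMs in other environments, can anyone suggest how to troubleshoot this?
I have tried many times: as soon as a Service is created, the node goes down.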
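The calico-node readiness probe in the kubelet log reports that BGP is not established with any peer. As a first check, a minimal Python sketch (the helper name is hypothetical) that tests whether TCP port 179, which BIRD uses for BGP, is reachable from node-0 to each peer listed in that message:

```python
import socket

BGP_PORT = 179  # BIRD inside calico-node peers over BGP on TCP 179


def tcp_reachable(host: str, port: int = BGP_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and unreachable hosts
        return False


if __name__ == "__main__":
    # Peer addresses as read from the calico-node readiness probe message.
    peers = ["10.84.155.35", "10.84.155.36", "10.84.155.59"]
    for peer in peers:
        state = "open" if tcp_reachable(peer) else "unreachable"
        print(f"{peer}:{BGP_PORT} {state}")
```

A TCP check only covers the BGP session; it does not exercise IP-in-IP (IP protocol 4) packets, which OpenStack security groups may also need to allow between the nodes.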

This is an OpenStack networking problem; I recommend asking in the OpenStack community.