Ingress nginx-ingress-controller readiness probe failed
steverhoades opened this issue
Disclaimer: new to Kubernetes and Terraform.
I am working through your guide and have successfully deployed Kubernetes to Vultr by way of the provisioning repository.
However, I am currently running into an issue with the ingress/nginx-ingress-controller: according to the event log, it is failing its readiness checks (http://<ip>:10254/healthz).
Any assistance here would be greatly appreciated.
Normal Scheduled 13m default-scheduler Successfully assigned ingress/nginx-ingress-controller-68c4654b64-mmn45 to vultr.guest
Warning Unhealthy 12m (x3 over 12m) kubelet, vultr.guest Readiness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 12m (x3 over 12m) kubelet, vultr.guest Readiness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Normal Pulling 12m (x2 over 13m) kubelet, vultr.guest pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
Warning Unhealthy 12m (x3 over 12m) kubelet, vultr.guest Liveness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Normal Killing 12m kubelet, vultr.guest Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 12m (x2 over 12m) kubelet, vultr.guest Successfully pulled image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
Normal Started 12m (x2 over 12m) kubelet, vultr.guest Started container
Normal Created 12m (x2 over 12m) kubelet, vultr.guest Created container
Normal Pulled 12m (x2 over 12m) kubelet, vultr.guest Successfully pulled image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
Normal Started 12m (x2 over 12m) kubelet, vultr.guest Started container
Normal Created 12m (x2 over 12m) kubelet, vultr.guest Created container
Warning Unhealthy 12m (x3 over 12m) kubelet, vultr.guest Liveness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Normal Pulling 12m (x2 over 13m) kubelet, vultr.guest pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
Normal Killing 12m kubelet, vultr.guest Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 12m (x2 over 12m) kubelet, vultr.guest Successfully pulled image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
Normal Created 12m (x2 over 12m) kubelet, vultr.guest Created container
Normal Started 12m (x2 over 12m) kubelet, vultr.guest Started container
Warning Unhealthy 11m (x6 over 12m) kubelet, vultr.guest Liveness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Normal Pulling 11m (x3 over 13m) kubelet, vultr.guest pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
Normal Killing 11m (x2 over 12m) kubelet, vultr.guest Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
Warning Unhealthy 11m (x8 over 12m) kubelet, vultr.guest Readiness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning BackOff 3m2s (x26 over 9m) kubelet, vultr.guest Back-off restarting failed container
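As a first sanity check, hitting the probe endpoint directly would confirm whether it responds at all. A sketch (the pod IP is a placeholder; it can be taken from kubectl get pods -n ingress -o wide):
$ curl -v --max-time 5 http://<pod-ip>:10254/healthz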
$ kubectl logs --follow -n ingress deployment/nginx-ingress-controller
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: 0.21.0
Build: git-b65b85cd9
Repository: https://github.com/aledbf/ingress-nginx
-------------------------------------------------------------------------------
nginx version: nginx/1.15.6
W0114 02:45:19.180427 7 client_config.go:548] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0114 02:45:19.180730 7 main.go:196] Creating API client for https://10.96.0.1:443
I0114 02:45:19.201933 7 main.go:240] Running in Kubernetes cluster version v1.13 (v1.13.2) - git (clean) commit cff46ab41ff0bb44d8584413b598ad8360ec1def - platform linux/amd64
I0114 02:45:19.204756 7 main.go:101] Validated ingress/default-http-backend as the default backend.
I0114 02:45:19.383896 7 nginx.go:258] Starting NGINX Ingress controller
I0114 02:45:19.394616 7 event.go:221] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress", Name:"nginx-ingress-controller", UID:"4bd3da96-17a6-11e9-9a3f-560001d7c2e6", APIVersion:"v1", ResourceVersion:"10661", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress/nginx-ingress-controller
I0114 02:45:20.588427 7 nginx.go:279] Starting NGINX process
I0114 02:45:20.590132 7 leaderelection.go:187] attempting to acquire leader lease ingress/ingress-controller-leader-nginx...
W0114 02:45:20.591452 7 controller.go:373] Service "ingress/default-http-backend" does not have any active Endpoint
I0114 02:45:20.591559 7 controller.go:172] Configuration changes detected, backend reload required.
I0114 02:45:20.615190 7 leaderelection.go:196] successfully acquired lease ingress/ingress-controller-leader-nginx
I0114 02:45:20.615996 7 status.go:148] new leader elected: nginx-ingress-controller-68c4654b64-mmn45
I0114 02:45:20.776051 7 controller.go:190] Backend successfully reloaded.
I0114 02:45:20.776242 7 controller.go:202] Initial sync, sleeping for 1 second.
[14/Jan/2019:02:45:21 +0000] TCP 200 0 0 0.001
W0114 02:45:24.350352 7 controller.go:373] Service "ingress/default-http-backend" does not have any active Endpoint
W0114 02:45:27.705918 7 controller.go:373] Service "ingress/default-http-backend" does not have any active Endpoint
141.105.109.219 - [141.105.109.219] - - [14/Jan/2019:02:45:29 +0000] "GET / HTTP/1.1" 404 153 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7" 423 0.000 [-] - - - - 22b6d6cdd9c481da64779f3f0cd8b5e9
W0114 02:45:31.017150 7 controller.go:373] Service "ingress/default-http-backend" does not have any active Endpoint
W0114 02:45:34.350531 7 controller.go:373] Service "ingress/default-http-backend" does not have any active Endpoint
Some additional information.
$ kubectl get pods -n ingress
NAME                                        READY   STATUS    RESTARTS   AGE
default-http-backend-58d7cfd5bc-xnfz7       0/1     Pending   0          36m
nginx-ingress-controller-68c4654b64-mmn45   1/1     Running   1          36m
I just noticed that if I hit my domain I get a 404 from nginx, and if I hit /healthz I get a 200 HTTP status code. However, it shows that the default-http-backend is not running.
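Describing the Pending pod (using the pod name from the listing above) should show why the scheduler cannot place it:
$ kubectl describe pod -n ingress default-http-backend-58d7cfd5bc-xnfz7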
One thing I just noticed is that even though I have 3 VMs provisioned with Vultr, my Kubernetes cluster is only reporting one node.
$ kubectl get nodes
NAME          STATUS   ROLES    AGE    VERSION
vultr.guest   Ready    master   168m   v1.13.2
I would wager that this is why the default-http-backend and other services I have tried to spin up, like the certificate manager, are stuck in the Pending state?
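Two quick checks that might show whether the other two VMs ever attempted to register (just a sketch):
$ kubectl get nodes -o wide
$ kubectl get events --all-namespaces | grep -i node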
It's hard to debug this without much information, but it seems that WireGuard couldn't establish a connection between the hosts. You can check this using wg.
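For example, on each host (a rough check; the 10.0.1.x addresses are the WireGuard overlay IPs the guide assigns):
$ wg show              # every peer should show a recent "latest handshake"
$ ping -c 3 10.0.1.2   # peers should be reachable over the tunnel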
Could you please also attach your Terraform configuration?
Here is the Terraform configuration for Vultr (I think this is what you're asking for?). Other than that, I am using the Terraform configuration from hobby-kube/provisioning.
variable "token" {}

variable "hosts" {
  default = 0
}

variable "ssh_keys" {
  type = "list"
}

variable "hostname_format" {
  type = "string"
}

variable "region" {
  type = "string"
}

variable "image" {
  type = "string"
}

variable "apt_packages" {
  type    = "list"
  default = []
}

// Find the ID of the Silicon Valley region.
data "vultr_region" "region" {
  filter {
    name   = "name"
    values = ["${var.region}"]
  }
}

// Find the ID for CoreOS Container Linux.
data "vultr_os" "container_linux" {
  filter {
    name   = "name"
    values = ["${var.image}"]
  }
}

// Find the ID for a starter plan.
data "vultr_plan" "starter" {
  filter {
    name   = "price_per_month"
    values = ["10.00"]
  }

  filter {
    name   = "ram"
    values = ["2048"]
  }
}

// Find the ID of an existing SSH key.
data "vultr_ssh_key" "squat" {
  filter {
    name   = "name"
    values = ["${var.ssh_keys}"]
  }
}

provider "vultr" {
  api_key = "${var.token}"
}

resource "vultr_instance" "host" {
  name        = "${format(var.hostname_format, count.index + 1)}"
  region_id   = "${data.vultr_region.region.id}"
  plan_id     = "${data.vultr_plan.starter.id}"
  os_id       = "${data.vultr_os.container_linux.id}"
  ssh_key_ids = ["${data.vultr_ssh_key.squat.id}"]
  tag         = "container-linux"
  count       = "${var.hosts}"

  provisioner "remote-exec" {
    inline = [
      "while fuser /var/lib/dpkg/lock >/dev/null 2>&1; do sleep 1; done",
      "apt-get update",
      "apt-get install -yq ufw ${join(" ", var.apt_packages)}",
    ]
  }
}

output "hostnames" {
  value = ["${vultr_instance.host.*.name}"]
}

output "public_ips" {
  value = ["${vultr_instance.host.*.ipv4_address}"]
}

output "private_ips" {
  value = ["${vultr_instance.host.*.ipv4_address}"]
}

output "private_network_interface" {
  value = "ens3"
}
Here is the output of the wg command.
root@vultr:~# wg
interface: wg0
  public key: Zdgs+Za9oFijWbHPUORrsqABd47oC3r+Twbb2vcpD0E=
  private key: (hidden)
  listening port: 51820

peer: /Txeg7KeRfPMH2AZRvpJ7ERQNmuh7RSRgbPyu046zBk=
  endpoint: <ip>:51820
  allowed ips: 10.0.1.2/32
  latest handshake: 1 minute, 48 seconds ago
  transfer: 3.68 MiB received, 5.48 MiB sent

peer: f+dxzWZspU/j2U5+YE+lfFwYCjfy2qgMiTJUvI18y1c=
  endpoint: <ip>:51820
  allowed ips: 10.0.1.3/32
  latest handshake: 1 minute, 48 seconds ago
  transfer: 4.39 MiB received, 7.81 MiB sent
$ kubectl get nodes
NAME          STATUS   ROLES    AGE    VERSION
vultr.guest   Ready    master   114s   v1.13.2
I have checked the nodes, and it appears that kubelet and etcd are running on each of them.
I also noticed that the master node seems to be continually rebooting; I'm not sure why.
➜ kubectl get events
LAST SEEN TYPE REASON KIND MESSAGE
33m Normal Starting Node Starting kubelet.
33m Normal NodeHasSufficientMemory Node Node vultr.guest status is now: NodeHasSufficientMemory
33m Normal NodeHasNoDiskPressure Node Node vultr.guest status is now: NodeHasNoDiskPressure
33m Normal NodeHasSufficientPID Node Node vultr.guest status is now: NodeHasSufficientPID
33m Normal NodeAllocatableEnforced Node Updated Node Allocatable limit across pods
32m Normal RegisteredNode Node Node vultr.guest event: Registered Node vultr.guest in Controller
32m Normal Starting Node Starting kube-proxy.
32m Normal Starting Node Starting kubelet.
32m Normal Starting Node Starting kubelet.
32m Normal NodeAllocatableEnforced Node Updated Node Allocatable limit across pods
32m Normal NodeHasSufficientMemory Node Node vultr.guest status is now: NodeHasSufficientMemory
32m Normal NodeHasNoDiskPressure Node Node vultr.guest status is now: NodeHasNoDiskPressure
32m Normal NodeHasSufficientPID Node Node vultr.guest status is now: NodeHasSufficientPID
2m41s Warning Rebooted Node Node vultr.guest has been rebooted, boot id: dd534659-0994-422c-a118-cea5628f218a
32m Normal NodeAllocatableEnforced Node Updated Node Allocatable limit across pods
32m Normal NodeHasSufficientMemory Node Node vultr.guest status is now: NodeHasSufficientMemory
32m Normal NodeHasNoDiskPressure Node Node vultr.guest status is now: NodeHasNoDiskPressure
32m Normal NodeHasSufficientPID Node Node vultr.guest status is now: NodeHasSufficientPID
2m41s Warning Rebooted Node Node vultr.guest has been rebooted, boot id: dbc43870-97d6-4461-b492-bb52d3edadbd
3m8s Warning Rebooted Node Node vultr.guest has been rebooted, boot id: f897845b-a7ca-4e79-96e5-4ab6f3958217
32m Normal Starting Node Starting kube-proxy.
32m Normal Starting Node Starting kube-proxy.
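The Rebooted warnings above each carry a different boot id, which suggests comparing them against the machines themselves. A rough check, run on each of the three VMs (standard Linux paths, nothing specific to this setup):
root@vultr:~# cat /proc/sys/kernel/random/boot_id   # compare with the boot ids in the events
root@vultr:~# uptime                                # confirms the machine has not actually rebooted
If every VM shows a stable uptime while the events keep reporting new boot ids, the "reboots" are probably not real reboots.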
I also see the following streaming non-stop in the kubelet logs. I'm not quite sure what it means at this point.
Jan 14 16:35:24 guest kubelet[12655]: W0114 16:35:24.129172 12655 kubelet.go:1647] Deleting mirror pod "kube-apiserver-vultr.guest_kube-system(63410cb3-181a-11e9-8776-560001d80420)" because it is outdated
Jan 14 16:35:24 guest kubelet[12655]: W0114 16:35:24.131733 12655 kubelet.go:1647] Deleting mirror pod "kube-scheduler-vultr.guest_kube-system(6304025b-181a-11e9-8776-560001d80420)" because it is outdated
Jan 14 16:35:25 guest kubelet[12655]: W0114 16:35:25.134534 12655 kubelet.go:1647] Deleting mirror pod "kube-controller-manager-vultr.guest_kube-system(63a16dc4-181a-11e9-8776-560001d80420)" because it is outdated
Jan 14 16:35:25 guest kubelet[12655]: W0114 16:35:25.136411 12655 kubelet.go:1647] Deleting mirror pod "kube-apiserver-vultr.guest_kube-system(63410cb3-181a-11e9-8776-560001d80420)" because it is outdated
Jan 14 16:35:26 guest kubelet[12655]: W0114 16:35:26.138890 12655 kubelet.go:1647] Deleting mirror pod "kube-scheduler-vultr.guest_kube-system(643dba69-181a-11e9-8776-560001d80420)" because it is outdated
Jan 14 16:35:27 guest kubelet[12655]: W0114 16:35:27.143278 12655 kubelet.go:1647] Deleting mirror pod "kube-apiserver-vultr.guest_kube-system(64d76e21-181a-11e9-8776-560001d80420)" because it is outdated
Jan 14 16:35:27 guest kubelet[12655]: W0114 16:35:27.145020 12655 kubelet.go:1647] Deleting mirror pod "kube-controller-manager-vultr.guest_kube-system(64d4dbda-181a-11e9-8776-560001d80420)" because it is outdated
I think I might be getting somewhere. It appears that on Vultr, all nodes get the same default hostname, vultr.guest. My guess is that this is what is confusing Kubernetes: it would also explain the Rebooted events above, since three different machines, each with its own boot id, were all registering as the single node vultr.guest. I am going to revisit the provisioning step to see how I can change this. Will report back.
So my assumption was correct. The default hostname Vultr assigns is vultr.guest, and since each of the nodes had the same hostname, they were all treated as one node; this was the source of my problem, and I didn't expect it.
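A quick manual fix, assuming a systemd-based image where hostnamectl is available, is to give each VM a unique name before it joins the cluster (names here are illustrative):
root@vultr:~# hostnamectl set-hostname kube1   # kube2, kube3 on the other hosts
Setting the hostname in Terraform instead, as in the configuration below, avoids having to remember this step.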
Here is the fixed provisioning file for Vultr. If you are interested in using it to follow along with the guide, then you will need to add the following plugin: https://github.com/squat/terraform-provider-vultr.
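For reference, third-party providers had to be installed manually at the time; roughly (the binary name is illustrative, see the plugin's README for exact steps):
$ mkdir -p ~/.terraform.d/plugins
$ cp terraform-provider-vultr ~/.terraform.d/plugins/
$ terraform init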
variable "token" {}

variable "hosts" {
  default = 0
}

variable "ssh_keys" {
  type = "list"
}

variable "hostname_format" {
  type = "string"
}

variable "region" {
  type = "string"
}

variable "image" {
  type = "string"
}

variable "apt_packages" {
  type    = "list"
  default = []
}

// Find the ID of the Silicon Valley region.
data "vultr_region" "region" {
  filter {
    name   = "name"
    values = ["${var.region}"]
  }
}

// Find the ID for CoreOS Container Linux.
data "vultr_os" "container_linux" {
  filter {
    name   = "name"
    values = ["${var.image}"]
  }
}

// Find the ID for a starter plan.
data "vultr_plan" "starter" {
  filter {
    name   = "price_per_month"
    values = ["10.00"]
  }

  filter {
    name   = "ram"
    values = ["2048"]
  }
}

// Find the ID of an existing SSH key.
data "vultr_ssh_key" "squat" {
  filter {
    name   = "name"
    values = ["${var.ssh_keys}"]
  }
}

provider "vultr" {
  api_key = "${var.token}"
}

resource "vultr_instance" "host" {
  name        = "${format(var.hostname_format, count.index + 1)}"
  region_id   = "${data.vultr_region.region.id}"
  plan_id     = "${data.vultr_plan.starter.id}"
  os_id       = "${data.vultr_os.container_linux.id}"
  ssh_key_ids = ["${data.vultr_ssh_key.squat.id}"]
  tag         = "container-linux"
  hostname    = "${format(var.hostname_format, count.index + 1)}"
  count       = "${var.hosts}"

  provisioner "remote-exec" {
    inline = [
      "while fuser /var/lib/dpkg/lock >/dev/null 2>&1; do sleep 1; done",
      "apt-get update",
      "apt-get install -yq ufw ${join(" ", var.apt_packages)}",
    ]
  }
}

output "hostnames" {
  value = ["${vultr_instance.host.*.name}"]
}

output "public_ips" {
  value = ["${vultr_instance.host.*.ipv4_address}"]
}

output "private_ips" {
  value = ["${vultr_instance.host.*.ipv4_address}"]
}

output "private_network_interface" {
  value = "ens3"
}
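After re-provisioning with this configuration, each instance comes up with its own hostname (per hostname_format), and all three should register as separate nodes:
$ kubectl get nodes   # should now list three Ready nodes with distinct names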