Ingress nginx-ingress-controller readiness probe failed
steverhoades opened this issue
Disclaimer: new to Kubernetes and Terraform.
I am working through your guide and have successfully deployed Kubernetes to Vultr by way of the provisioning repository.
However, I am currently running into an issue with the ingress/nginx-ingress-controller: according to the event log, it is failing its readiness checks (http://<ip>:10254/healthz).
Any assistance here would be greatly appreciated.
Normal Scheduled 13m default-scheduler Successfully assigned ingress/nginx-ingress-controller-68c4654b64-mmn45 to vultr.guest
Warning Unhealthy 12m (x3 over 12m) kubelet, vultr.guest Readiness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 12m (x3 over 12m) kubelet, vultr.guest Readiness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Normal Pulling 12m (x2 over 13m) kubelet, vultr.guest pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
Warning Unhealthy 12m (x3 over 12m) kubelet, vultr.guest Liveness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Normal Killing 12m kubelet, vultr.guest Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 12m (x2 over 12m) kubelet, vultr.guest Successfully pulled image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
Normal Started 12m (x2 over 12m) kubelet, vultr.guest Started container
Normal Created 12m (x2 over 12m) kubelet, vultr.guest Created container
Normal Pulled 12m (x2 over 12m) kubelet, vultr.guest Successfully pulled image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
Normal Started 12m (x2 over 12m) kubelet, vultr.guest Started container
Normal Created 12m (x2 over 12m) kubelet, vultr.guest Created container
Warning Unhealthy 12m (x3 over 12m) kubelet, vultr.guest Liveness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Normal Pulling 12m (x2 over 13m) kubelet, vultr.guest pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
Normal Killing 12m kubelet, vultr.guest Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 12m (x2 over 12m) kubelet, vultr.guest Successfully pulled image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
Normal Created 12m (x2 over 12m) kubelet, vultr.guest Created container
Normal Started 12m (x2 over 12m) kubelet, vultr.guest Started container
Warning Unhealthy 11m (x6 over 12m) kubelet, vultr.guest Liveness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Normal Pulling 11m (x3 over 13m) kubelet, vultr.guest pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
Normal Killing 11m (x2 over 12m) kubelet, vultr.guest Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
Warning Unhealthy 11m (x8 over 12m) kubelet, vultr.guest Readiness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning BackOff 3m2s (x26 over 9m) kubelet, vultr.guest Back-off restarting failed container
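As a first sanity check, hitting the probe endpoint directly would confirm whether it responds at all. A sketch (the pod IP is a placeholder; it can be taken from kubectl get pods -n ingress -o wide):
$ curl -v --max-time 5 http://<pod-ip>:10254/healthz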
$ kubectl logs --follow -n ingress deployment/nginx-ingress-controller
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: 0.21.0
Build: git-b65b85cd9
Repository: https://github.com/aledbf/ingress-nginx
-------------------------------------------------------------------------------
nginx version: nginx/1.15.6
W0114 02:45:19.180427 7 client_config.go:548] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0114 02:45:19.180730 7 main.go:196] Creating API client for https://10.96.0.1:443
I0114 02:45:19.201933 7 main.go:240] Running in Kubernetes cluster version v1.13 (v1.13.2) - git (clean) commit cff46ab41ff0bb44d8584413b598ad8360ec1def - platform linux/amd64
I0114 02:45:19.204756 7 main.go:101] Validated ingress/default-http-backend as the default backend.
I0114 02:45:19.383896 7 nginx.go:258] Starting NGINX Ingress controller
I0114 02:45:19.394616 7 event.go:221] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress", Name:"nginx-ingress-controller", UID:"4bd3da96-17a6-11e9-9a3f-560001d7c2e6", APIVersion:"v1", ResourceVersion:"10661", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress/nginx-ingress-controller
I0114 02:45:20.588427 7 nginx.go:279] Starting NGINX process
I0114 02:45:20.590132 7 leaderelection.go:187] attempting to acquire leader lease ingress/ingress-controller-leader-nginx...
W0114 02:45:20.591452 7 controller.go:373] Service "ingress/default-http-backend" does not have any active Endpoint
I0114 02:45:20.591559 7 controller.go:172] Configuration changes detected, backend reload required.
I0114 02:45:20.615190 7 leaderelection.go:196] successfully acquired lease ingress/ingress-controller-leader-nginx
I0114 02:45:20.615996 7 status.go:148] new leader elected: nginx-ingress-controller-68c4654b64-mmn45
I0114 02:45:20.776051 7 controller.go:190] Backend successfully reloaded.
I0114 02:45:20.776242 7 controller.go:202] Initial sync, sleeping for 1 second.
[14/Jan/2019:02:45:21 +0000] TCP 200 0 0 0.001
W0114 02:45:24.350352 7 controller.go:373] Service "ingress/default-http-backend" does not have any active Endpoint
W0114 02:45:27.705918 7 controller.go:373] Service "ingress/default-http-backend" does not have any active Endpoint
141.105.109.219 - [141.105.109.219] - - [14/Jan/2019:02:45:29 +0000] "GET / HTTP/1.1" 404 153 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7" 423 0.000 [-] - - - - 22b6d6cdd9c481da64779f3f0cd8b5e9
W0114 02:45:31.017150 7 controller.go:373] Service "ingress/default-http-backend" does not have any active Endpoint
W0114 02:45:34.350531 7 controller.go:373] Service "ingress/default-http-backend" does not have any active Endpoint
Some additional information.
$ kubectl get pods -n ingress
NAME                                        READY   STATUS    RESTARTS   AGE
default-http-backend-58d7cfd5bc-xnfz7       0/1     Pending   0          36m
nginx-ingress-controller-68c4654b64-mmn45   1/1     Running   1          36m
I just noticed that if I hit my domain I get a 404 from nginx, and if I hit /healthz I get a 200 HTTP status code. However, it shows that the default-http-backend is not running.
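Describing the Pending pod (using the pod name from the listing above) should show why the scheduler cannot place it:
$ kubectl describe pod -n ingress default-http-backend-58d7cfd5bc-xnfz7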
One thing I just noticed is that even though I have 3 VMs provisioned with Vultr, my Kubernetes cluster is only reporting one node.
$ kubectl get nodes
NAME          STATUS   ROLES    AGE    VERSION
vultr.guest   Ready    master   168m   v1.13.2
I would wager that this is why the default-http-backend and other services I have tried to spin up, like the certificate manager, are stuck in the Pending state?
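Two quick checks that might show whether the other two VMs ever attempted to register (just a sketch):
$ kubectl get nodes -o wide
$ kubectl get events --all-namespaces | grep -i node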
It's hard to debug this without much information, but it seems that WireGuard couldn't establish a connection between the hosts. You can check this using wg.
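For example, on each host (a rough check; the 10.0.1.x addresses are the WireGuard overlay IPs the guide assigns):
$ wg show              # every peer should show a recent "latest handshake"
$ ping -c 3 10.0.1.2   # peers should be reachable over the tunnel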
Could you please also attach your Terraform configuration?
Here is the Terraform configuration for Vultr (I think this is what you're asking for?). Other than that, I am using the Terraform configuration from hobby-kube/provisioning.
variable "token" {}

variable "hosts" {
  default = 0
}

variable "ssh_keys" {
  type = "list"
}

variable "hostname_format" {
  type = "string"
}

variable "region" {
  type = "string"
}

variable "image" {
  type = "string"
}

variable "apt_packages" {
  type    = "list"
  default = []
}

// Find the ID of the Silicon Valley region.
data "vultr_region" "region" {
  filter {
    name   = "name"
    values = ["${var.region}"]
  }
}

// Find the ID for CoreOS Container Linux.
data "vultr_os" "container_linux" {
  filter {
    name   = "name"
    values = ["${var.image}"]
  }
}

// Find the ID for a starter plan.
data "vultr_plan" "starter" {
  filter {
    name   = "price_per_month"
    values = ["10.00"]
  }

  filter {
    name   = "ram"
    values = ["2048"]
  }
}

// Find the ID of an existing SSH key.
data "vultr_ssh_key" "squat" {
  filter {
    name   = "name"
    values = ["${var.ssh_keys}"]
  }
}

provider "vultr" {
  api_key = "${var.token}"
}

resource "vultr_instance" "host" {
  name        = "${format(var.hostname_format, count.index + 1)}"
  region_id   = "${data.vultr_region.region.id}"
  plan_id     = "${data.vultr_plan.starter.id}"
  os_id       = "${data.vultr_os.container_linux.id}"
  ssh_key_ids = ["${data.vultr_ssh_key.squat.id}"]
  tag         = "container-linux"
  count       = "${var.hosts}"

  provisioner "remote-exec" {
    inline = [
      "while fuser /var/lib/dpkg/lock >/dev/null 2>&1; do sleep 1; done",
      "apt-get update",
      "apt-get install -yq ufw ${join(" ", var.apt_packages)}",
    ]
  }
}

output "hostnames" {
  value = ["${vultr_instance.host.*.name}"]
}

output "public_ips" {
  value = ["${vultr_instance.host.*.ipv4_address}"]
}

output "private_ips" {
  value = ["${vultr_instance.host.*.ipv4_address}"]
}

output "private_network_interface" {
  value = "ens3"
}
Here is the output of the wg command.
root@vultr:~# wg
interface: wg0
  public key: Zdgs+Za9oFijWbHPUORrsqABd47oC3r+Twbb2vcpD0E=
  private key: (hidden)
  listening port: 51820

peer: /Txeg7KeRfPMH2AZRvpJ7ERQNmuh7RSRgbPyu046zBk=
  endpoint: <ip>:51820
  allowed ips: 10.0.1.2/32
  latest handshake: 1 minute, 48 seconds ago
  transfer: 3.68 MiB received, 5.48 MiB sent

peer: f+dxzWZspU/j2U5+YE+lfFwYCjfy2qgMiTJUvI18y1c=
  endpoint: <ip>:51820
  allowed ips: 10.0.1.3/32
  latest handshake: 1 minute, 48 seconds ago
  transfer: 4.39 MiB received, 7.81 MiB sent
$ kubectl get nodes
NAME          STATUS   ROLES    AGE    VERSION
vultr.guest   Ready    master   114s   v1.13.2
I have checked the nodes, and it appears that kubelet and etcd are running on each of them.
I also noticed that the master node seems to be continually rebooting; I'm not sure why.
➜ kubectl get events
LAST SEEN TYPE REASON KIND MESSAGE
33m Normal Starting Node Starting kubelet.
33m Normal NodeHasSufficientMemory Node Node vultr.guest status is now: NodeHasSufficientMemory
33m Normal NodeHasNoDiskPressure Node Node vultr.guest status is now: NodeHasNoDiskPressure
33m Normal NodeHasSufficientPID Node Node vultr.guest status is now: NodeHasSufficientPID
33m Normal NodeAllocatableEnforced Node Updated Node Allocatable limit across pods
32m Normal RegisteredNode Node Node vultr.guest event: Registered Node vultr.guest in Controller
32m Normal Starting Node Starting kube-proxy.
32m Normal Starting Node Starting kubelet.
32m Normal Starting Node Starting kubelet.
32m Normal NodeAllocatableEnforced Node Updated Node Allocatable limit across pods
32m Normal NodeHasSufficientMemory Node Node vultr.guest status is now: NodeHasSufficientMemory
32m Normal NodeHasNoDiskPressure Node Node vultr.guest status is now: NodeHasNoDiskPressure
32m Normal NodeHasSufficientPID Node Node vultr.guest status is now: NodeHasSufficientPID
2m41s Warning Rebooted Node Node vultr.guest has been rebooted, boot id: dd534659-0994-422c-a118-cea5628f218a
32m Normal NodeAllocatableEnforced Node Updated Node Allocatable limit across pods
32m Normal NodeHasSufficientMemory Node Node vultr.guest status is now: NodeHasSufficientMemory
32m Normal NodeHasNoDiskPressure Node Node vultr.guest status is now: NodeHasNoDiskPressure
32m Normal NodeHasSufficientPID Node Node vultr.guest status is now: NodeHasSufficientPID
2m41s Warning Rebooted Node Node vultr.guest has been rebooted, boot id: dbc43870-97d6-4461-b492-bb52d3edadbd
3m8s Warning Rebooted Node Node vultr.guest has been rebooted, boot id: f897845b-a7ca-4e79-96e5-4ab6f3958217
32m Normal Starting Node Starting kube-proxy.
32m Normal Starting Node Starting kube-proxy.
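The Rebooted warnings above each carry a different boot id, which suggests comparing them against the machines themselves. A rough check, run on each of the three VMs (standard Linux paths, nothing specific to this setup):
root@vultr:~# cat /proc/sys/kernel/random/boot_id   # compare with the boot ids in the events
root@vultr:~# uptime                                # confirms the machine has not actually rebooted
If every VM shows a stable uptime while the events keep reporting new boot ids, the "reboots" are probably not real reboots.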
I also see the following streaming non-stop in the kubelet logs. I'm not quite sure what it means at this point.
Jan 14 16:35:24 guest kubelet[12655]: W0114 16:35:24.129172 12655 kubelet.go:1647] Deleting mirror pod "kube-apiserver-vultr.guest_kube-system(63410cb3-181a-11e9-8776-560001d80420)" because it is outdated
Jan 14 16:35:24 guest kubelet[12655]: W0114 16:35:24.131733 12655 kubelet.go:1647] Deleting mirror pod "kube-scheduler-vultr.guest_kube-system(6304025b-181a-11e9-8776-560001d80420)" because it is outdated
Jan 14 16:35:25 guest kubelet[12655]: W0114 16:35:25.134534 12655 kubelet.go:1647] Deleting mirror pod "kube-controller-manager-vultr.guest_kube-system(63a16dc4-181a-11e9-8776-560001d80420)" because it is outdated
Jan 14 16:35:25 guest kubelet[12655]: W0114 16:35:25.136411 12655 kubelet.go:1647] Deleting mirror pod "kube-apiserver-vultr.guest_kube-system(63410cb3-181a-11e9-8776-560001d80420)" because it is outdated
Jan 14 16:35:26 guest kubelet[12655]: W0114 16:35:26.138890 12655 kubelet.go:1647] Deleting mirror pod "kube-scheduler-vultr.guest_kube-system(643dba69-181a-11e9-8776-560001d80420)" because it is outdated
Jan 14 16:35:27 guest kubelet[12655]: W0114 16:35:27.143278 12655 kubelet.go:1647] Deleting mirror pod "kube-apiserver-vultr.guest_kube-system(64d76e21-181a-11e9-8776-560001d80420)" because it is outdated
Jan 14 16:35:27 guest kubelet[12655]: W0114 16:35:27.145020 12655 kubelet.go:1647] Deleting mirror pod "kube-controller-manager-vultr.guest_kube-system(64d4dbda-181a-11e9-8776-560001d80420)" because it is outdated
I think I might be getting somewhere. It appears that on Vultr, all nodes get the same default hostname, vultr.guest. My guess is that this is what is confusing Kubernetes: it would also explain the Rebooted events above, since three different machines, each with its own boot id, were all registering as the single node vultr.guest. I am going to revisit the provisioning step to see how I can change this. Will report back.
So my assumption was correct. The default hostname Vultr assigns is vultr.guest, and since each of the nodes had the same hostname, they were all treated as one node; this was the source of my problem, and I didn't expect it.
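A quick manual fix, assuming a systemd-based image where hostnamectl is available, is to give each VM a unique name before it joins the cluster (names here are illustrative):
root@vultr:~# hostnamectl set-hostname kube1   # kube2, kube3 on the other hosts
Setting the hostname in Terraform instead, as in the configuration below, avoids having to remember this step.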
Here is the fixed provisioning file for Vultr. If you are interested in using it to follow along with the guide, then you will need to add the following plugin: https://github.com/squat/terraform-provider-vultr.
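For reference, third-party providers had to be installed manually at the time; roughly (the binary name is illustrative, see the plugin's README for exact steps):
$ mkdir -p ~/.terraform.d/plugins
$ cp terraform-provider-vultr ~/.terraform.d/plugins/
$ terraform init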
variable "token" {}

variable "hosts" {
  default = 0
}

variable "ssh_keys" {
  type = "list"
}

variable "hostname_format" {
  type = "string"
}

variable "region" {
  type = "string"
}

variable "image" {
  type = "string"
}

variable "apt_packages" {
  type    = "list"
  default = []
}

// Find the ID of the Silicon Valley region.
data "vultr_region" "region" {
  filter {
    name   = "name"
    values = ["${var.region}"]
  }
}

// Find the ID for CoreOS Container Linux.
data "vultr_os" "container_linux" {
  filter {
    name   = "name"
    values = ["${var.image}"]
  }
}

// Find the ID for a starter plan.
data "vultr_plan" "starter" {
  filter {
    name   = "price_per_month"
    values = ["10.00"]
  }

  filter {
    name   = "ram"
    values = ["2048"]
  }
}

// Find the ID of an existing SSH key.
data "vultr_ssh_key" "squat" {
  filter {
    name   = "name"
    values = ["${var.ssh_keys}"]
  }
}

provider "vultr" {
  api_key = "${var.token}"
}

resource "vultr_instance" "host" {
  name        = "${format(var.hostname_format, count.index + 1)}"
  region_id   = "${data.vultr_region.region.id}"
  plan_id     = "${data.vultr_plan.starter.id}"
  os_id       = "${data.vultr_os.container_linux.id}"
  ssh_key_ids = ["${data.vultr_ssh_key.squat.id}"]
  tag         = "container-linux"
  hostname    = "${format(var.hostname_format, count.index + 1)}"
  count       = "${var.hosts}"

  provisioner "remote-exec" {
    inline = [
      "while fuser /var/lib/dpkg/lock >/dev/null 2>&1; do sleep 1; done",
      "apt-get update",
      "apt-get install -yq ufw ${join(" ", var.apt_packages)}",
    ]
  }
}

output "hostnames" {
  value = ["${vultr_instance.host.*.name}"]
}

output "public_ips" {
  value = ["${vultr_instance.host.*.ipv4_address}"]
}

output "private_ips" {
  value = ["${vultr_instance.host.*.ipv4_address}"]
}

output "private_network_interface" {
  value = "ens3"
}
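After re-provisioning with this configuration, each instance comes up with its own hostname (per hostname_format), and all three should register as separate nodes:
$ kubectl get nodes   # should now list three Ready nodes with distinct names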