docker-library / busybox

Docker Official Image packaging for Busybox

Home Page: http://busybox.net

Can't resolve DNS in Kubernetes

teke97 opened this issue · comments

I have installed Prometheus on Kubernetes 1.17, using Helm and the stable/prometheus-operator chart.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.5", GitCommit:"e0fccafd69541e3750d460ba0f9743b90336f24f", GitTreeState:"clean", BuildDate:"2020-04-16T11:35:47Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
$ helm3 version
version.BuildInfo{Version:"v3.2.1", GitCommit:"fe51cd1e31e6a202cba7dead9552a6d418ded79a", GitTreeState:"clean", GoVersion:"go1.13.10"}

I ran into the following problem:

kubectl logs alertmanager-prometheus-operator-alertmanager-0 alertmanager
level=warn ts=2020-06-11T15:32:36.386Z caller=main.go:322 msg="unable to join gossip mesh" err="1 error occurred:\n\t* Failed to resolve alertmanager-prometheus-operator-alertmanager-0.alertmanager-operated.prometheus-operator.svc:9094: lookup alertmanager-prometheus-operator-alertmanager-0.alertmanager-operated.prometheus-operator.svc on 10.245.0.10:53: no such host\n\n"

After some investigation I realised the problem may be in the busybox image.
It is similar to kubernetes/kubernetes#66924 (comment).

Relevant part of the Kubernetes deployment config:

        - args:
          - -c
          - while true; do nslookup alertmanager-bot; sleep 10; done
          command:
          - /bin/sh
          image: busybox:1.31.1

pod log:

Server:		10.245.0.10
Address:	10.245.0.10:53

** server can't find alertmanager-bot.monitoring.svc.cluster.local: NXDOMAIN

*** Can't find alertmanager-bot.svc.cluster.local: No answer
*** Can't find alertmanager-bot.cluster.local: No answer
*** Can't find alertmanager-bot.monitoring.svc.cluster.local: No answer
*** Can't find alertmanager-bot.svc.cluster.local: No answer
*** Can't find alertmanager-bot.cluster.local: No answer

coredns log:

coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.561Z [INFO] 10.244.0.215:43144 - 19456 "AAAA IN alertmanager-bot.cluster.local. udp 48 false 512" NXDOMAIN qr,aa,rd 141 0.000202924s
coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.562Z [INFO] 10.244.0.215:43144 - 19456 "A IN alertmanager-bot.monitoring.svc.cluster.local. udp 63 false 512" NOERROR qr,aa,rd 124 0.000145229s
coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.562Z [INFO] 10.244.0.215:43144 - 19456 "A IN alertmanager-bot.svc.cluster.local. udp 52 false 512" NXDOMAIN qr,aa,rd 145 0.000084224s
coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.562Z [INFO] 10.244.0.215:43144 - 19456 "A IN alertmanager-bot.cluster.local. udp 48 false 512" NXDOMAIN qr,aa,rd 141 0.000056272s
coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.562Z [INFO] 10.244.0.215:43144 - 19456 "AAAA IN alertmanager-bot.monitoring.svc.cluster.local. udp 63 false 512" NOERROR qr,aa,rd 156 0.000060009s
coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.562Z [INFO] 10.244.0.215:43144 - 19456 "AAAA IN alertmanager-bot.svc.cluster.local. udp 52 false 512" NXDOMAIN qr,aa,rd 145 0.000051978s

pod log with busybox 1.28.4:

Name:      alertmanager-bot
Address 1: 10.245.48.126 alertmanager-bot.monitoring.svc.cluster.local
Server:    10.245.0.10
Address 1: 10.245.0.10 kube-dns.kube-system.svc.cluster.local

coredns log:

coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:34:42.790Z [INFO] 10.244.0.204:53241 - 3 "AAAA IN alertmanager-bot.monitoring.svc.cluster.local. udp 63 false 512" NOERROR qr,aa,rd 156 0.000207196s
coredns-84c79f5fb4-bspnj coredns 2020-06-11T14:34:42.792Z [INFO] 10.244.0.204:57444 - 4 "A IN alertmanager-bot.monitoring.svc.cluster.local. udp 63 false 512" NOERROR qr,aa,rd 124 0.000175375s

resolv.conf

/ # cat /etc/resolv.conf # the same on both images
nameserver 10.245.0.10
search monitoring.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
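
The `options ndots:5` line explains why the bare name `alertmanager-bot` goes through the search list: any name with fewer than five dots is first tried with each `search` domain appended, and only then as-is. A minimal sketch of that expansion, using the search domains from the resolv.conf above (this is an illustration, not resolver code):

```shell
# Simulate the search-list expansion a resolver performs for a short name
# under ndots:5, using the search domains from the resolv.conf above.
name="alertmanager-bot"
for domain in monitoring.svc.cluster.local svc.cluster.local cluster.local; do
  echo "$name.$domain"   # candidate FQDNs, tried in order
done
echo "$name."            # the literal (absolute) name is tried last
```

The working 1.28.4 run resolves the very first candidate, `alertmanager-bot.monitoring.svc.cluster.local`, which matches the coredns NOERROR answers above.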

I have already opened a bug on the BusyBox tracker: https://bugs.busybox.net/show_bug.cgi?id=13006
My cloud provider is DigitalOcean.
Image version

$ kubectl get pods -o yaml alertmanager-prometheus-operator-alertmanager-0 | grep image:
    image: quay.io/prometheus/alertmanager:v0.20.0
    image: quay.io/coreos/configmap-reload:v0.0.1

alertmanager version

/alertmanager $ alertmanager --version
alertmanager, version 0.20.0 (branch: HEAD, revision: f74be0400a6243d10bb53812d6fa408ad71ff32d)
  build user:       root@00c3106655f8
  build date:       20191211-14:13:14
  go version:       go1.13.5

busybox version

/alertmanager $ busybox | head
BusyBox v1.31.1 (2019-10-28 18:40:01 UTC) multi-call binary.

I'm fairly certain this is a duplicate of #75, #61, and #48; regardless, the following comment applies:

... this is an upstream BusyBox issue more appropriately discussed on their bugtracker (https://bugs.busybox.net) -- we're simply building their upstream sources verbatim against three libc variants.

For what it's worth, the sentiment I get from https://bugs.busybox.net/show_bug.cgi?id=11161#c3 is that BusyBox's nslookup command should be considered deprecated entirely, although using busybox:1.28 appears to work for most users. That said, please take further discussion of this issue to the upstream bug tracker.
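
For completeness, a hedged illustration of why even the full in-cluster service name still goes through the search list under `ndots:5`: counting its dots shows it falls below the threshold. This is a local bash demonstration, not part of the original report:

```shell
# Count the dots in a name; under ndots:5, names with fewer than 5 dots are
# tried through the search list before being queried as-is.
count_dots() {
  local stripped=${1//[!.]/}   # bash substitution: keep only the dots
  echo "${#stripped}"
}
count_dots "alertmanager-bot"                               # prints 0
count_dots "alertmanager-bot.monitoring.svc.cluster.local"  # prints 4 -- still < 5
```

So neither the short name nor the fully qualified one is exempt from search expansion; the difference between the two busybox versions lies in how their nslookup handles the expanded queries.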