Kong / kong

🦍 The Cloud-Native API Gateway and AI Gateway.

Home Page: https://konghq.com/install/#kong-community

Routing fails randomly, version 0.10.x

tairila opened this issue

Summary

I noticed that Kong routing to an API sometimes fails, and it happens randomly. When I try to access an application through Kong, the browser shows the error message "An unexpected error occurred". This worked fine earlier with Kong version 0.9.7 and Cassandra 2.x.

[error] 126#0: *8877 [lua] responses.lua:101: before(): failed the initial dns/balancer resolve for 'xxx' with: dns query returned no results, client: xxx.xxx.xxx.xxx, server: kong, request: "GET /yyy HTTP/1.1", host: "xxx:8080"

The API creation command:
curl -X POST localhost:8001/apis/ -d 'name=xxx' -d 'upstream_url=http://xxx:8080' -d 'preserve_host=true' -d 'uris=/yyy' -d 'strip_uri=true'

Steps To Reproduce

Repeat GET request several times for an API.

Additional Details & Logs

Kong version 0.10.0 & 0.10.1
Cassandra 3.0.10

The message explains exactly what happens. Kong queries the DNS server to resolve the hostname but does not receive a proper answer from that server.

Kong takes the timeout and attempts settings from the resolv.conf configuration file.

If they are not set, the defaults are 5 attempts and a timeout of 2 seconds.
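To make the fallback concrete, here is a minimal Python sketch of reading those two settings from resolv.conf-style text. It is not Kong's code: the function name is made up for illustration, and the defaults are simply the values quoted above. The `options timeout:n attempts:n` syntax is the one documented in resolv.conf(5).

```python
import re

# Defaults as quoted in this thread (assumed, not read from Kong's source).
DEFAULT_TIMEOUT_S = 2
DEFAULT_ATTEMPTS = 5


def parse_resolver_options(resolv_conf_text):
    """Return (timeout_seconds, attempts) from resolv.conf-style text.

    Hypothetical helper: falls back to the defaults above when the
    'options' line does not set a value.
    """
    timeout, attempts = DEFAULT_TIMEOUT_S, DEFAULT_ATTEMPTS
    for raw in resolv_conf_text.splitlines():
        # Drop comments, which start with ';' or '#'.
        line = re.split(r"[;#]", raw, maxsplit=1)[0]
        fields = line.split()
        if not fields or fields[0] != "options":
            continue
        for opt in fields[1:]:
            name, _, value = opt.partition(":")
            if name == "timeout" and value.isdigit():
                timeout = int(value)
            elif name == "attempts" and value.isdigit():
                attempts = int(value)
    return timeout, attempts


# A file without an 'options' line yields the defaults.
print(parse_resolver_options("nameserver 10.254.20.255\n"))  # (2, 5)
```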

The "failed the initial dns/balancer resolve" message is generated by Kong, whilst "dns query returned no results" is generated in the dns lib when the nameserver returns a record, but an empty one.

When Kong resolves a name, it tries the record types in the following order: 'last-successful-type', SRV, A, AAAA, and finally CNAME.
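The lookup order described above can be sketched roughly as follows. This is an illustration in Python, not the dns lib's implementation: `resolve`, `fake_lookup`, and `FAKE_ZONE` are hypothetical names, and moving the last successful type to the front without repeating it is an assumption.

```python
def resolve(name, lookup, last_successful_type=None):
    """Try record types in order and return the first non-empty answer.

    `lookup(name, rtype)` is a hypothetical callable returning a list
    of records (empty when the nameserver has nothing for that type).
    """
    order = ["SRV", "A", "AAAA", "CNAME"]
    if last_successful_type in order:
        # Assumption: the type that worked last time is tried first.
        order.remove(last_successful_type)
        order.insert(0, last_successful_type)
    for rtype in order:
        records = lookup(name, rtype)
        if records:
            return rtype, records
    raise LookupError("dns query returned no results")


# Stand-in data: a name that only has an A record.
FAKE_ZONE = {("mesos-ui.marathon.slave.mesos", "A"): ["10.254.4.45"]}


def fake_lookup(name, rtype):
    return FAKE_ZONE.get((name, rtype), [])


print(resolve("mesos-ui.marathon.slave.mesos", fake_lookup))
# ('A', ['10.254.4.45'])
```

With only an A record present, the SRV query comes back empty first and the A query then succeeds, which is why the record layout for each type matters when debugging.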

What do the DNS records look like, in that order?

It is as follows:

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.4 <<>> mesos-ui.marathon.slave.mesos
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2190
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;mesos-ui.marathon.slave.mesos. IN A

;; ANSWER SECTION:
mesos-ui.marathon.slave.mesos. 60 IN A 10.254.4.45

;; Query time: 0 msec
;; SERVER: 10.254.20.255#53(10.254.20.255)
;; WHEN: Thu Mar 30 10:26:27 EEST 2017
;; MSG SIZE rcvd: 63

I noticed one thing about the resolv.conf file, though: the error occurs when it contains the following nameservers:

nameserver 10.254.20.255
nameserver 10.254.20.175
nameserver 10.254.10.93
; generated by /usr/sbin/dhclient-script
search emea.xxx.net china.xxx.net apac.xxx.net americas.xxx.net
nameserver 10.131.39.252
nameserver 87.254.221.110

In this case only the first 3 are relevant. When I tested routing with only those in the resolv.conf file (everything else removed), it worked fine (no errors)!

Interesting. I'd expect the resolver to pick the next nameserver on a retry, but maybe it doesn't, and then it fails while it keeps trying the same bad nameserver.

What response do you get if you explicitly query those removed servers?

Actually, I don't think the resolv.conf parser honours the MAXNS limit of 3. See https://linux.die.net/man/5/resolv.conf

That's probably why the bad nameserver was queried where it shouldn't have been.
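For illustration, here is a small Python sketch of a parser that does honour the limit, run against the resolv.conf posted earlier in this thread. It is not the code from the dns client; the function name is hypothetical, and the value 3 for MAXNS is the one given in resolv.conf(5).

```python
MAXNS = 3  # limit documented in resolv.conf(5)

# The resolv.conf contents reported earlier in this thread.
RESOLV_CONF = """\
nameserver 10.254.20.255
nameserver 10.254.20.175
nameserver 10.254.10.93
; generated by /usr/sbin/dhclient-script
search emea.xxx.net china.xxx.net apac.xxx.net americas.xxx.net
nameserver 10.131.39.252
nameserver 87.254.221.110
"""


def nameservers(text, limit=MAXNS):
    """Return at most `limit` nameserver addresses, in file order."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        # Only 'nameserver <address>' lines count; comments and
        # 'search' lines are skipped.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers[:limit]


print(nameservers(RESOLV_CONF))
# ['10.254.20.255', '10.254.20.175', '10.254.10.93']
```

With the cap applied, the two trailing nameservers are never queried, which matches the behaviour seen after removing them from the file by hand.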

Fixed it in Kong/lua-resty-dns-client#7

The Kong dependency needs to be updated after a new dns client version is released.