BerlinVagrant / vagrant-dns

A plugin to manage DNS records for vagrant environments

10 second ping times, maybe related to RARP not working

mexon opened this issue

I have a very simple setup, and it does correctly resolve names. At first, looking up names was very slow, but I discovered the passthrough = :unknown trick. Now nslookup is fast, but ping is still slow.

vagrant@machine:~$ date -Iseconds ; ping -c1 machine.test ; date -Iseconds
2024-01-06T16:26:37+00:00
PING machine.test (192.168.56.10) 56(84) bytes of data.
64 bytes from 192.168.56.10: icmp_seq=1 ttl=64 time=0.033 ms

--- machine.test ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.033/0.033/0.033/0.000 ms
2024-01-06T16:26:47+00:00
vagrant@machine:~$ date -Iseconds ; nslookup machine.test ; date -Iseconds
2024-01-06T16:28:38+00:00
Server:		127.0.0.53
Address:	127.0.0.53#53

Non-authoritative answer:
Name:	machine.test
Address: 192.168.56.10
** server can't find machine.test: NOTIMP

2024-01-06T16:28:38+00:00

This shows just one ping, but with repeated pings each individual request is equally slow. Yet the round-trip times reported are fast, which suggests that it's the DNS lookups that are slow, not the ping itself.

I'm running Ubuntu 22.04 on both the host and the guest. Vagrant is 2.2.19, VirtualBox is 6.1.38. Vagrantfile:

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/jammy64"
  config.vm.network "private_network", ip: "192.168.56.10"
  config.dns.tld = "test"
  config.vm.hostname = "machine"
end

VagrantDNS::Config.logger = Logger.new("dns.log")
VagrantDNS::Config.passthrough = :unknown

I turned up the resolver logging. I can see the initial A and AAAA requests return quickly. But then I see RARP requests every 5s until it gives up after 2min. I suspect that this is related. My hypothesis is that ping wants to do RARP lookups for some reason, these don't succeed immediately, and after 10s ping gives up. Meanwhile, the resolver keeps retrying for 2min. I don't know enough about ping to know why it would do that. In any case, this sounds similar to the AAAA timeout problems I saw earlier: if RARP won't work, the resolver should say so immediately rather than waiting for a two-minute timeout.

I ruled out some possible causes. I noticed that my guests' clocks were two minutes out of sync, so I installed NTP. That didn't help. I also suspected it might be due to having wifi and ethernet both connected to the same network at the same time, but switching off ethernet didn't help. And I thought maybe all the continual requests from the rest of the system might be slowing it down, but I stripped away everything except Ubuntu's connectivity check and it didn't help.

OK, I figured out the problem. It turns out passthrough must be false, not :unknown. The results for me are:

  • false: Works fine.
  • :unknown: Both known and unknown DNS lookups work and are fast, but CPU goes to 100% and pings are slow due to RARP.
  • true: Unknown DNS lookups are fast, known DNS lookups are slow waiting for AAAA lookups. CPU at 100%.

It seems there's a loop between vagrant-dns and systemd-resolved. I don't know much about all of this; in my mental model there's a configured DNS server and a cache, and that's it. But it turns out the resolver has a chain of helpers for each interface, and vagrant-dns inserts itself as the first of those. Without passthrough, if vagrant-dns can't resolve a name, the resolver just moves on to the next option, and that works fine. But with passthrough, vagrant-dns invokes the system's DNS machinery to find the address. That machinery is the resolver, whose first helper in the chain is vagrant-dns, and so on ad infinitum.
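
To make the suspected loop concrete, here is a toy Ruby model of the chain as described above. It is only a sketch under stated assumptions: none of it is real vagrant-dns, RubyDNS, or systemd code, and every name in it (KNOWN, UPSTREAM, MAX_HOPS, vagrant_dns_helper, upstream_helper, system_resolver) is hypothetical. It also ignores timeouts, caching, and parallel queries, so it cannot reproduce the exact symptoms reported here, only the circular call path.

```ruby
# Toy model of the lookup chain. All names are hypothetical; this is not
# how vagrant-dns or the system resolver is actually implemented.

KNOWN = { "machine.test" => "192.168.56.10" }.freeze  # names vagrant-dns owns
UPSTREAM = { "example.com" => "203.0.113.7" }.freeze  # made-up upstream answer
MAX_HOPS = 5 # stop the toy model instead of recursing forever

# First helper in the chain: stands in for vagrant-dns.
def vagrant_dns_helper(name, passthrough:, hops:)
  return KNOWN[name] if KNOWN.key?(name)
  return nil unless passthrough

  # Passthrough asks "the system" for the answer, but the system resolver
  # is the very chain this helper sits at the front of.
  system_resolver(name, passthrough: passthrough, hops: hops + 1)
end

# Second helper in the chain: stands in for the ordinary upstream DNS server.
def upstream_helper(name)
  UPSTREAM[name]
end

# Stands in for the system resolver: try each helper in order.
def system_resolver(name, passthrough:, hops: 0)
  raise "lookup loop detected for #{name}" if hops >= MAX_HOPS

  vagrant_dns_helper(name, passthrough: passthrough, hops: hops) ||
    upstream_helper(name)
end

puts system_resolver("machine.test", passthrough: true)  # owned name, no loop
puts system_resolver("example.com", passthrough: false)  # falls through to upstream
begin
  system_resolver("example.com", passthrough: true)      # re-enters the chain
rescue RuntimeError => e
  puts e.message
end
```

In this model, a name that vagrant-dns owns never loops, and with passthrough off an unknown name simply falls through to the next helper. Only the combination of passthrough and an unknown name re-enters the chain.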

Something about this setup must be wrong. I don't know if the bad design decision was made by systemd, Ubuntu, RubyDNS, or vagrant-dns. But given that this infinite loop is a fact of life, I think vagrant-dns could stand to be more defensive. Perhaps a rate limit on the number of queries or something.
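
As a rough illustration of the kind of defensive measure meant here, the following Ruby sketch implements a per-name sliding-window query limit. It is an assumption about what such a guard could look like, not something vagrant-dns or RubyDNS provides: the class QueryRateLimiter, its allow? method, and its parameters are all hypothetical.

```ruby
# Hypothetical sketch of a per-name query rate limit. Not part of
# vagrant-dns or RubyDNS; class and method names are made up.
class QueryRateLimiter
  def initialize(max_queries:, window_seconds:)
    @max_queries = max_queries
    @window_seconds = window_seconds
    # Maps each queried name to the timestamps of its recent queries.
    @timestamps = Hash.new { |hash, key| hash[key] = [] }
  end

  # Returns true if a query for `name` may proceed, false if the name has
  # already been queried `max_queries` times within the current window.
  def allow?(name, now: Process.clock_gettime(Process::CLOCK_MONOTONIC))
    recent = @timestamps[name]
    recent.reject! { |t| now - t >= @window_seconds } # drop expired entries
    return false if recent.size >= @max_queries

    recent << now
    true
  end
end

limiter = QueryRateLimiter.new(max_queries: 2, window_seconds: 10)
3.times do |i|
  verdict = limiter.allow?("machine.test", now: i.to_f) ? "allowed" : "refused"
  puts "query #{i + 1}: #{verdict}"
end
```

A server using a guard like this could refuse or drop a query once allow? returns false, so a runaway loop would at least stop consuming CPU. Whether refusing, dropping, or logging is the right reaction is a design question this sketch does not settle.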

You could mark this issue as fixed because I have my workaround. But I'd much prefer if something could be done to make finding that solution more straightforward.

Thanks, I ran into the same problem. VagrantDNS::Config.passthrough = false works for me.