avahi / avahi

Avahi - Service Discovery for Linux using mDNS/DNS-SD -- compatible with Bonjour

Home Page: http://www.avahi.org


Spurious name conflicts

callegar opened this issue · comments

Hi, I hope this is the right place for reporting issues with avahi-daemon; the README on my system points to the bug tracker on freedesktop.org, which does not list avahi as a bug report target.

I am experiencing spurious name conflicts on various systems, all of which share a common trait: they have two interfaces, one on the local LAN with a static IP address, and the other getting a DHCP address from somewhere (typically an ADSL router).

What happens is the following. Suppose the host is called "foo". Initially, it is correctly advertised as foo.local. After some time a name conflict occurs and the host starts being advertised as foo-2.local, foo-3.local, etc., even though it is certainly the only host named foo on the network. In practice there is a spurious name conflict with the host itself, probably due to some race in avahi. The unfortunate result is that no other system can find "foo" on the network any more, since they all look for foo.local.

I see the issue on a couple of Debian jessie systems (avahi version 0.6.31), on a Raspbian jessie system (same version), and on an OpenWrt Chaos Calmer system (avahi 0.6.31 again).

I see a lot of reports of this same issue (or possibly something similar) on many distro bug trackers, application bug trackers, and question sites.

I wonder if there is something misconfigured on my systems (in which case some hint at diagnosing it would be appreciated), or if this is an issue (possibly a race) within the avahi daemon.

Even if this cannot be fixed rapidly, I'd like to suggest an interim point release of avahi with an option letting administrators disable the name conflict analysis when they are absolutely sure it won't be needed on their network.

I agree that I have seen this from time to time, unfortunately I am not currently sure what causes it. I think in some cases it might be related to the reflector, but if that is not in use I am not sure.

How often is this happening? I wonder if we can set up a long-term pcap capture to try and figure out what happens.
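For anyone setting this up, a rolling capture limited to mDNS traffic could look something like this (the interface name and file sizes are just examples, not from the thread):

```shell
# Capture only mDNS traffic (UDP port 5353, both IPv4 and IPv6)
# on eth0, rotating through 10 files of ~10 MB each so the
# capture can run long-term without filling the disk.
tcpdump -i eth0 -w mdns.pcap -C 10 -W 10 udp port 5353
```

When the conflict next fires, the file whose timestamp brackets the "Host name conflict" log line is the one to inspect.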

Rather frequently; I see it almost every other day.

It seems to be associated with a lease expiring on the interface that gets its address from DHCP, and probably has to do with the fact that the interface has both an IPv4 and an IPv6 address configured...

Apr 27 07:30:57 xyz dhcpcd[365]: eth0: soliciting a DHCPv6 lease
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: fe80::6a7f:74ff:fe15:6a2e router available
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: ADV fd57:81fe:80da::218/128 from fe80::6a7f:74ff:fe15:6a2e
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: REPLY6 received from fe80::6a7f:74ff:fe15:6a2e
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: adding address fd57:81fe:80da::218/128
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: renew in 21600 seconds, rebind in 34560 seconds
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: using IPv4LL address 169.254.139.60
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: adding route to 169.254.0.0/16
Apr 27 07:30:57 xyz avahi-daemon[24276]: Joining mDNS multicast group on interface eth0.IPv4 with address 169.254.139.60.
Apr 27 07:30:57 xyz avahi-daemon[24276]: New relevant interface eth0.IPv4 for mDNS.
Apr 27 07:30:57 xyz avahi-daemon[24276]: Registering new address record for 169.254.139.60 on eth0.IPv4.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::600c:b99e:4f17:ce61.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Joining mDNS multicast group on interface eth0.IPv6 with address fd57:81fe:80da:0:c99b:6cc1:2a7c:c139.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Registering new address record for fd57:81fe:80da:0:c99b:6cc1:2a7c:c139 on eth0.*.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing address record for fe80::600c:b99e:4f17:ce61 on eth0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing address record for fe80::a92:e068:3cb:7ae2 on wlan0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing address record for 169.254.208.59 on wlan0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing address record for 192.168.32.1 on wlan0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing workstation service for wlan0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing address record for 169.254.139.60 on eth0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing workstation service for eth0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing workstation service for lo.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Host name conflict, retrying with xyz-2

I don't think that the reflector should be on anywhere, as it should be disabled by default, shouldn't it?

Preventing avahi-daemon from using the interface whose address comes from a DHCP server makes the issue disappear, but obviously that is not a solution.

Avahi can't handle inter-connected multi-homed systems. We have to use the option to disable one of the interfaces to keep the daemon from seeing multiple name registration requests (one from each network).

Best I can tell there isn't a better solution since this is really an issue with the design of the protocol.

Still I wonder...

  1. why do I not just see a -1, but also a -2 and every now and then a -3 too?
  2. wouldn't it be possible to have both interfaces managed by avahi-daemon with a reproducible name assignment? E.g. getting hostname.local for the IP on one of the two NICs and hostname-2.local for the name on the other right from the start, rather than having things one way at boot and then getting the -x suffix when the DHCP lease is renewed?
    I am asking because the issue is not the -2, but not knowing in advance how a host will be reachable.

I have totally had this happen on one of my own systems, with a very similar-looking log to yours, after downing and upping a bunch of interfaces rapidly. There must definitely be a bug there; I'll have to try and figure out if I can make it reproducible.

Some kind of race to do with new interfaces appearing while probing, perhaps. There is a related issue for services that get stuck registering, so maybe the logic for interfaces coming and going needs to be reviewed.

OK I think I figured it out. What's happening is an address is withdrawn before it finishes probing, but we receive a copy of our own probe immediately after and thus assume a conflict (our own multicast probes are mirrored back to us by the kernel). A bit of a race condition.

This happens a lot with IPv6, where we withdraw the fe80 link-local address once we receive a global address, and can happen very rapidly on boot. Of note, you are using IPv6 on your network, as am I on mine, where I am seeing this. On IPv4, address withdrawals while probing are quite uncommon.

So we'll need to identify those in some way, either with a ghost list or otherwise determining that the probe looped back. I'll look at that.

Confirmed the issue as I suspected, we withdraw our address record but only then receive a copy of our own probe and decide it is a conflict:

Jun 20 15:40:58 hyper avahi-daemon[6567]: Joining mDNS multicast group on interface vsw3.IPv6 with address fe80::9cf5:4ff:fef6:ec81.
Jun 20 15:40:58 hyper avahi-daemon[6567]: Registering new address record for fe80::9cf5:4ff:fef6:ec81 on vsw3.*.
Jun 20 15:40:58 hyper avahi-daemon[6567]: Leaving mDNS multicast group on interface vsw3.IPv6 with address fe80::9cf5:4ff:fef6:ec81.
Jun 20 15:40:58 hyper avahi-daemon[6567]: Withdrawing address record for fe80::9cf5:4ff:fef6:ec81 on vsw3.
Jun 20 15:40:58 hyper avahi-daemon[6567]: Received conflicting probe [hyper.local#011IN#011AAAA fe80::9cf5:4ff:fef6:ec81 ; ttl=120]. Local host lost. Withdrawing.

This happens because we revoke the link-local address from being advertised once we receive a global address.

Hope to have a fix for this shortly

Would a workaround be to disable ipv6 in the config when you're not using it?
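For reference, if IPv6 is genuinely unused on the network, the relevant setting lives in /etc/avahi/avahi-daemon.conf (a sketch; this only helps if nothing relies on mDNS over IPv6):

```ini
[server]
# Stop avahi-daemon from publishing or browsing over IPv6 entirely.
use-ipv6=no
```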

Any updates on this?

Hey @lathiat, sorry for bugging you.

I think this just happened to me as well. I have a usual IPv4/6 dual stack network at home and run avahi-daemon in a Docker container with network_mode host.

This is the log:

daemon_1  | 2018-06-21T11:54:39.577225354Z Found user 'avahi' (UID 102) and group 'avahi' (GID 102).
daemon_1  | 2018-06-21T11:54:39.577682651Z Successfully dropped root privileges.
daemon_1  | 2018-06-21T11:54:39.578293096Z avahi-daemon 0.6.32 starting up.
daemon_1  | 2018-06-21T11:54:39.579337101Z WARNING: No NSS support for mDNS detected, consider installing nss-mdns!
daemon_1  | 2018-06-21T11:54:39.579720148Z Successfully called chroot().
daemon_1  | 2018-06-21T11:54:39.580163570Z Successfully dropped remaining capabilities.
daemon_1  | 2018-06-21T11:54:39.580505155Z Loading service file /services/smbd.service.
daemon_1  | 2018-06-21T11:54:39.583626444Z Joining mDNS multicast group on interface enp5s0.IPv6 with address 2003:e5:d70e:bc00:265e:beff:fe06:ed43.
daemon_1  | 2018-06-21T11:54:39.583822517Z New relevant interface enp5s0.IPv6 for mDNS.
daemon_1  | 2018-06-21T11:54:39.583868742Z Joining mDNS multicast group on interface enp5s0.IPv4 with address 192.168.178.58.
daemon_1  | 2018-06-21T11:54:39.583883829Z New relevant interface enp5s0.IPv4 for mDNS.
daemon_1  | 2018-06-21T11:54:39.584342326Z Network interface enumeration completed.
daemon_1  | 2018-06-21T11:54:39.585046921Z Registering new address record for 2003:e5:d70e:bc00:265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-21T11:54:39.585065258Z Registering new address record for 192.168.178.58 on enp5s0.IPv4.
daemon_1  | 2018-06-21T11:54:40.492960211Z Server startup complete. Host name is nibelungenhort.local. Local service cookie is 3353354288.
daemon_1  | 2018-06-21T11:54:41.400508244Z Service "nibelungenhort" (/services/smbd.service) successfully established.
daemon_1  | 2018-06-22T02:48:49.975073115Z Registering new address record for fe80::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:49.984858578Z Withdrawing address record for fe80::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:49.984962940Z Registering new address record for fe80::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:50.983388121Z Withdrawing address record for fe80::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:50.983547607Z Registering new address record for fe80::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:50.983631244Z Registering new address record for fd00::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:51.269257637Z Withdrawing address record for fd00::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:51.269418273Z Withdrawing address record for fe80::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:51.269500173Z Registering new address record for fd00::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:51.269576160Z Registering new address record for fe80::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:52.686728887Z Registering new address record for 2003:e5:d70b:3400:265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:52.690760136Z Withdrawing address record for fd00::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:52.690858948Z Withdrawing address record for fe80::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:52.690937072Z Withdrawing address record for 2003:e5:d70e:bc00:265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:52.691012722Z Withdrawing address record for 192.168.178.58 on enp5s0.
daemon_1  | 2018-06-22T02:48:52.696120164Z Host name conflict, retrying with nibelungenhort-2
daemon_1  | 2018-06-22T02:48:52.697784540Z Registering new address record for 2003:e5:d70b:3400:265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:52.697876765Z Registering new address record for 192.168.178.58 on enp5s0.IPv4.
daemon_1  | 2018-06-22T02:48:54.587279952Z Server startup complete. Host name is nibelungenhort-2.local. Local service cookie is 3353354288.
daemon_1  | 2018-06-22T02:48:55.489180837Z Service "nibelungenhort-2" (/services/smbd.service) successfully established.

two interfaces
dual stack

Make sure you are not hitting a shortcoming in the protocol where the daemon sees other daemons through multiple paths: the system announces itself through one interface and then gets rejected when announcing itself on subsequent interfaces, since it appears to be a duplicate.

I always use allow-interfaces/deny-interfaces to force avahi to use only a single interface (in my industry this is typically the management interface). After that I have not had this issue.
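Concretely, that restriction also goes in /etc/avahi/avahi-daemon.conf (the interface name below is just an example; substitute your own):

```ini
[server]
# Only advertise on the single chosen interface; all others are
# ignored by the daemon, so it never sees itself via another path.
allow-interfaces=eth0
```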

There are two interfaces on my system, although only one is actually up and connected to the network. As far as I can see avahi-daemon only works on the connected interface (enp5s0), but I'll try manually allowing it.

is anyone working on a fix for this? @lathiat said

Hope to have a fix for this shortly

exactly a year ago? Any progress? Thanks

allow-interfaces will work around the issue as it's a bug in handling interfaces rapidly adding and removing addresses (particularly noticeable if you have globally routable IPv6 addresses, as we add then remove the link local address)

Still planning a fix

So, this is still happening to me. I've set allow-interfaces=enp5s0 in avahi-daemon.conf as suggested here, but that didn't help. Still the same log messages as posted above.

I tried the allow-interfaces method as well but it's not working. Is there another work-around for this? How about a fix?

@lathiat Any ETA? Many distros are reporting the same bug.

The only workaround for me is a daily restart of avahi-daemon. I'll soon replace this by an automatic restart if the daemon logs the error message, but for now this works for me. Not ideal, but eh… it's just for my homelab and nothing critical.
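A rough sketch of such a log-triggered restart (the match string comes from the log messages in this thread; the unit name may differ per distro, and this is of course a workaround, not a fix):

```shell
#!/bin/sh
# Follow the avahi-daemon journal and restart the service whenever
# a spurious host name conflict is logged. Run as root.
journalctl -f -u avahi-daemon.service | while read -r line; do
    case "$line" in
        *"Host name conflict"*)
            systemctl restart avahi-daemon.service
            ;;
    esac
done
```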

It seems a working workaround is

cache-entries-max=0

Maybe there just needs to be a configuration option to completely disable conflict checking. Or at least to prevent modification of the host name if a conflict is discovered. I set the host names I want manually. I expect avahi to faithfully announce the hostname that I have configured, not to change it sporadically. The whole point of avahi is so I can connect based on the host name alone. If avahi is changing the host name for any reason whatsoever, this defeats the whole purpose of using avahi.

@gramels that workaround might prevent avahi from discovering a phantom conflict and changing the host name, but it also prevents avahi lookups from working. I cannot perform any lookups via avahi with cache-entries-max set to zero.

This is indeed awful. Are there any plans to fork this project due to the lack of maintenance?

Unfortunately the host-name conflict detection is part of the Multicast DNS spec, so any such option would both violate it and just not work well. If another host on the network is actually advertising your hostname, trying to connect with that hostname will only unreliably reach your machine anyway.

Obviously in this case the bug is such that there is not a real conflict. I did identify the cause for this, I will try and get a fix patched shortly.

There should still be an option to disable hostname mangling on specific devices. Obviously this would not default to on, but it would prevent devices that are supposed to be accessible with a specific hostname from getting permanently 'bumped'. Perhaps the way to do this would be to continuously retry until the correct name can be announced. Another option would be to use the 'bumped' name, but periodically retry (say, every 60 seconds) the correct name so that if a device does get 'bumped', it will eventually return to its correct name. Again, this can be off by default, but available for devices that are supposed to be accessible with a specific name.

Unplugging the Ethernet cable from my Linux PC triggers this issue. It is an Ubuntu 18.04 system with a dual IPv4/IPv6 network stack.

Any update on this? This bug is making my life miserable at the moment.

Turning off IPV6 works in my case, CentOS 7.4: IPV6INIT=no
use-ipv6 is disabled by default in /etc/avahi/avahi-daemon.conf. Just keep allow-interfaces commented.
I'm using avahi-daemon 0.6.31.

I also bumped into this issue randomly out of the blue on my Raspberry Pi running Raspbian Stretch Lite.
Is it because I have both the ethernet and the wireless connected to the same network?

Mar 24 18:24:27 raspberrypi systemd[1]: Starting Avahi mDNS/DNS-SD Stack...
Mar 24 18:24:27 raspberrypi avahi-daemon[320]: Found user 'avahi' (UID 108) and group 'avahi' (GID 112).
Mar 24 18:24:27 raspberrypi avahi-daemon[320]: Successfully dropped root privileges.
Mar 24 18:24:27 raspberrypi avahi-daemon[320]: avahi-daemon 0.6.32 starting up.
Mar 24 18:24:27 raspberrypi avahi-daemon[320]: Successfully called chroot().
Mar 24 18:24:27 raspberrypi systemd[1]: Started Avahi mDNS/DNS-SD Stack.
Mar 24 18:24:27 raspberrypi avahi-daemon[320]: Successfully dropped remaining capabilities.
Mar 24 18:24:27 raspberrypi avahi-daemon[320]: No service file found in /etc/avahi/services.
Mar 24 18:24:27 raspberrypi avahi-daemon[320]: Network interface enumeration completed.
Mar 24 18:24:27 raspberrypi avahi-daemon[320]: Server startup complete. Host name is raspberrypi.local. Local service cookie is 3747010316.
Mar 24 18:24:30 raspberrypi avahi-daemon[320]: Joining mDNS multicast group on interface wlan0.IPv6 with address fe80::8632:f18a:a80e:f8ec.
Mar 24 18:24:30 raspberrypi avahi-daemon[320]: New relevant interface wlan0.IPv6 for mDNS.
Mar 24 18:24:30 raspberrypi avahi-daemon[320]: Registering new address record for fe80::8632:f18a:a80e:f8ec on wlan0.*.
Mar 24 18:24:31 raspberrypi avahi-daemon[320]: Joining mDNS multicast group on interface eth0.IPv6 with address fe80::b10d:1e75:f35a:160d.
Mar 24 18:24:31 raspberrypi avahi-daemon[320]: New relevant interface eth0.IPv6 for mDNS.
Mar 24 18:24:31 raspberrypi avahi-daemon[320]: Registering new address record for fe80::b10d:1e75:f35a:160d on eth0.*.
Mar 24 18:24:31 raspberrypi avahi-daemon[320]: Leaving mDNS multicast group on interface wlan0.IPv6 with address fe80::8632:f18a:a80e:f8ec.
Mar 24 18:24:31 raspberrypi avahi-daemon[320]: Joining mDNS multicast group on interface wlan0.IPv6 with address fdaa:bbcc:ddee:0:6928:c176:9743:26f7.
Mar 24 18:24:31 raspberrypi avahi-daemon[320]: Registering new address record for fdaa:bbcc:ddee:0:6928:c176:9743:26f7 on wlan0.*.
Mar 24 18:24:31 raspberrypi avahi-daemon[320]: Withdrawing address record for fe80::8632:f18a:a80e:f8ec on wlan0.
Mar 24 18:24:32 raspberrypi avahi-daemon[320]: Registering new address record for 2a00:23c4:3d97:bb00:9e0b:1b47:bae5:68fd on wlan0.*.
Mar 24 18:24:35 raspberrypi avahi-daemon[320]: Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.1.64.
Mar 24 18:24:35 raspberrypi avahi-daemon[320]: New relevant interface eth0.IPv4 for mDNS.
Mar 24 18:24:35 raspberrypi avahi-daemon[320]: Registering new address record for 192.168.1.64 on eth0.IPv4.
Mar 24 18:24:35 raspberrypi avahi-daemon[320]: Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::b10d:1e75:f35a:160d.
Mar 24 18:24:35 raspberrypi avahi-daemon[320]: Joining mDNS multicast group on interface eth0.IPv6 with address 2a00:23c4:3d97:bb00:89a2:66ce:7d9f:95f4.
Mar 24 18:24:35 raspberrypi avahi-daemon[320]: Registering new address record for 2a00:23c4:3d97:bb00:89a2:66ce:7d9f:95f4 on eth0.*.
Mar 24 18:24:35 raspberrypi avahi-daemon[320]: Withdrawing address record for fe80::b10d:1e75:f35a:160d on eth0.
Mar 24 18:24:35 raspberrypi avahi-daemon[320]: Withdrawing address record for 2a00:23c4:3d97:bb00:9e0b:1b47:bae5:68fd on wlan0.
Mar 24 18:24:35 raspberrypi avahi-daemon[320]: Withdrawing address record for fdaa:bbcc:ddee:0:6928:c176:9743:26f7 on wlan0.
Mar 24 18:24:35 raspberrypi avahi-daemon[320]: Withdrawing address record for 192.168.1.64 on eth0.
Mar 24 18:24:35 raspberrypi avahi-daemon[320]: Host name conflict, retrying with raspberrypi-2
Mar 24 18:24:35 raspberrypi avahi-daemon[320]: Registering new address record for 2a00:23c4:3d97:bb00:9e0b:1b47:bae5:68fd on wlan0.*.
Mar 24 18:24:35 raspberrypi avahi-daemon[320]: Registering new address record for fdaa:bbcc:ddee:0:6928:c176:9743:26f7 on wlan0.*.
Mar 24 18:24:35 raspberrypi avahi-daemon[320]: Registering new address record for 2a00:23c4:3d97:bb00:89a2:66ce:7d9f:95f4 on eth0.*.
Mar 24 18:24:35 raspberrypi avahi-daemon[320]: Registering new address record for 192.168.1.64 on eth0.IPv4.
Mar 24 18:24:36 raspberrypi avahi-daemon[320]: Registering new address record for fdaa:bbcc:ddee:0:e997:4a66:a53b:1eae on eth0.*.
Mar 24 18:24:37 raspberrypi avahi-daemon[320]: Server startup complete. Host name is raspberrypi-2.local. Local service cookie is 3747010316.
Mar 24 18:24:39 raspberrypi avahi-daemon[320]: Joining mDNS multicast group on interface wlan0.IPv4 with address 192.168.1.73.
Mar 24 18:24:39 raspberrypi avahi-daemon[320]: New relevant interface wlan0.IPv4 for mDNS.
Mar 24 18:24:39 raspberrypi avahi-daemon[320]: Registering new address record for 192.168.1.73 on wlan0.IPv4.

I am reproducing it sometimes when there is an avahi reflector on the same network.

I tried all solutions explained here (disable ipv6, cache entries to 0, restrict interfaces), with no success so far.

It looks like the reflector reflects back the original publication. Is that intended?

I think a temporary solution would be to continuously try to advertise the configured host name after a conflict, reverting as soon as the name is available, with the option to not advertise a name at all until the configured host name is available.

@alexforencich It's a nice idea.

I have explored the situation when using reflectors.

The real fix would be to avoid that a machine reacts to its own probes. But I'm not sure how they could be identified. In particular, when the machine has several IP addresses (the case where it has both IPv4 and IPv6 addresses being one of these situations), it's probably hard to detect this condition.

An easy but heavy-handed fix would be to offer an option that simply disables duplicate name detection, with a warning that this strategy does not comply with the RFC.

I have two VLANs (the wired network in the first VLAN and Wi-Fi in the other).
I connect a Raspberry Pi via RJ45 (first VLAN) and also to Wi-Fi (the Pi has a Wi-Fi module), so in the end I have one device connected to both interfaces (VLANs).
I installed avahi-daemon and enabled the reflector in its settings.
I did this to allow printing from the Wi-Fi network to a printer on the wired one, and to make the Apple TV work (I want to see the Apple TV on both networks).
I have problems with naming (Apple TV (1), Apple TV (2)).

Please tell me, which of these might actually fix the trouble with the Apple TV or other devices (printers etc.):

  1. Disable IPv6
  2. Explicitly specify the interfaces
  3. Disable cache
  4. Static IPs (not using DHCP on either interface)

Or is there really no solution?

I'm not a computer major. In my case, I fixed it by disabling network.service, because NetworkManager and network were both enabled and the two services sometimes (maybe always) conflict.

For what it's worth, I'm also seeing this:

pi@octopi:~ $ sudo service avahi-daemon status
● avahi-daemon.service - Avahi mDNS/DNS-SD Stack
   Loaded: loaded (/lib/systemd/system/avahi-daemon.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-11-11 13:16:56 PST; 8min ago
 Main PID: 357 (avahi-daemon)
   Status: "avahi-daemon 0.7 starting up."
    Tasks: 2 (limit: 4915)
   Memory: 1.2M
   CGroup: /system.slice/avahi-daemon.service
           ├─357 avahi-daemon: running [octopi-2.local]
           └─377 avahi-daemon: chroot helper

octopi avahi-daemon[357]: Joining mDNS multicast group on interface wlan0.IPv6 with address 2601:648:8701:18c0:456b:7f8b:945f:e1b0.
octopi avahi-daemon[357]: Registering new address record for 2601:648:8701:18c0:456b:7f8b:945f:e1b0 on wlan0.*.
octopi avahi-daemon[357]: Withdrawing address record for fe80::bf6c:78cf:e31f:ce8a on wlan0.
octopi avahi-daemon[357]: Host name conflict, retrying with octopi-2
octopi avahi-daemon[357]: Registering new address record for 2601:648:8701:18c0:456b:7f8b:945f:e1b0 on wlan0.*.
octopi avahi-daemon[357]: Registering new address record for 2601:648:8701:18c0::2 on wlan0.*.
octopi avahi-daemon[357]: Server startup complete. Host name is octopi-2.local. Local service cookie is 1983257684.
octopi avahi-daemon[357]: Joining mDNS multicast group on interface wlan0.IPv4 with address 10.20.30.53.
octopi avahi-daemon[357]: New relevant interface wlan0.IPv4 for mDNS.
octopi avahi-daemon[357]: Registering new address record for 10.20.30.53 on wlan0.IPv4.

Workflow

My typical workflow involves testing the OctoPi image against a Raspberry Pi 3B, a Raspberry Pi 4B, a Raspberry Pi 3B+, etc. So each different Pi has the same hostname, but only one of these is running at a time. For the purpose of testing, the hostname is preferably the same.

I typically run a script nukeop that's in my path:

#!/bin/sh
# Remove any stale octopi.local entries from known_hosts.
# (BSD/macOS sed syntax; with GNU sed, drop the empty '' argument.)
sed -i '' '/octopi\.local/d' ~/.ssh/known_hosts

This allows ssh to behave and is faster than manually editing that.

The DHCP server is not issuing a static IP address for any particular MAC address, for what it's worth.

Result

And yet, the interaction between avahi-daemon and the DHCP server results in unexpected name broadcasting. This behavior of decorating the hostname isn't really useful to me, and I would guess the same holds for many users.

Hello, I can confirm that, as of January 2020 this bug still happens. In short:

jan 29 09:58:26 dragonmount avahi-daemon[517]: Host name conflict, retrying with dragonmount-2

It happens after replacing the local scope IPv6 address with the global scope IPv6 address (actually, something undesired for me, I wanted the local domain resolved to the local scope).

This is causing problems in my local network and I'd like to see a configuration option to force the host name for my server (no matter what the mDNS standard says, the server needs to have a fixed host name, that's obvious).

Since I've just realized that this bug is 3 and half years old, I'm now considering giving up on avahi.

@lathiat I have the same problem.
On macOS the connection is not reachable via Finder (but it is via "Go to").
I used the Discovery tool on a Mac and figured out that avahi-daemon does not publish any link-local addresses [fe80::]. Is this related to the host name conflict?

Hello, I can confirm that, as of January 2020 this bug still happens. In short:

jan 29 09:58:26 dragonmount avahi-daemon[517]: Host name conflict, retrying with dragonmount-2

It happens after replacing the local scope IPv6 address with the global scope IPv6 address (actually, something undesired for me, I wanted the local domain resolved to the local scope).

This is causing problems in my local network and I'd like to see a configuration option to force the host name for my server (no matter what the mDNS standard says, the server needs to have a fixed host name, that's obvious).

Since I've just realized that this bug is 3 and half years old, I'm now considering giving up on avahi.

I guess this is a good hint. Why not keep the link-local IPv6 addresses and add the global scope ones as well?
That would make avahi-daemon behave like macOS.
Why not give this a try?

Can't we just ignore all conflicts that involve IP addresses we own?

I started working on it, and there's one situation I am reproducing 100%. With a reflector somewhere else on the network, I do

ip link set down dev eth0; ip link set up dev eth0

After the interface is up again, eth0 is registered with its link-local address fe80:.... Another machine, the one with the reflector, then tells it about its other global address 2620:..., and this triggers a name change.

I'm not saying this is describing all the cases happening to the various people, but this is certainly at least one of the issues.

I believe this problem is caused by the fact that we send out a probe, and then immediately remove the address from Avahi and then forget about it. When we get the copy of our probe some very short time later (within a few milliseconds) we already forgot about it and so we conflict.

We can't just ignore probes sent from ourselves because otherwise we will ignore conflicts with another mDNS stack on the same host which can and does happen when applications have their own full implementation.

The fix for this relates to the fact that currently we expire dead entries at random intervals, with no respect to how long the entry has been in the DEAD state. I've made a basic modification to ensure records stay DEAD for at least 1 second, which should help prevent this in the case I was able to reproduce, since the looped-back packets arrive quickly; before, expiry happened somewhat randomly, including every time we received a packet, which was obviously too quick.
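As a hypothetical sketch of that modification (all names invented here, not the actual avahi internals): when walking the cache, a DEAD entry only becomes eligible for expiry after a minimum hold time, so a looped-back probe arriving a few milliseconds later still matches a known record instead of being treated as a conflict from another host.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical cache entry; avahi's real structures differ. */
typedef enum { ENTRY_LIVE, ENTRY_DEAD } entry_state;

typedef struct {
    entry_state state;
    time_t died_at;   /* when the entry entered the DEAD state */
} cache_entry;

#define MIN_DEAD_SECONDS 1

/* Only expire a DEAD entry once it has been dead for at least
 * MIN_DEAD_SECONDS, so a probe mirrored back by the kernel right
 * after withdrawal still finds the record and is recognized as
 * our own rather than as a conflicting host. */
static bool may_expire(const cache_entry *e, time_t now) {
    return e->state == ENTRY_DEAD &&
           now - e->died_at >= MIN_DEAD_SECONDS;
}
```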

Unfortunately I upgraded my machine from 18.04 to 20.04 devel and for whatever reason the way the network interfaces come up now I can no longer reproduce the issue reliably.

For the case where this happens without a reflector, I've pushed a possible fix to this branch:
https://github.com/lathiat/avahi/tree/117-spurious-name-conflicts
0a536f6

Does anyone have an easy reproducer for this to test if the patch works? And whether it works or not, can you include the output of running with --debug (as I added some extra debug msgs)

For the case @Bischoff is referring to with a reflector involved, I'm not entirely sure this will solve that case. I will need to understand the setup for that better.

I believe this problem is caused by the fact that we send out a probe, and then immediately remove the address from Avahi and then forget about it. When we get the copy of our probe some very short time later (within a few milliseconds) we already forgot about it and so we conflict.

Not exactly, but close.

Here is what I was able to trace: after bringing up the interface again, we probe for our name. Then the reflector answers with an IP that matches our name (the 2620 one), but that we already forgot about.

So it's not a copy of our probe, and it's not within milliseconds.

A word of caution: what you describe might happen. It's just not what I'm seeing in my test bench these days.

We can't just ignore probes sent from ourselves because otherwise we will ignore conflicts with another
mDNS stack on the same host which can and does happen when applications have their own full
implementation.

In my case, it's not a probe sent from ourselves, but the answer from the reflector to our query. I'm saying this because I can see it arriving via the handle_response_packet routine.

After bringing the interface up, we query for ebi2-minion.tf.local in the probing phase, and get this answer:

 ebi2-minion.tf.local       IN      AAAA 2620:...:a5bc ; ttl=120

while our only resource record so far is for fe80:.... Hence the conflict and renaming.

For the case @Bischoff is referring to with a reflector involved I'm not entirely sure this will solve that
case. I will need to understand the setup for that better.

It indeed sounds slightly different. I'm also a bit concerned we might be describing different problems under the same issue.

The problem (at least in my case) comes from reflect_cache_walk_callback() on the reflector that sends us back our own information.

That function takes care of sending data only to other interfaces. But bad luck, the information is cached on all interfaces :-( .

Found a fix to prevent caching on other interfaces.

#263

Good news is that this fix was merged into avahi source code. Bad news is that it might not fix everyone's problems.

If someone comes up with an easy reproducer of their own problem, I would be happy to give a hand.

I first thought the zeroconf = yes in netatalk was causing trouble. But it still looks to me like the connection is gone after the link-local address is dropped: the macOS Finder cannot find AFP volumes anymore. Funnily enough, Samba from the same machine still works.

I will note that the Pi-hole has become very popular. People are installing it on their networks, sometimes even on the same Pi that runs your software, and of course they rarely mention this when they come to you for support. Since it includes a DNS server (whose job it is to be authoritative), I'll note it here in case others haven't thought about it.


Any progress? ... Any update? ...

Please "vote"/thumb up this issue if you're affected. Right now it looks like only 8 people experience this.

The root cause is that we currently expire dead entries at random intervals, with no regard to how long the entry has been in the DEAD state. I've made a basic modification to ensure records stay DEAD for at least 1 second, which should help prevent this in the case I was able to reproduce.
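Conceptually, the change being described looks something like this (a hypothetical Python sketch of the idea only; `DEAD_MIN_SECONDS` and the class are illustrative, not avahi's actual C code):

```python
DEAD_MIN_SECONDS = 1.0  # illustrative knob for the minimum DEAD time

class CacheEntry:
    """Toy model of an mDNS cache entry that can be marked DEAD
    (withdrawn) before it is finally expired and forgotten."""

    def __init__(self):
        self.state = "VALID"
        self.dead_since = None

    def mark_dead(self, now):
        self.state = "DEAD"
        self.dead_since = now

    def may_expire(self, now):
        # Old behaviour: expire at some random later point, regardless of
        # how long the entry has been DEAD. The patch's idea: keep the
        # entry DEAD for at least DEAD_MIN_SECONDS, so a just-withdrawn
        # record is still remembered when re-registration probes for the
        # same name arrive moments later.
        if self.state != "DEAD":
            return False
        return (now - self.dead_since) >= DEAD_MIN_SECONDS
```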

Unfortunately I upgraded my machine from 18.04 to 20.04 devel and for whatever reason the way the network interfaces come up now I can no longer reproduce the issue reliably.

I don't understand why you need to reproduce again before making the DEAD state last for longer, 1s seems like a very short DEAD time already. Does the specification say anything about how short death should be?

I reboot a lot for testing and I'm reproducing this amnesia issue once every couple days with avahi version 0.7-4ubuntu7 on Ubuntu 20.04. They are all caused by instant withdrawal+re-registration and the withdrawals are not just for the link-local fe80::, see example below.

I'm using systemd-networkd instead of NetworkManager (for unrelated reasons) and two IPv6 prefixes and that seems to confuse avahi-daemon and cause a lot of instant withdrawal+re-registration. (Un?)fortunately few of those events cause a conflict.

So maybe there's another withdrawal bug hidden behind this amnesia bug but either way a longer DEAD state can't hurt, can it?

Apr 28 19:00:11 myhost systemd[650]: Started Sound Service.
Apr 28 19:00:11 myhost avahi-daemon[494]: Joining mDNS multicast group on interface enp3s0.IPv6 with address fe80::...
Apr 28 19:00:11 myhost avahi-daemon[494]: New relevant interface enp3s0.IPv6 for mDNS.
Apr 28 19:00:11 myhost systemd-networkd[335]: enp3s0: Gained IPv6LL
Apr 28 19:00:11 myhost avahi-daemon[494]: Registering new address record for fe80::... on enp3s0.*.
Apr 28 19:00:11 myhost systemd-timesyncd[480]: Network configuration changed, trying to establish connection.
Apr 28 19:00:12 myhost systemd-networkd[335]: enp3s0: DHCPv4 address 192.168.1.161/24 via 192.168.1.1
Apr 28 19:00:12 myhost dbus-daemon[497]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.h>
Apr 28 19:00:12 myhost systemd-timesyncd[480]: Network configuration changed, trying to establish connection.
Apr 28 19:00:12 myhost systemd-timesyncd[480]: Network configuration changed, trying to establish connection.
Apr 28 19:00:12 myhost avahi-daemon[494]: Joining mDNS multicast group on interface enp3s0.IPv4 with address 192.168.1.161.
Apr 28 19:00:12 myhost systemd-timesyncd[480]: Network configuration changed, trying to establish connection.
Apr 28 19:00:12 myhost avahi-daemon[494]: New relevant interface enp3s0.IPv4 for mDNS.
Apr 28 19:00:12 myhost systemd-networkd-wait-online[472]: managing: enp3s0
Apr 28 19:00:12 myhost avahi-daemon[494]: Registering new address record for 192.168.1.161 on enp3s0.IPv4.
Apr 28 19:00:12 myhost systemd[1]: Starting Hostname Service...
Apr 28 19:00:12 myhost systemd[1]: Finished Wait for Network to be Configured.
Apr 28 19:00:12 myhost systemd[1]: Reached target Network is Online.
Apr 28 19:00:12 myhost systemd[1]: Starting Tool to automatically collect and submit kernel crash signatures...
Apr 28 19:00:12 myhost systemd[1]: Started crash report submission daemon.
Apr 28 19:00:12 myhost systemd[1]: kerneloops.service: Found left-over process 893 (kerneloops) in control group while starting unit. Ignoring.
Apr 28 19:00:12 myhost systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Apr 28 19:00:12 myhost systemd[1]: Started Tool to automatically collect and submit kernel crash signatures.
Apr 28 19:00:12 myhost systemd[1]: Reached target Multi-User System.
Apr 28 19:00:12 myhost systemd[1]: Reached target Graphical Interface.
Apr 28 19:00:12 myhost systemd[1]: Started Stop ureadahead data collection 45s after completed startup.
Apr 28 19:00:12 myhost systemd[1]: Starting Update UTMP about System Runlevel Changes...
Apr 28 19:00:12 myhost whoopsie[892]: [19:00:12] Using lock path: /var/lock/whoopsie/lock
Apr 28 19:00:12 myhost systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Apr 28 19:00:12 myhost systemd[1]: Finished Update UTMP about System Runlevel Changes.
Apr 28 19:00:12 myhost systemd-timesyncd[480]: Network configuration changed, trying to establish connection.
Apr 28 19:00:12 myhost avahi-daemon[494]: Leaving mDNS multicast group on interface enp3s0.IPv6 with address fe80::...
Apr 28 19:00:12 myhost avahi-daemon[494]: Joining mDNS multicast group on interface enp3s0.IPv6 with address 2601::...
Apr 28 19:00:12 myhost avahi-daemon[494]: Registering new address record for 2601::... on enp3s0.*.
Apr 28 19:00:12 myhost avahi-daemon[494]: Withdrawing address record for fe80::... on enp3s0.
Apr 28 19:00:12 myhost avahi-daemon[494]: Registering new address record for fd54::... on enp3s0.*.
Apr 28 19:00:12 myhost avahi-daemon[494]: Withdrawing address record for 2601::... on enp3s0.
Apr 28 19:00:12 myhost avahi-daemon[494]: Withdrawing address record for 192.168.1.161 on enp3s0.
Apr 28 19:00:12 myhost avahi-daemon[494]: Withdrawing address record for ::1 on lo.
Apr 28 19:00:12 myhost avahi-daemon[494]: Withdrawing address record for 127.0.0.1 on lo.
Apr 28 19:00:12 myhost avahi-daemon[494]: Host name conflict, retrying with myhost-2
Apr 28 19:00:12 myhost avahi-daemon[494]: Registering new address record for fd54::... on enp3s0.*.
Apr 28 19:00:12 myhost avahi-daemon[494]: Registering new address record for 2601::... on enp3s0.*.
Apr 28 19:00:12 myhost avahi-daemon[494]: Registering new address record for 192.168.1.161 on enp3s0.IPv4.
Apr 28 19:00:12 myhost avahi-daemon[494]: Registering new address record for ::1 on lo.*.
Apr 28 19:00:12 myhost avahi-daemon[494]: Registering new address record for 127.0.0.1 on lo.IPv4.
Apr 28 19:00:12 myhost whoopsie[892]: [19:00:12] Could not get the Network Manager state:
Apr 28 19:00:12 myhost whoopsie[892]: [19:00:12] GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.NetworkManager>
Apr 28 19:00:12 myhost whoopsie[892]: [19:00:12] offline

Raspberry Pi running Buster.
Experiencing -2.local here too.


I tried turning ULA addresses on and off in OpenWRT over a few days and looking at journalctl it seems to have made zero difference to Ubuntu 20.04.

Every time the conflict happens, it happens immediately after leaving the link-local address mDNS group and joining the public IPv6 mDNS group. It's not clear why avahi-daemon withdraws and then re-registers all the other IPv4 and IPv6 addresses at the same time.

I have been seeing this on one of my Linux Mint 18.2 boxes for a while now. I thought it was something I was fat-fingering and ignored it. I decided today to try and get it sorted out. Running avahi-daemon 0.6.32-rc, according to the status message.

I just changed the hostname on this box from "reliant3" to "reliant" and restarted avahi-daemon. The problem seems to have gone away for now. Is there something special about a hostname ending in a numeric character?
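For reference, avahi's fallback naming (I believe implemented in avahi-common/alternative.c) increments a trailing "-N" suffix rather than a bare trailing digit, roughly like this Python approximation (not the actual C code):

```python
import re

def alternative_host_name(name):
    """Rough approximation of avahi's fallback naming scheme:
    "foo" -> "foo-2", "foo-2" -> "foo-3". A name that merely ends in
    a digit with no dash (like "reliant3") is NOT treated as already
    renamed; it becomes "reliant3-2", not "reliant4"."""
    m = re.fullmatch(r"(.*)-(\d+)", name)
    if m:
        return f"{m.group(1)}-{int(m.group(2)) + 1}"
    return f"{name}-2"
```

So a bare trailing digit shouldn't be special; only a name already carrying a "-N" suffix is treated as a previous rename.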


Restarting helps with every rare race condition.

for whatever reason the way the network interfaces come up now I can no longer reproduce the issue reliably.

Before fixing race conditions, the easiest way to reproduce them is to temporarily make them worse. In this case, add an artificial delay between when probes are received and when they're processed.


Does anyone have an easy reproducer for this to test whether the patch works? And whether it works or not, can you include the output of running with --debug (as I added some extra debug messages)?

Thank you for working on this! I've added a version of avahi with your patch applied to my experimental PPA to make it easier for others on Ubuntu 20.04 to test your patch too. If anyone does test from there, then you can generate debug logs by creating /etc/systemd/system/avahi-daemon.service.d/override.conf with the following:

[Service]
ExecStart=
ExecStart=/usr/sbin/avahi-daemon -s --debug

and then run sudo systemctl daemon-reload and then sudo systemctl reload avahi-daemon.service. Then journalctl -u avahi-daemon will show you the logs.

I can reproduce the problem on Ubuntu 20.04. The easiest way to check was to run journalctl -u avahi-daemon|grep conflict. As I know that on my home network there shouldn't be any conflict, if that shows one as logged, I know it is a bug.

I use my laptop pretty much every day, suspending and resuming it daily. The last two conflicts were on 25 May and 13 June.

The other day I turned on --debug and looked again today to find that second conflict on 13 June and the additional debugging suggests to me that it is the same problem you addressed in your patch.

I'll now run my patched version for a few days and see if the problem is now resolved.

I've got the same problem. I run a Debian-based server for Samba and connect to it from macOS. I use avahi to provide Bonjour support so I can connect to the server without setting up a DHCP server. The IPv4 address is statically configured on the server; the server doesn't use DHCP. In parallel, the server gets an IPv6 address. Until now I have had absolutely no knowledge of how IPv6 really works...
It seems that my provider, or my router, or whoever, changes the IPv6 address every night, apparently as a result of the nightly disconnect.

When I start or restart avahi, everything is fine. But after the server receives a new IPv6 address, avahi stops announcing the correct server name.

......
Jun 14 01:25:28 god-server avahi-daemon[26058]: Registering new address record for fe80::225:90ff:fe02:16b2 on enp2s0.*.
Jun 14 01:25:28 god-server avahi-daemon[26058]: Withdrawing address record for fe80::225:90ff:fe02:16b2 on enp2s0.
Jun 14 01:25:28 god-server avahi-daemon[26058]: Registering new address record for fe80::225:90ff:fe02:16b2 on enp2s0.*.
Jun 14 01:25:29 god-server avahi-daemon[26058]: Withdrawing address record for fe80::225:90ff:fe02:16b2 on enp2s0.
Jun 14 01:25:29 god-server avahi-daemon[26058]: Registering new address record for fd00::225:90ff:fe02:16b2 on enp2s0.*.
Jun 14 01:25:29 god-server avahi-daemon[26058]: Registering new address record for fe80::225:90ff:fe02:16b2 on enp2s0.*.
Jun 14 01:25:30 god-server avahi-daemon[26058]: Registering new address record for 2001:16b8:c1ac:9700:225:90ff:fe02:16b2 on enp2s0.*.
Jun 14 01:25:30 god-server avahi-daemon[26058]: Withdrawing address record for fd00::225:90ff:fe02:16b2 on enp2s0.
Jun 14 01:25:30 god-server avahi-daemon[26058]: Withdrawing address record for fe80::225:90ff:fe02:16b2 on enp2s0.
Jun 14 01:25:30 god-server avahi-daemon[26058]: Withdrawing address record for 2001:16b8:c18e:c800:225:90ff:fe02:16b2 on enp2s0.
Jun 14 01:25:30 god-server avahi-daemon[26058]: Received conflicting probe [god-server.local IN AAAA fe80::225:90ff:fe02:16b2 ; ttl=120]. Local host lost. Withdrawing.
Jun 14 01:25:30 god-server avahi-daemon[26058]: Withdrawing address record for 192.168.178.2 on enp2s0.
Jun 14 01:25:30 god-server avahi-daemon[26058]: Host name conflict, retrying with god-server-2
Jun 14 01:25:30 god-server avahi-daemon[26058]: Registering new address record for 2001:16b8:c1ac:9700:225:90ff:fe02:16b2 on enp2s0.*.
Jun 14 01:25:30 god-server avahi-daemon[26058]: Registering new address record for 192.168.178.2 on enp2s0.IPv4.
Jun 14 01:25:32 god-server avahi-daemon[26058]: Server startup complete. Host name is god-server-2.local. Local service cookie is 885839576.
Jun 14 01:25:32 god-server avahi-daemon[26058]: dbus-protocol.c: interface=org.freedesktop.Avahi.Server, path=/, member=EntryGroupNew
Jun 14 01:25:32 god-server avahi-daemon[26058]: dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client0/EntryGroup2, member=GetState
Jun 14 01:25:32 god-server avahi-daemon[26058]: dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client0/EntryGroup2, member=AddService
Jun 14 01:25:32 god-server avahi-daemon[26058]: dbus-util.c: Responding error 'Local name collision' (-8)
Jun 14 01:25:32 god-server avahi-daemon[26058]: dbus-entry-group.c: interface=org.freedesktop.Avahi.EntryGroup, path=/Client0/EntryGroup2, member=Free
.....

This behavior occurred when I created the lxd bridge (lxd init) -- avahi-daemon sees both interfaces, adopts the new bridge, relays the old hostname.local out to the bridge interface, sees the conflict, and de-registers then re-registers the ethernet interface with hostname-2.

Restarting the service resolves the problem. If it is a non-deterministic race condition, it could occur again in the future, I suppose.

Still, the act of adding the bridge while avahi-daemon is running may provide a nice opportunity to capture traffic and analyze the conflict.


I came here looking for solutions to this issue, and I see I'm not the only one. What happens to me seems to fit what others have described. Avahi withdraws records, then registers new ones, and a conflict occurs. This is kind of a problem because I want to use the hostname to access a server, and I'd rather not have to manage it all the time.


Perhaps the behaviour could be configurable. If a conflict arises, the daemon could be set to rename (current behaviour), restart or ignore (@EternityForest 's suggestion).

I think what it should do when there is a conflict is pick a new name and use that, but then periodically check to see if the configured name is available. If so, switch back to the original name. This could be done with an exponential backoff or similar so that it can recover quickly if it's a fluke.
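Sketching that re-check schedule (illustrative numbers and function name; nothing avahi currently implements):

```python
def reclaim_delays(initial=2.0, factor=2.0, cap=300.0):
    """Yield delays (seconds) between attempts to re-probe the originally
    configured name after a conflict forced a rename: quick retries at
    first so a fluke recovers fast, backing off toward `cap` so a
    genuinely conflicted network isn't flooded with probes forever."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay *= factor
```

The daemon would sleep for each yielded delay, re-probe the configured name, and switch back as soon as the probe goes unanswered.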


Not a bad idea, but I wonder what the point is of picking a new name in the first place. It makes the host unavailable to anyone using the expected name. How would other hosts discover the new name?

Well, I think the idea is that if you connect 10 devices that have the same default name configured, you can actually access all of them with unique names. mDNS devices broadcast their names periodically, so any device connected to the network can simply listen to see what devices are present.


If that is the behaviour the user wants, then yes. But if the user wants the host to be available under that name, so that other things can rely on it, then you don't want the renaming. Which is why I suggested making it configurable, so that it can behave according to the user's needs.

As a user of avahi / SMB, I can state that I never, never, never want to get a new name. My problem was that all my SMB connections were lost due to the change of the server name announced by avahi and SMB.

Well, the problem is the spurious name conflicts. You could also implement a timeout. If there is a conflict for more than, say, 5 minutes, keep the new name. Otherwise, try to stick with the configured name. Ideally, the temporary renaming would only be visible for a few seconds maximum while the device connects to the network. After that, it would either use the original name or the new one.

@seniorgod agreed, there needs to be an option to simply retry the configured name continuously - say, every 10 seconds or something - until it can successfully register it without ever attempting a rename.

For me the problem disappeared after installing on the Mac the latest update (10.15.6).


I don't think any of your elaborate suggestions are required here. The problem is simple. The protocol requires a fallback to a different name if a conflict is detected. If you want your names never to change, then it's your job to ensure that they don't conflict, and in addition we need to fix the broken conflict detection in Avahi. Then your problem will be solved.

@lathiat proposed an experimental patch to address this in this comment. If you're affected, I'd test that patch for him. It seems to work for me.

@basak yes, if a conflict is detected. What's wrong with re-trying a few times to make sure it's actually a conflict and not a false positive? If it retried even once after a couple of seconds, then a slightly lacking conflict detection routine would not be a problem at all. The problem now is that avahi is rather reliably detecting a conflict that doesn't exist.

DNS Service Discovery was intended for applications to discover services, not names. If you need a service that does foo, you search for _foo._tcp.local or _foo._udp.local, depending on how the service is defined. SRV records contain a priority, a weight, and a target that is an instance name. You shouldn't really care if there's a name conflict, because you shouldn't be searching directly for the name. You should search for the service and resolve the target instance name, based on priority and weight, to something you can connect to. This whole thread seems to miss the point that the name can be dynamic and you shouldn't really care.

Also, once you discover the instance name, you can cache it for the TTL of the resource record but shouldn't cache it longer than that. You should search for the service again to make sure there's not a better priority and/or be able to load balance across different weights with the same priority. On macOS Terminal, you can see this in action with the Shell -> New Remote Connection menu bar item. It will search for any host advertising the SSH service and display their instance names. Those names are only valid for the TTL of the published records.
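To illustrate the priority/weight resolution described above, here is a rough RFC 2782-style selection in Python (a sketch; the tuple layout and function name are made up for illustration):

```python
import random

def pick_srv_target(records, rng=None):
    """Pick a connect target from SRV-style records per RFC 2782:
    lowest priority wins; within that priority, choose randomly in
    proportion to weight. `records` is a list of
    (priority, weight, target) tuples."""
    rng = rng or random.Random()
    best = min(priority for priority, _, _ in records)
    group = [(weight, target) for priority, weight, target in records
             if priority == best]
    total = sum(weight for weight, _ in group)
    if total == 0:
        return group[0][1]  # all weights zero: any member is fine
    r = rng.uniform(0, total)
    acc = 0
    for weight, target in group:
        acc += weight
        if r <= acc:
            return target
    return group[-1][1]
```

A client that always re-browses and runs this selection never needs to hard-code an instance name.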

@pusateri - I get that. Except my services are TCP-based, on a server with a name. What's happening is that avahi is telling me my MQTT broker is on a server named "mqtt-2.local". But it's not. There's no box on my network with that name. The service is, and always has been, running on "mqtt.local".

I can't explain why avahi has decided that the name of my server isn't what it is.
AFAIK - avahi doesn't rename the IP address (10.0.7.1-2).
Why choose to rename a server that already has a name?

It sure looks like a bug/race/cache issue.

It could indeed be a bug, but the fix isn't to turn off name conflicts.

What instance name does it return when you browse for the mqtt service? try typing avahi-browse _mqtt._tcp

Of course, if your MQTT software doesn't support service discovery then you'll have to add that to your avahi config. If you have SSH enabled, you could also search for that: avahi-browse _ssh._tcp

When it's working - it returns "mqtt.local" and "10.0.7.11".
Sometimes after the mqtt.local box reboots, avahi returns "mqtt-2.local" and "10.0.7.11".

My other boxes know to ask avahi "Where is the mqtt service running?"
When avahi says "it's on box 'mqtt.local'" all is good.

When avahi says "it's on box 'mqtt-2.local'" then things break.

I have a weird script that tells systemd to stop/restart avahi when it sees a "-2" anything.
So far, that hack has kept my systems up and running. But when I get back into the code, I'll probably switch over to using the IP addresses. They seem to stay correct.
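For anyone wanting to build a similar stopgap, here is a hypothetical sketch (not the actual script mentioned above; the hostname, the avahi-resolve invocation, and the systemctl call are all assumptions to adapt to your system):

```python
import re
import subprocess

EXPECTED = "mqtt"  # hypothetical: the hostname you actually configured

def is_mangled(advertised, expected=EXPECTED):
    """True if an advertised name looks like a renamed variant of the
    expected one, e.g. "mqtt-2.local" or "mqtt-13.local"."""
    pattern = rf"{re.escape(expected)}-\d+\.local\.?"
    return re.fullmatch(pattern, advertised) is not None

def watchdog_once():
    """One pass: if the expected name no longer resolves via mDNS,
    assume avahi has renamed itself and bounce the daemon. The
    avahi-resolve and systemctl invocations are illustrative and need
    appropriate privileges."""
    probe = subprocess.run(
        ["avahi-resolve", "-4", "-n", f"{EXPECTED}.local"],
        capture_output=True, text=True)
    if probe.returncode != 0 or not probe.stdout.strip():
        subprocess.run(["systemctl", "restart", "avahi-daemon"])
```

Running `watchdog_once()` from a cron job or systemd timer every minute or so would approximate the restart-on-"-2" hack.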

I agree it's not to turn off name conflicts.
It's just the logic behind name conflict resolution seems puzzling.

Cool. Every device in my network running avahi also runs SSH. Generally, I want to connect to a specific device, and not just some random device that supports SSH.


That is where I ran into the problem as well. I tried to SSH into the machine, and it wasn't there. I thought it had crashed, but nope, still running. It's weird when you don't expect it.

When it happens, try logging into "server-2.local". If that works, you know it's time to restart the daemon.

@pusateri the fix is to fix conflict detection. Currently, the implementation is prone to false positives. There are a couple of ways to fix that. One is to try to get all of the corner cases. Another is to simply retry a couple of times to make sure there actually is a conflict, just in case you missed a corner case. This would make conflict detection much more reliable. Once you have the ability to retry, then you can add an option to continuously retry until the configured name becomes available, and you can set this option to be disabled by default.

@alexforencich Yes, bugs happen. But there could be other reasons like loops, not so smart wifi access points trying to replicate services across networks, etc. that cause duplicate names. It's far better to have your client handle a dynamic name than to rely on the name you think it should have if everything is working as expected. Resolve the service and use the name that is discovered. In the case of SSH, you know the prefix you're looking for. The SSH client should do this if it doesn't currently.

Yeah, the retry should solve problems like loops and make sure that avahi converges on the original name and only falls back on the mangled name if there is actually a persistent conflict.

@pconroy328 Is the name bad because it has a -2 in it or because the -2 doesn't resolve? If the -2 resolves, then you shouldn't care that it has -2 in it. If it doesn't resolve, that's a different bug.

@pconroy328 Is the name bad because it has a -2 in it or because the -2 doesn't resolve? If the -2 resolves, then you shouldn't care that it has -2 in it. If it doesn't resolve, that's a different bug.

Honestly it's been months since I played with it. I realized as I was typing that I really needed to go back and make sure that what I was asserting was actually happening.

You are correct - the "-2" will resolve! The box that used to be called "mqtt.local" will now be on the network as "mqtt-2.local".

For services that I code, I can see a way to work around a dynamic name. But one instance I recall was setting up ntp/chrony configs so that other servers knew where my GPS-enabled NTP server could be found: a box called "gps.local".

When that gps.local box reappears on the network as "gps-2.local" things got weird.
Perhaps that's just me not understanding the limitations of mDNS.
Maybe I should never put a ".local" hostname in a config file.

I thought it was a safe approach - since there's only one host on the network called "mqtt", only one host on the network called "gps", only one host called "odb2".

From my naive point of view - avahi is seeing a duplicate server that's not there. And creating a -2 suffix that's not necessary.

Also, if you're not supposed to rely on the name, and there is no conflict or the conflict goes away, then what's the problem with automatically switching back to the original name? Seems like there should be no problem with that at all.

@pusateri also, I think a lot of people are not connecting to devices with some sort of smart service or application. They're probably typing something like "ssh my_computer.local" or "rsync something my_computer.local:~", which triggers a simple mDNS lookup, and then they can't find the box because avahi mangled the name. This is confusing, because there is only one device on the network called my_computer, so why is avahi changing the name? Put another way: if there is at least one device called my_computer, then my_computer.local should ALWAYS resolve. If there are two devices, then one of them would resolve at my_computer.local and the other one would get mangled. But if there is only one device, it should always use the configured hostname. Currently, avahi is violating https://en.wikipedia.org/wiki/Principle_of_least_astonishment .

Personally, this issue has been going on for so long and has caused so much annoyance that I've added all of the machines I need to SSH into to my ssh config with their static ZeroTier IP addresses. Otherwise, it seems like there is a 50/50 chance that avahi will incorrectly mangle the name. And there are no conflicts here: every device has a unique name. The problem is that avahi's conflict detection is utterly broken. It just needs to try a little bit harder to make sure that it's not falsely detecting a conflict.

@pconroy328 Maybe I should never put a ".local" hostname in a config file.

Correct. If you need a permanent name, use unicast DNS. Using gps.local or mqtt.local breaks the contract with mDNS. The name is only valid for the lifetime of the resource record that was resolved. You either have to keep re-resolving the name or rediscovering the name through the service.

@alexforencich I understand your point that, in a network with a host having a unique name, the name shouldn't change, and I agree. There is likely a bug somewhere that causes the name conflict resolution to occur. That bug may be in Avahi, or it may be in another device on the network running different software. But you are violating the mDNS protocol by holding on to (caching) the name longer than the name is valid (even by just remembering the name in your head).

The next step would be to get packet captures of the conflict detection and resolution or at least a series of steps to reproduce the bug.

Correct. If you need a permanent name, use unicast DNS. Using gps.local or mqtt.local breaks the contract with mDNS. The name is only valid for the lifetime of the resource record that was resolved. You either have to keep re-resolving the name or rediscovering the name through the service.

Ok - then it's a PEBKAC issue on my part.
Thank you for taking the time to clarify it.

mDNS "resembles" but does not duplicate all aspects of DNS and it was my mistake to assume functionality that is not there.

So, what is the "correct" way for using mdns when I simply want to rsync to or SSH into a specific machine on the local network?

@alexforencich Ideally, your application would search for the service it wants to use. You can approximate this by starting with an instance name you know but if it doesn't resolve, then you have to browse yourself and use the instance name being advertised.

I could imagine a bash or zsh completion extension that browsed for ssh services when you typed ssh and then autocompletes the name you type as you type it to match the instance names it found when it browsed.
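As a rough illustration of the plumbing such a completion would need, here is a parser for the parsable output of `avahi-browse -t -r -p _ssh._tcp` (to my recollection, resolved lines start with `=` and the hostname is the 7th `;`-separated field — verify against your avahi version):

```python
def parse_ssh_hosts(avahi_browse_output):
    """Extract resolved host names from `avahi-browse -t -r -p` output.
    In the parsable format, resolved entries start with '=' and are
    ';'-separated; the hostname is assumed to be field index 6."""
    hosts = []
    for line in avahi_browse_output.splitlines():
        fields = line.split(";")
        if fields and fields[0] == "=" and len(fields) > 6:
            hosts.append(fields[6])
    return hosts
```

A shell completion function would run the browse command, feed the output through something like this, and offer the resulting names as candidates.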


@basak In the case of DHCP, you are correct that you can force a MAC address to IP address binding. But you can't necessarily prevent a rogue DHCP server from showing up on your network and answering a DHCPDISCOVER before your DHCP server does. Likewise, you only know you have a unique name on the local network until someone else plugs in a device on the same network with the same name.


But can Avahi hold onto its normal name at least until that happens? That is an exceptional case, and in the steady state where all devices are known and have sane names, Avahi shouldn't be going and changing its name randomly.

I can't help with fixing this issue, but I can perhaps add more information.

I was able to solve the issue (with multiple hostnames) by reconfiguring my router.
It seems the problem pops up while a unique local address is being assigned.
I get IPv4 and IPv6 addresses from my provider.
It seems that the addresses change every night.
I have changed the configuration of my router as follows:

Unique Local Addresses
Select how to assign Unique Local Addresses (ULA) to the devices on the home network.

o Assigning Unique Local Addresses (ULA) when there is no IPv6 Internet connection (recommended)
x Do not assign Unique Local Addresses (ULA) (not recommended)
o Always assign Unique Local Addresses (ULA)

It seems the naming conflicts are created during the assignment of a unique local address.
The problems occur if I use the default value, which is the first in the list.
After selecting "Do not assign Unique Local Addresses" the problems went away.


@pconroy328 Could you post the script you're using to auto-restart Avahi? Until the issue is fixed it seems like a useful stopgap.

@pconroy328 I'd also be interested in your script. Or rather: Did you manage to configure systemd to trigger a script as soon as your mDNS name changes (or DHCP lease or IPv6 address is renewed)? Or is your script run by a cronjob?

And where is this patch? I just ran into this tonight with an RPi whose name is unique on my network. It never advertised, as it just kept incrementing. It also tied up the Pi while it was doing this. Arch Linux armv7. Very frustrating.


But you are violating the mDNS protocol by holding on to (caching) the name longer than the name is valid (even by just remembering the name in your head).

EDIT: apologies to @pusateri for the misunderstanding. Unchanged text below.

Thanks for the mDNS crash course, but this is a ridiculous justification not to fix this bug. Just because Avahi is allowed to use any random name doesn't mean it should, especially when there's zero conflict anywhere except in itself. We don't care if ssh myhost.local or http://myprinter.local is too simplistic to be compliant with the specification: it's simple and it Just Works in the absence of this bug, so let's just fix the bug. Fixing this dead simple use case for 99.9% of users will never stop the 0.1% of others from doing crazy, dynamic, complicated, compliant stuff if they want to.