rpki-client / rpki-client-portable

Portability shim for OpenBSD's rpki-client

Home Page: https://rpki-client.org

please can you explain the connect timeout and fallback to IPv4 when IPv6 fails?

geeohgeegeeoh opened this issue

IDNIC is proving unreliable on IPv6:

rpki-client: https://repo-rpki.idnic.net/rrdp/d0b0cc6f-23d0-4f4f-ad5b-adbca8dbf698/223636...: connect: Operation timed out
rpki-client: https://repo-rpki.idnic.net/rrdp/d0b0cc6f-23d0-4f4f-ad5b-adbca8dbf698/223736...: connect: Operation timed out
rpki-client: https://repo-rpki.idnic.net/rrdp/d0b0cc6f-23d0-4f4f-ad5b-adbca8dbf698/223836...: connect: Operation timed out
rpki-client: https://repo-rpki.idnic.net/rrdp/d0b0cc6f-23d0-4f4f-ad5b-adbca8dbf698/223936...: connect: Operation timed out
rpki-client: https://repo-rpki.idnic.net/rrdp/d0b0cc6f-23d0-4f4f-ad5b-adbca8dbf698/224036...: connect: Operation timed out
rpki-client: https://repo-rpki.idnic.net/rrdp/d0b0cc6f-23d0-4f4f-ad5b-adbca8dbf698/224136...: connect: Operation timed out
rpki-client: https://repo-rpki.idnic.net/rrdp/notification.xml: loaded from network

It does appear to move on, but some guidance on how long it retries over IPv6 before falling back to IPv4 would help.

This is not really an rpki-client issue.
a) Your system resolver prefers IPv6 over IPv4; this can be changed in resolv.conf (family inet4 inet6).
b) IDNIC should stop publishing an AAAA (IPv6) record in their DNS entry if they can't provide reliable IPv6 connectivity.

In older versions of rpki-client this is worse because of the lack of proper session keep-alive. A new version that fixes this will be released in the coming days.
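
To check point (a) on a given host, it can help to look at the order in which the resolver hands back addresses, since a client that walks the list from the top will try the first family first. The following is a minimal Python sketch, not anything taken from rpki-client; the helper name `family_order` is made up for illustration. Note that the `family` keyword belongs to OpenBSD's resolv.conf(5); other systems control this ordering differently (glibc, for example, reads /etc/gai.conf).

```python
import socket

# Human-readable names for the two address families of interest.
FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def family_order(host, port=443):
    """Return the address families in the order getaddrinfo() lists them.

    family_order is a hypothetical helper for illustration only.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [FAMILY_NAMES.get(family, str(family)) for family, *_ in infos]
```

Calling `family_order("repo-rpki.idnic.net")` (the hostname from the log lines above; this needs network access) shows whether IPv6 or IPv4 addresses come first on the machine where it runs.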

I'm seeing the same issue with IDNIC while using transit from AS1299. Personally, it feels like the timeout for IPv6 (until the fallback to IPv4 happens) is longer when using https than when using rsync (even with rpki-client 7.3)… could that be?

The http code uses getaddrinfo() and then connects to the IPs returned. I think this is a common idiom, and the timeouts are all driven by libc and the kernel. openrsync uses the same method. I'm not sure about GNU rsync; it more or less does the same, but the code is complex, with extra options on top.
As mentioned above, this is not something rpki-client must fix: the code works and the fallback happens. This is a problem of IDNIC publishing a broken IPv6 setup, which hurts systems that prefer IPv6 over IPv4.
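
The idiom described here (resolve with getaddrinfo(), then try each returned address in turn) can be sketched roughly as follows. This is an illustration in Python, not the rpki-client C code; the function name `connect_in_order` and its `timeout` parameter are assumptions made for the example. With `timeout=None` the wait on an unreachable address is left entirely to the operating system, which is why a silently dropped IPv6 path can stall for a long time before the next address is tried.

```python
import socket


def connect_in_order(addrinfos, timeout=None):
    """Try each getaddrinfo() result in turn.

    Returns (socket, sockaddr) for the first address that connects.
    timeout=None leaves the connect timeout to the OS; a number caps
    each individual attempt at that many seconds.
    connect_in_order is a hypothetical helper for illustration only.
    """
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in addrinfos:
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock, sockaddr
        except OSError as exc:
            # Remember the failure and fall through to the next address.
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError("no addresses to try")
```

A caller would typically pass `socket.getaddrinfo(host, 443, type=socket.SOCK_STREAM)` as `addrinfos`, so the order of attempts is exactly the order the resolver returned.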

I don't think I can couch this as an obligation, but I observe it's an implicit DoS on all your dual-stack clients: block v6 in novel ways, and your client hangs for an inordinate period of time. In the DNS, people do things in parallel, asynchronously, to avoid this. In the web browser, the HE (Happy Eyeballs) code does things in parallel to avoid this. It's a huge complexity and I can "get" that you want to keep it simple. I just observe it's a royal pain. I think you could mitigate.

I think you should code more defensively. (btw I have sought to communicate with IDNIC about their platform)

If you want to go to WONTFIX I won't kick up a stink. I think this is a missed opportunity, but sure: all workarounds have cost.
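
For readers unfamiliar with the parallel approach mentioned in this comment, Python's asyncio exposes a Happy Eyeballs option on its connection functions. The sketch below only illustrates the general technique; it is not a proposal for how rpki-client would do it, and the wrapper name `open_dual_stack` is made up. The 0.25 second delay is the value the asyncio documentation cites as the RFC 8305 recommendation.

```python
import asyncio


async def open_dual_stack(host, port, delay=0.25):
    """Open a TCP connection, racing addresses Happy Eyeballs style.

    If an attempt has not completed within `delay` seconds, asyncio
    starts the next address in parallel rather than waiting for the
    first attempt to time out.
    open_dual_stack is a hypothetical wrapper for illustration only.
    """
    reader, writer = await asyncio.open_connection(
        host, port, happy_eyeballs_delay=delay
    )
    return reader, writer
```

With this behaviour a black-holed IPv6 address costs roughly the chosen delay instead of a full kernel connect timeout, at the price of extra connection attempts and more complex code.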

We don't think this needs fixing right now. A CA repository with bad IPv6 connectivity will increase runtime, but operators can fix that themselves by adjusting resolv.conf or by adding reject routes for the affected IPv6 prefixes. Workarounds for broken setups are out of scope for rpki-client.