shinyoshiaki / werift-webrtc

WebRTC Implementation for TypeScript (Node.js), includes ICE/DTLS/SCTP/RTP/SRTP/WEBM/MP4

no response on second peer connection stun request

koush opened this issue · comments

I've been having a hard time tracking this issue down, and wanted to see if you had any insight:

Clients:
Offer: Safari iOS (on cellular)
Answer: Werift on Mac

On the first run of connecting the peers, werift will successfully retrieve the srflx candidates from the STUN server. On the second or third run, werift will fail to receive a response from the STUN server, causing the connection to fail. After a few minutes, the STUN server seems to start responding again.

I have observed this behavior with multiple different STUN servers (with no quotas enabled). I only observe this behavior when the peer is on cellular. When the Safari iOS peer is on the local network, werift will successfully receive a STUN response every time.

I'm unsure why the cellular network peer would cause STUN requests to fail in werift. I verified that the requests are being sent, and no response is received on the datagram socket.

Do you have any ideas?

My guess is that werift is somehow sending bad STUN requests after the first one.

I think I figured this out. The iOS device is trickling candidates on cellular, some of which end with .local. Those will be unreachable/unresolvable by the Mac, since the iOS device is not on the local network.

The .local names are the privacy-obfuscated mDNS hostnames, which I think are resolved via Bonjour:

86156b49-16e3-4375-9a30-ed9e430580fe.local

These .local hostnames end up being resolved here:

.promisify(dns.lookup)(remoteCandidate.host)

The issue seems to be that these .local lookups take a long time to time out, and the operating system or the Node.js process runs them serially, so only one lookup happens at a time. When the request to look up stun.l.google.com comes in, it has to wait in line behind all the pending .local lookups. That is why:

"After a few minutes, the STUN server seems to start responding again."

I'm not entirely sure this is the case, but I verified that forcing .local addresses to fail (or fall through to the non-lookup path) allows the STUN requests to complete. Something with dns.lookup on Mac is causing a UDP send with an unresolved host string to fail.
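For illustration, forcing .local candidates to skip the lookup could look roughly like the sketch below. This is not werift's code: isMdnsHostname and resolveCandidateHost are hypothetical names, and returning undefined for a skipped candidate is an assumption about how a caller would want to handle it.

```typescript
import { lookup } from "node:dns/promises";
import { isIP } from "node:net";

// Hypothetical helper: true for mDNS-style names such as "<uuid>.local".
export function isMdnsHostname(host: string): boolean {
  // Tolerate a trailing dot ("foo.local.") and mixed case.
  return host.toLowerCase().replace(/\.$/, "").endsWith(".local");
}

// Hypothetical helper: returns an IP address for a candidate host,
// or undefined when the candidate should be skipped.
export async function resolveCandidateHost(
  host: string,
): Promise<string | undefined> {
  // IP literals need no lookup at all.
  if (isIP(host) !== 0) return host;
  // Assumption: a .local name from a remote network will never resolve,
  // so skip it instead of occupying a lookup thread until it times out.
  if (isMdnsHostname(host)) return undefined;
  const { address } = await lookup(host);
  return address;
}
```

A caller would drop any candidate for which this returns undefined. Note that this also drops .local candidates from peers that really are on the same network, so it trades away some local connectivity.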

From the Node.js documentation for dns.lookup():

dns.lookup() does not necessarily have anything to do with the DNS protocol. The implementation uses an operating system facility that can associate names with addresses, and vice versa. This implementation can have subtle but important consequences on the behavior of any Node.js program. Please take some time to consult the Implementation considerations section before using dns.lookup().

Though the call to dns.lookup() will be asynchronous from JavaScript's perspective, it is implemented as a synchronous call to getaddrinfo(3) that runs on libuv's threadpool. This can have surprising negative performance implications for some applications, see the UV_THREADPOOL_SIZE documentation for more information.

My guess is these .local candidates are consuming the thread pool.
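If that guess is right, one way to keep the STUN server lookup out of that queue would be to resolve it with the c-ares based resolver, which per the Node.js docs does not use getaddrinfo(3) or the libuv threadpool. A minimal sketch, not something werift does as far as I know: resolveStunHost and the Resolve4 interface are hypothetical names, and only IPv4 is handled.

```typescript
import { Resolver } from "node:dns/promises";
import { isIP } from "node:net";

// Minimal shape needed from a resolver, so a stub can be injected.
interface Resolve4 {
  resolve4(hostname: string): Promise<string[]>;
}

// Hypothetical helper: resolve a STUN server to an IPv4 address
// with a DNS query on the network instead of dns.lookup().
export async function resolveStunHost(
  host: string,
  resolver: Resolve4 = new Resolver(),
): Promise<string> {
  // IP literals need no resolution.
  if (isIP(host) !== 0) return host;
  const [first] = await resolver.resolve4(host);
  if (first === undefined) throw new Error(`no A record for ${host}`);
  return first;
}
```

The trade-off is that resolve4 always queries DNS directly, so it ignores /etc/hosts and will not resolve mDNS names. That seems acceptable for a public STUN server, but not for candidate hosts in general.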

Although it is not a fundamental solution, I have implemented a restriction on the execution of dns.lookup.

#218
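As a rough illustration of what restricting dns.lookup execution can mean (this is a generic sketch, not the code in #218): cap how many lookups run at once so slow ones cannot occupy every threadpool thread. ConcurrencyLimiter and limitedLookup are hypothetical names, and the limit of 2 is an assumption chosen to stay below libuv's default pool of 4 threads.

```typescript
import { lookup } from "node:dns/promises";

// Hypothetical limiter: at most `limit` tasks run at once, the rest queue.
export class ConcurrencyLimiter {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // Wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) next(); // slot is transferred, so `active` is unchanged
      else this.active--;
    }
  }
}

// Assumption: 2 concurrent lookups leaves threadpool capacity for other work.
const lookupLimiter = new ConcurrencyLimiter(2);

// Hypothetical wrapper around dns.lookup() that respects the limit.
export function limitedLookup(host: string): Promise<string> {
  return lookupLimiter.run(async () => (await lookup(host)).address);
}
```

This does not make a slow .local lookup any faster. It only bounds how many threadpool threads such lookups can hold at the same time.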

Alternate solution that works well for me. #219