saghul / aiodns

Simple DNS resolver for asyncio

Home Page: https://pypi.python.org/pypi/aiodns

Queries intermittently freezing asyncio event loop

davidmcnabnz opened this issue · comments

Most of the time, aiodns is fine. But on rare occasions, it gets stuck on a C write() call deep within pycares.

This freezes the entire event loop indefinitely, because the write() call never returns.

My original calling code is like:

dns = aiodns.DNSResolver()
reply = await dns.query(somedomain, 'MX')

For now, I'll look at workarounds like moving all my aiodns queries off to separate threads, but this seems to be inefficient.
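A minimal sketch of that thread workaround, in case it helps anyone else: a private event loop runs in a daemon thread, and the main loop awaits results from it with a timeout. `LoopThread`, `run`, and `stop` are hypothetical names of mine, not part of aiodns or pycares. I have not verified that this avoids the freeze; it can only help if the stuck C call releases the GIL.

```python
import asyncio
import threading


class LoopThread:
    """Hypothetical helper: runs a private asyncio event loop in a daemon thread."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    async def run(self, coro_factory, timeout=None):
        # Create and schedule the coroutine on the worker loop, then wait for
        # its result from the calling loop without blocking the calling loop.
        future = asyncio.run_coroutine_threadsafe(coro_factory(), self._loop)
        return await asyncio.wait_for(asyncio.wrap_future(future), timeout)

    def stop(self):
        # Ask the worker loop to stop, wait for the thread, then clean up.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


# Assumed usage with aiodns (not run here). The resolver is created inside
# the coroutine so that it binds to the worker loop, not the main loop:
#
#     async def mx_lookup(domain):
#         import aiodns
#         resolver = aiodns.DNSResolver()
#         return await resolver.query(domain, 'MX')
#
#     worker = LoopThread()
#     reply = await worker.run(lambda: mx_lookup(somedomain), timeout=10)
```

If the worker thread does get stuck, the timeout fires in the main loop and the main loop keeps running, but the stuck thread itself is not recovered.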

But I'd welcome some advice on this.

Below is the py-spy stack trace of where the aiodns call is getting stuck.

Thread 380494 (idle): "MainThread"
    write (libpthread-2.31.so)
    _Py_DECREF (object.h:422)
    _my_PyErr_WriteUnraisable (_cffi_backend.c:6113)
    general_invoke_callback (_cffi_errors.h:147)
    gil_release (misc_thread_common.h:370)
    cffi_call_python (call_python.c:278)
    _sock_state_cb (_cares.c:998)
    open_udp_socket (ares_process.c:1240)
    ares__send_query (ares_process.c:854)
    ares_send (ares_send.c:131)
    ares_query (ares_query.c:138)
    _cffi_f_ares_query (_cares.c:3287)
    _do_query (pycares/__init__.py:581)
    query (pycares/__init__.py:561)
    query (aiodns/__init__.py:90)

The nature of this issue means that using asyncio timeout wrappers cannot work, because once the thread's event loop is stuck inside a C function call, there's no way for a TimeoutError to get thrown up to the wrapper.
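To illustrate the point, here is a small self-contained sketch in which `time.sleep` stands in for the stuck C call (it is not the actual pycares code path). The `asyncio.wait_for` timeout is much shorter than the blocking call, yet control only returns once the blocking call finishes. `blocking_call` and `demo` are names made up for this example.

```python
import asyncio
import time


async def blocking_call(seconds):
    # Stand-in for a C function that blocks without yielding to the event loop.
    time.sleep(seconds)
    return "done"


async def demo(block_for=0.5, timeout=0.1):
    start = time.monotonic()
    try:
        result = await asyncio.wait_for(blocking_call(block_for), timeout)
    except asyncio.TimeoutError:
        result = "timeout"
    # Whichever branch was taken, the elapsed time is at least block_for:
    # the timeout could not interrupt the blocking call.
    return result, time.monotonic() - start


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a call that never returns, as in my case, the wrapper never gets control back at all.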

I've also filed an issue with the pycares tracker:

What a weird one!

Drilling down, what happens is pycares got some activity on a file descriptor and called the socket state callback, which aiodns uses:

def _sock_state_cb(self, fd: int, readable: bool, writable: bool) -> None:

Here is where pycares calls it: https://github.com/saghul/pycares/blob/de2ed40596f543f989bbcea30632be751133c110/src/pycares/__init__.py#L97

Something seems to happen which causes an unraisable error: _my_PyErr_WriteUnraisable (_cffi_backend.c:6113), and then it's the call to write it to standard error which seemingly gets stuck.

Very weird.
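One possible explanation, which is only an assumption and not confirmed anywhere in this thread: if the process's standard error is a pipe that nothing is draining, a blocking write() will hang once the pipe buffer is full. The sketch below shows a pipe filling up, using a non-blocking write end so the example itself cannot hang. `fill_pipe_until_full` is a made-up name.

```python
import os


def fill_pipe_until_full(chunk=b"x" * 4096):
    """Write to a pipe that nobody reads until the kernel buffer is full."""
    read_fd, write_fd = os.pipe()
    # Non-blocking mode makes a full pipe raise BlockingIOError. A blocking
    # descriptor would instead wait here until a reader drains the pipe.
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            written += os.write(write_fd, chunk)
    except BlockingIOError:
        return written
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print("bytes accepted before the pipe filled:", fill_pipe_until_full())
```

If that is what is happening here, checking where stderr of the affected process points would be a quick way to confirm or rule it out.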

On the pycares issue you seem to be using 4.2 which is an older release. Can you please test with the latest version of both packages?

Also, a repro script, even if it takes hours to trigger, would be useful.