Fail more gracefully when notifying JupyterHub of activity fails

Question


consideRatio opened this issue a month ago.

The error below is logged fairly often, for example when the hub is restarted, among other reasons. I'm considering whether we want to turn this into a handled error that simply logs "Error notifying Hub of activity" without the traceback and, as a bonus, possibly logs the next successful report at info level or similar.

[E 2024-05-12 08:18:44.175 JupyterHubSingleUser] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.11/site-packages/jupyterhub/singleuser/extension.py", line 428, in notify
        await client.fetch(req)
      File "/opt/conda/lib/python3.11/site-packages/tornado/simple_httpclient.py", line 340, in run
        stream = await self.tcp_client.connect(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/conda/lib/python3.11/site-packages/tornado/tcpclient.py", line 269, in connect
        addrinfo = await self.resolver.resolve(host, port, af)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/conda/lib/python3.11/site-packages/tornado/netutil.py", line 433, in resolve
        for fam, _, _, _, address in await asyncio.get_running_loop().getaddrinfo(
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 868, in getaddrinfo
        return await self.run_in_executor(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/conda/lib/python3.11/concurrent/futures/thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/conda/lib/python3.11/socket.py", line 962, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    socket.gaierror: [Errno -3] Temporary failure in name resolution

Action point

Decide whether a change is wanted and, if so, what change.

Min RK · Mon May 13 2024 03:44:12 GMT+0800 (China Standard Time)

I agree the noise should be reduced in common cases. I'll think about it a bit more, but I believe the traceback is unlikely to be informative without also being a duplicate, given that other API requests (e.g. auth requests) are likely to fail in the same way if/when actual activity is disrupted.

One-line error-level log with the error string (no traceback) makes sense.
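A minimal sketch of the one-line approach, using only the standard library. The names `notify` and `send` are hypothetical and chosen for illustration; this is not JupyterHub's actual code in `jupyterhub/singleuser/extension.py`. The point it shows is that logging with `exc_info` (as `log.exception` does) is what attaches a multi-line traceback like the one above, while a plain `log.error` with the exception's type and message stays on one line.

```python
import logging

# Logger name mirrors the prefix seen in the log above; any name works.
log = logging.getLogger("JupyterHubSingleUser")


async def notify(send):
    """Report activity via `send`, logging any failure as a single line.

    `send` is a hypothetical zero-argument coroutine function standing in
    for the real request to the Hub (`client.fetch(req)` in the traceback).
    Returns True on success and False on failure.
    """
    try:
        await send()
    except Exception as e:
        # No exc_info here, so no traceback: just the exception type and its
        # message, e.g. "gaierror: [Errno -3] Temporary failure in name resolution".
        log.error("Error notifying Hub of activity: %s: %s", type(e).__name__, e)
        return False
    return True
```

Running `asyncio.run(notify(send))` with a `send` that raises `socket.gaierror` would emit a single ERROR line instead of the full traceback.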

Logging the first success after an error is a little trickier, since it means maintaining a bit of state, but it's doable.
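The "bit of state" could look something like the sketch below, assuming all activity reports go through one long-lived object. `ActivityNotifier` and `send` are hypothetical names for illustration, not an existing JupyterHub API. Each failure is logged as one line without a traceback, and the first success after one or more failures is logged once at info level.

```python
import logging

log = logging.getLogger("JupyterHubSingleUser")


class ActivityNotifier:
    """Hypothetical wrapper that remembers whether recent reports failed.

    `send` is a placeholder zero-argument coroutine function for the request
    that reports activity to the Hub.
    """

    def __init__(self, send):
        self._send = send
        # The "bit of state": failures since the last successful report.
        self._failures = 0

    async def notify(self):
        try:
            await self._send()
        except Exception as e:
            self._failures += 1
            # One line per failure, without a traceback.
            log.error("Error notifying Hub of activity: %s: %s", type(e).__name__, e)
            return False
        if self._failures:
            # First success after one or more failures: report it once at info level.
            log.info("Notified Hub of activity after %i failed attempt(s)", self._failures)
            self._failures = 0
        return True
```

A counter rather than a boolean costs nothing extra and lets the recovery message say how many reports were lost while the Hub was unreachable.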