Fail more gracefully when failing to notify JupyterHub of activity
consideRatio opened this issue · comments
The error below is logged fairly often, for example when the hub is restarted. I'm considering whether we want to turn this into a handled error that simply logs "Error notifying Hub of activity" without the traceback and, as a bonus, logs the next successful report at info level or similar.
[E 2024-05-12 08:18:44.175 JupyterHubSingleUser] Error notifying Hub of activity
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/jupyterhub/singleuser/extension.py", line 428, in notify
    await client.fetch(req)
  File "/opt/conda/lib/python3.11/site-packages/tornado/simple_httpclient.py", line 340, in run
    stream = await self.tcp_client.connect(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/tornado/tcpclient.py", line 269, in connect
    addrinfo = await self.resolver.resolve(host, port, af)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/tornado/netutil.py", line 433, in resolve
    for fam, _, _, _, address in await asyncio.get_running_loop().getaddrinfo(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 868, in getaddrinfo
    return await self.run_in_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/socket.py", line 962, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -3] Temporary failure in name resolution
Action point
- Decide whether a change is wanted and, if so, which change.
I agree the noise should be reduced in common cases. I'll think about it a bit more, but I believe the traceback is unlikely to be informative without also being a duplicate: if actual activity is disrupted, other API requests (e.g. auth requests) are likely to fail in the same way.
A one-line error-level log with the error string (no traceback) makes sense.
Logging first success after error is a little trickier since it means maintaining a bit of state, but doable.
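For discussion, here is a minimal sketch of what that bit of state could look like. It is not the actual code in `jupyterhub/singleuser/extension.py`: the `ActivityReporter` class, the `send` callable, the `demo` helper, and the wording of the info-level message are all hypothetical names chosen for illustration, and the real HTTP request is replaced by a stub that fails once.

```python
import asyncio
import logging


class ActivityReporter:
    """Hypothetical wrapper around the activity request (illustration only)."""

    def __init__(self, send, log):
        self._send = send          # async callable that performs the request
        self._log = log
        self._last_failed = False  # the one bit of state to maintain

    async def notify(self):
        try:
            await self._send()
        except Exception as e:
            # One line at error level with the error string, no traceback.
            self._log.error("Error notifying Hub of activity: %s", e)
            self._last_failed = True
            return False
        if self._last_failed:
            # Only the first success after a failure is logged.
            self._log.info("Notifying Hub of activity succeeded again")
            self._last_failed = False
        return True


def demo():
    """Run three notifications where only the first one fails."""
    records = []

    class ListHandler(logging.Handler):
        def emit(self, record):
            records.append((record.levelname, record.getMessage()))

    log = logging.getLogger("activity-demo")
    log.setLevel(logging.INFO)
    log.propagate = False
    log.handlers = [ListHandler()]

    # Stub standing in for the real request: first call raises, later calls pass.
    outcomes = iter([OSError("Temporary failure in name resolution"), None, None])

    async def send():
        outcome = next(outcomes)
        if outcome is not None:
            raise outcome

    async def run():
        reporter = ActivityReporter(send, log)
        return [await reporter.notify() for _ in range(3)]

    return asyncio.run(run()), records


if __name__ == "__main__":
    results, records = demo()
    print(results)
    for level, message in records:
        print(level, message)
```

With this shape, repeated failures still log one error line each, while a run of successes stays silent apart from the first one after a failure. Whether repeated failures should also be collapsed is a separate decision.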