aio-libs / aiohttp-devtools

dev tools for aiohttp

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Cleanup context is not run on reload on Windows

haukex opened this issue · comments

  • aiohttp-devtools version: 1.0.post0
  • aiohttp version: 3.8.4
  • python version: 3.11.3
  • Platform: Windows 10

I have discovered two cases in which the second half of a Cleanup Context after the yield is not run. The first is in #547, and the second is in this issue.

The documentation says:

aiohttp guarantees that cleanup code is called if and only if startup code was successfully finished.

I have this simple code, which just has one on_startup, one on_shutdown, and one cleanup_ctx handler.

from aiohttp import web

routes = web.RouteTableDef()

@routes.get('/')
async def get_events(_request):
    return web.Response(text="Hello, world")

app = web.Application()
async def startup(_app):
    print("====> STARTUP", flush=True)
app.on_startup.append(startup)
async def context(_app):
    print("====> BEFORE", flush=True)
    yield
    print("=====> AFTER", flush=True)
    with open('test.log','a') as fh: fh.write("Hello\n")
app.cleanup_ctx.append(context)
async def shutdown(_app):
    print("=====> SHUTDOWN", flush=True)
app.on_shutdown.append(shutdown)
app.add_routes(routes)

if __name__ == '__main__':
    web.run_app(app)

On Linux, when I run the server using adev runserver --no-livereload testserv.py, and then edit some whitespace in the file and save it, the output is as follows:

[19:03:10] Starting aux server at http://localhost:8001 ◆
[19:03:10] Starting dev server at http://localhost:8000 ●
======== Running on http://0.0.0.0:8001 ========
(Press CTRL+C to quit)
====> BEFORE
====> STARTUP
=====> SHUTDOWN
=====> AFTER
[19:03:16] Restarting dev server at http://localhost:8000 ●
====> BEFORE
====> STARTUP

This is what I would expect. However, on Windows 10, the output is as follows:

[21:04:01] Starting aux server at http://localhost:8001 ◆
[21:04:01] Starting dev server at http://localhost:8000 ●
======== Running on http://0.0.0.0:8001 ========
(Press CTRL+C to quit)
====> BEFORE
====> STARTUP
[21:04:06] Restarting dev server at http://localhost:8000 ●
====> BEFORE
====> STARTUP

Like in issue #547, the second half of the cleanup context is never run. Also, the shutdown handler is not run either.

This one suggests that Windows isn't sending the interrupt signal through.
Code is: https://github.com/aio-libs/aiohttp-devtools/blob/master/aiohttp_devtools/runserver/watch.py#L122

Note the debug logs, if I run with -v, my logs look like:

====> BEFORE
====> STARTUP
[21:45:04] 3 changes, restarting server
[21:45:04] stopping server process...
=====> SHUTDOWN
=====> AFTER
[21:45:04] process stopped
[21:45:04] Restarting dev server at http://localhost:8000 ●

Which is a successful stop. I'm guessing when you run it on Windows, you'll see the other log where it sends SIGKILL. Not sure why SIGINT isn't getting through or being handled correctly on Windows...

With -v, the output on Windows is:

======== Running on http://0.0.0.0:8001 ========
(Press CTRL+C to quit)
[09:56:38] successfully loaded "testserv" from "C:\Users\daempfling\code\aiohttp-devtools"
[09:56:38] livereload enabled: ✓
====> BEFORE
====> STARTUP
[09:56:48] 1 changes, restarting server
[09:56:48] stopping server process...
[09:56:48] process stopped
[09:56:48] Restarting dev server at http://localhost:8000 ●
[09:56:49] successfully loaded "testserv" from "C:\Users\daempfling\code\aiohttp-devtools"
[09:56:49] livereload enabled: ✓
====> BEFORE
====> STARTUP

As in #547, it seems that on Windows the process is simply being terminated too early, and on Linux, the issue is fixed in the master branch.

(Unfortunately, I'm out of time to debug further right now, and Windows is not a priority for me at the moment, so I can't say when I might get back to debugging on Windows.)

OK, I found it:
https://github.com/aio-libs/aiohttp/blob/master/aiohttp/web_runner.py#L302

Doesn't work on Windows. Not sure if there's anything we can do or not...

Thanks, I played around a bit with the following code, and the only way I got Windows to catch anything so far was to catch KeyboardInterrupt thrown from loop.run_until_complete. I also found this article, which might be useful: https://github.com/wbenny/python-graceful-shutdown

import asyncio
import signal
import sys
from datetime import datetime, timezone

class GracefulExit(SystemExit):
    code = 1

loop = asyncio.get_event_loop()

if sys.platform.startswith("win32"):
    # this unfortunately don't seem to catch the KeyboardInterrupt
    def _excepthandler(_loop, context):
        print(repr(context))
        loop.default_exception_handler(context)
    loop.set_exception_handler(_excepthandler)
    # these two don't catch the KeyboardInterrupt either
    def _excepthook(typ, value, traceback):
        print(f"sys.excepthook({typ=}, {value=}, {traceback=})")
    def _unraisablehook(unraisable):
        print(f"sys.unraisablehook({unraisable=})")
    sys.excepthook = _excepthook
    sys.unraisablehook = _unraisablehook
else:  # Linux
    def _raise_graceful_exit():
        raise GracefulExit()
    loop.add_signal_handler(signal.SIGINT, _raise_graceful_exit)
    loop.add_signal_handler(signal.SIGTERM, _raise_graceful_exit)

async def periodic():
    while True:
        print(datetime.now(timezone.utc).isoformat())
        await asyncio.sleep(1)
task = loop.create_task(periodic())

try:
    loop.run_until_complete(task)
except GracefulExit:  # works on Linux
    print("Yay, GracefulExit")
except asyncio.CancelledError as ex:
    print("Cancelled")
except KeyboardInterrupt as ex:  # this works on Windows
    print("KeyboardInterrupt")

Yep, those won't work in asyncio, as it needs to handle them differently, so you won't get them there (i.e. It needs to capture the exception without inserting it into whatever code happens to be running, and then use that signal to cancel the running tasks).
That page says that it won't be possible using asyncio.run(), though I'm not sure it really explains the alternative...
https://github.com/wbenny/python-graceful-shutdown#asynchronous-code

That means we'd need to replace all the asyncio.Runner() calls with something more low level to make it work correctly. Then we'd want to make the same changes to aiohttp.web.run_app(), which is our version of asyncio.run() specific to aiohttp apps.
But, all of this is getting into very low-level asyncio details that most developers should never even need to look at. So, I'm not sure it's worth looking into further.

I just checked, and when running without adev, i.e. just python3 testserv.py, the graceful shutdown on Ctrl-C works on Windows, so perhaps it's worth looking at how it was solved there.

But as I mentioned, getting this working on Windows is low-priority for me. I could offer to provide a pull request for adding a note to the documentation (here and perhaps even here) that graceful shutdown doesn't work in the devtools in Windows?

That's interesting. Well, if you can add some prints at:
https://github.com/aio-libs/aiohttp/blob/master/aiohttp/web_runner.py#L299

Then confirm if the signal handlers are getting set. I don't think there's any other method for graceful shutdown, so I'd assume it's actually managing to set the signal handlers in that case...

So I played around a bit and on Windows, the signal handlers are not installed, i.e. the NotImplementedError is raised at the code you indicated. I had a suspicion that there's a difference between typing Ctrl-C on the console (which results in a clean shutdown when not using the devtools) and what the devtools do here, os.kill(self._process.pid, signal.SIGINT). I tried using signal.CTRL_C_EVENT instead, which did result in a clean shutdown of the process, but it also blew up the entire Python process, i.e. halted it with an "Aborted!".

To confirm my suspicion, here is signaltest.py:

import asyncio
from datetime import datetime, timezone

async def periodic():
    while True:
        print(datetime.now(timezone.utc).isoformat())
        await asyncio.sleep(1)

def runit():
    with asyncio.Runner() as runner:
        try:
            runner.run(periodic())
            runner.get_loop().run_forever()
        except KeyboardInterrupt:
            print("KeyboardInterrupt")
        finally:
            print("CLEANUP")

if __name__ == '__main__':
    runit()

Which produces the following output, where after a few seconds I hit Ctrl-C:

$ python signaltest.py
2023-04-19T18:52:19.757138+00:00
2023-04-19T18:52:20.758393+00:00
2023-04-19T18:52:21.761270+00:00
2023-04-19T18:52:22.777922+00:00
2023-04-19T18:52:23.793960+00:00
KeyboardInterrupt
CLEANUP

And here is subproc.py:

from multiprocessing import Process
from signaltest import runit
import os
import signal
from time import sleep

def runsubproc():
    process = Process(target=runit)
    print("Starting...")
    process.start()
    sleep(5)
    print("Killing...")
    os.kill(process.pid, signal.SIGINT)
    process.join(5)
    print("Joined.")

if __name__ == '__main__':
    runsubproc()

Which outputs the following:

$ python subproc.py
Starting...
2023-04-19T18:53:07.746260+00:00
2023-04-19T18:53:08.755552+00:00
2023-04-19T18:53:09.767671+00:00
2023-04-19T18:53:10.787967+00:00
2023-04-19T18:53:11.809301+00:00
Killing...
Joined.

So I think the question is how to raise a KeyboardInterrupt (or some other exception to cleanly unwind the stack and cause cleanup) in the server process instead of simply shooting it out of the sky with whatever emulation of SIGINT exists on Windows. That's as far as I've gotten for now...

Ah, so the cleanup is done with both the graceful exit, but also KeyboardInterrupt (although I can't get that code to run if I remove the signal handlers):
https://github.com/aio-libs/aiohttp/blob/master/aiohttp/web.py#L506

Documentation says it should produce a KeyboardInterrupt, it doesn't suggest anything else about Windows...:
https://docs.python.org/3/library/signal.html#signal.SIGINT

So I think the question is how to raise a KeyboardInterrupt (or some other exception to cleanly unwind the stack and cause cleanup) in the server process

If it's actually inserted directly into the running code, it's going to mess everything up (see https://docs.python.org/3/library/signal.html#note-on-signal-handlers-and-exceptions).
That's why we use a signal handler, and why the default behaviour of run() also does something similar, to stop the exception being thrown into the running code.

You can probably test it with something like:

async def periodic():
    try:
        while True:
            print(datetime.now(timezone.utc).isoformat())
            await asyncio.sleep(1)
    except KeyboardInterrupt:
        print("Inner keyboard interrupt")

If runner.run() is still handling it correctly though, then there must be a way we can handle it correctly.

Also, minor note, but the run_forever() call in your example will never happen.

Also, minor note, but the run_forever() call in your example will never happen.

Yes, I cargo-culted that from here.

I found this excellent StackOverflow answer on the the apparently very complicated story of task termination via signals on Windows: https://stackoverflow.com/a/35792192

What I am getting from all this is that on Windows it'd be best to find a different way to signal the server that it should gracefully shut down.

Here's the relatively "best" cross-platform idea I've come up with so far: If the AppTask were to start a TCP server (it could have the OS choose a free TCP port number and pass it to the Process; Windows AF_UNIX support isn't in Python yet), the server could connect to that server, and when the AppTask wants to signal the server to restart, it simply shuts down its own TCP server, closing the connection and thereby telling the server to shut down.

As we already modify the Application, we could probably just add an endpoint or something to the app to do that.
https://github.com/aio-libs/aiohttp-devtools/blob/master/aiohttp_devtools/runserver/serve.py#L42

I'd probably still only do it if the signal setup fails though.

Sure, that could also be a possibility. By endpoint, do you mean a specific HTTP request that would cause the server to shut down? In that case I wonder how to avoid collisions with catch-all routes that the user may have defined.

Probably not going to be an issue. We do the same thing in aiohttp-debugtoolbar.

I'd probably still only do it if the signal setup fails though.

The minor issue I see with that is that the sequence of calls is that serve_main_app first calls create_main_app, which calls modify_main_app, which is where I guess the new endpoint could be added. Only afterwards does serve_main_app call start_main_app, which is what calls aiohttp.web_runner.BaseRunner.setup, which is what tries to set up the signals, and currently there is no information stored as to whether the signals were set up successfully. So that would require a modification to aiohttp to return that information from setup.

How about a Config option like e.g. shutdown_endpoint that defaults to True on Windows and False otherwise?

OK, turn it into a PR and we'll have a look at it.

If you think there's any possibility of making a test for it, that'd be great (can be Windows only using @pytest.skipif).

Ok, I opened PR #549. I looked at the coverage report, and it turns out running the tests on Windows already exercises the important bits of my code (I was also able to remove the @non_windows_test from test_stop_process_dirty).