sanic-org / sanic

Accelerate your web app development | Build fast. Run fast.

Home Page:https://sanic.dev

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

sanic.exceptions.BadURL: Bad URL: sip:nm

vgoklani opened this issue · comments

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

For some reason I occasionally see these types of messages in my logs:

[2023-03-29 00:42:04 -0400] - (sanic.access)[INFO][UNKNOWN]: NONE http:///*  500 139
[2023-03-29 00:42:09 -0400] - (sanic.access)[INFO][UNKNOWN]: NONE http:///*  500 139
[2023-03-29 00:42:14 -0400] - (sanic.access)[INFO][UNKNOWN]: NONE http:///*  500 139
[2023-03-29 00:42:19 -0400] - (sanic.access)[INFO][UNKNOWN]: NONE http:///*  500 139
[2023-03-29 00:42:24 -0400] - (sanic.access)[INFO][UNKNOWN]: NONE http:///*  500 139
[2023-03-29 00:42:29 -0400] - (sanic.access)[INFO][UNKNOWN]: NONE http:///*  500 139
[2023-03-29 00:42:35 -0400] - (sanic.access)[INFO][UNKNOWN]: NONE http:///*  500 139
[2023-03-29 00:42:40 -0400] - (sanic.access)[INFO][UNKNOWN]: NONE http:///*  500 139
[2023-03-29 00:42:45 -0400] - (sanic.access)[INFO][UNKNOWN]: NONE http:///*  500 139
[2023-03-29 00:42:50 -0400] - (sanic.access)[INFO][UNKNOWN]: NONE http:///*  500 139
[2023-03-29 00:42:55 -0400] - (sanic.access)[INFO][UNKNOWN]: NONE http:///*  500 139

which lead to subsequent crashes:

[2023-03-29 00:42:09 -0400] [2306] [ERROR] protocol.connection_task uncaught
Traceback (most recent call last):
  File "http1_request_header", line 41, in http1_request_header
    :raises RuntimeError:
Exception

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/sanic/http/http1.py", line 123, in http1
    await self.http1_request_header()
  File "http1_request_header", line 57, in http1_request_header
    )
sanic.exceptions.BadRequest: Bad Request

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/sanic/request.py", line 152, in __init__
    self._parsed_url = parse_url(url_bytes)
  File "httptools/parser/url_parser.pyx", line 105, in httptools.parser.url_parser.parse_url
httptools.parser.errors.HttpParserInvalidURLError: invalid url b'sip:nm'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "connection_task", line 15, in connection_task
    from asyncio import CancelledError
  File "/opt/conda/lib/python3.10/site-packages/sanic/http/http1.py", line 153, in http1
    await self.error_response(e)
  File "/opt/conda/lib/python3.10/site-packages/sanic/http/http1.py", line 429, in error_response
    self.create_empty_request()
  File "/opt/conda/lib/python3.10/site-packages/sanic/http/http1.py", line 447, in create_empty_request
    self.request = self.protocol.request_class(
  File "/opt/conda/lib/python3.10/site-packages/sanic/request.py", line 154, in __init__
    raise BadURL(f"Bad URL: {url_bytes.decode()}")
sanic.exceptions.BadURL: Bad URL: sip:nm

What's the best way of handling this; sanic shouldn't crash when it sees a URL that's presumably masked?

This is in fact the same issue as this, which I never resolved before:

#2528

Code snippet

No response

Expected Behavior

No response

How do you run Sanic?

As a script (app.run or Sanic.serve)

Operating System

LINUX

Sanic Version

latest

Additional context

No response

Hello! I had similar issue. It occurs when someone trying to do invalid requests to backend server, from requests library or from npm frontend server. I don't know it is your case or not, but may be it helps you to resolve this.

Is the server actually crashing or is it just leaving an ugly-looking traceback?

This is a something on the internet searching for a SIP endpoint to do a registration.

It should be getting handled with a 404 because the route doesn't exist. Not sure what the right path forward for this is, but that's the cause.

Which client sends such requests? This is not normal by any means but the HTTP spec allows including full URL rather than path in request, and for http Sanic can parse that (but I've never seen a client to it). But Sanic is a http server, not a sip server, and thus cannot parse sip URLs, and technically cannot even respond to other protocol URLs (but I presume it could do 400 Bad Request HTTP response and then disconnect the client, as it does for other invalid requests).

Or rather than bailout with 400 as usual, with a little more error handling logic it could give 421 Misdirected Request:

The request was directed at a server that is not able to produce a response. This can be sent by a server that is not configured to produce responses for the combination of scheme and authority that are included in the request URI.

Potentially you could use various hooks within Sanic to implement a custom protocol for that, if you really want them to be routed or otherwise handle them in part or full as if they were HTTP requests. I did not check the sip specs to see how such requests should be responded to by SIP servers.

Two points of action for Sanic. It currently hits this BadURL error in Request constructor, when httptools.parse_url fails. The first point, this should be silent and instead actually delived that response to client and log it with 4xx code (need to check why this is not working).

Secondly, make sure that 421 error is given if appropriate, even if the parsing is successful i.e. a client send http URL with unknown authority (host not known to server), or use the host from URL instead of host header etc. But I would really like to see a real client and use case where this is an issue, to better think how it ought to be handled (e.g. Sanic accepting any authority, but should it give 400 error if there is a mismatch between that and the host header, really hairy territory on how to handle such cases correctly). Currently Sanic only reads path and query, ignoring host and other parts of any full URL passed and successfully parsed.

Pretty sure it's some tools like nmap.

On point 1 above, I see this is properly handled in http1 main protocol, but then it crashes in making the error response in create_empty_request, that for better logs/messages still attempts to use that URL. Thus for error handling we need to simply clear the URL at some point for various errors to pass. And ideally so that we don't need much code repetition between http1, http3 and asgi, which already duplicate plenty of the error handling code (i.e. might need a bit of refactoring for the whole process). Not gonna be working on a fix myself, but that should give a good starting point if anyone wants to have a go at it.

The related errors that probably also fail in a similar fashion are various BadRequest errors risen e.g. for non-ASCII characters in URLs, so it isn't enough to simply clear the URL in request constructor when parse_url fails. On the contrary, the error response probably should only even get any URL once Request constructor has passed on the main request. I.e. probably has a real request object to use for the error response by then and won't call create_empty_request at all, and consequently we don't even need to try to pass any URL to that function (or its http3/asgi equivalents if there are any).

A quick patch fixes the handling, giving clean 400 error for sip:nm

diff --git a/sanic/http/http1.py b/sanic/http/http1.py
index bb317a9..42e08fe 100644
--- a/sanic/http/http1.py
+++ b/sanic/http/http1.py
@@ -450,18 +450,9 @@ class Http(Stream, metaclass=TouchUpMeta):
         bogus response for error handling use.
         """

-        # Reformat any URL already received with \xHH escapes for better logs
-        url_bytes = (
-            self.url.encode(errors="surrogateescape")
-            .decode("ASCII", errors="backslashreplace")
-            .encode("ASCII")
-            if self.url
-            else b"*"
-        )
-
         # FIXME: Avoid this by refactoring error handling and response code
         self.request = self.protocol.request_class(
-            url_bytes=url_bytes,
+            url_bytes=b"*",
             headers=Header({}),
             version="1.1",
             method="NONE",

The invalid URL is included in the exception message itself, so the client will see it, although in server log http:///* appears instead of what was actually attempted. If Request constructor passes, this patched function won't be called anyway and you get your normal error log and error pages incl. request url and headers.