miguelgrinberg / microdot

The impossibly small web framework for Python and MicroPython.

Crashing when using öäü special chars in html input as form

maxi07 opened this issue

Describe the bug
I am running a sample app that captures user input from a textbox. The HTML page displays a textbox and a submit button. When the button is pressed, the textbox content is printed to the console just fine. If the textbox contains special characters such as äöü, however, the urldecode_bytes function crashes with a UnicodeDecodeError.

To Reproduce

<!DOCTYPE html>
<html>
    <head>
        <title>Microdot Example Page</title>
    </head>
    <body>
        <div>
            <h1>Microdot Example Page</h1>
            <p>Hello from Microdot!</p>
            <p><a href="/shutdown">Click to shutdown the server</a></p>
            <form action="/" method="post">
                <input name="test-input" id="test-input"/>
                <button name="test-button" id="test-button" type="submit">Submit</button>
            </form>
        </div>
    </body>
</html>

# htmldoc is assumed to be a string holding the HTML page shown above
@app.post('/')
async def hello2(request):
    print(request.form['test-input'])
    return htmldoc, 200, {'Content-Type': 'text/html'}

Expected behavior
I expect microdot to also decode special characters.

Additional context
You are doing amazing work!

StackTrace

  File "C:\Users\max\source\repos\microdot-test\Microdot\microdot.py", line 466, in form
    self._form = self._parse_urlencoded(self.body)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\max\source\repos\microdot-test\Microdot\microdot.py", line 413, in _parse_urlencoded
    data[urldecode_bytes(k)] = urldecode_bytes(v)
                               ^^^^^^^^^^^^^^^^^^
  File "C:\Users\max\source\repos\microdot-test\Microdot\microdot.py", line 93, in urldecode_bytes
    return b''.join(result).decode('utf-8')
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 4: invalid start byte

What browser is this? Internet Explorer by any chance?

commented

Haha, no. I am using Microsoft Edge (Windows 11, Version 113.0.1774.35) and Safari for iOS 16.5.
If the textbox contains "test", it works fine, as the urlencoded body is b'test-input=test&test-button='.
If the textbox input is "test ä", the program crashes with the given error. The urlencoded body is then b'test-input=test+%E4&test-button='
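
The failure can be reproduced without a browser. The following is a minimal sketch using only the standard library; `urllib.parse.unquote_to_bytes` stands in for microdot's `urldecode_bytes` on the assumption that both percent-decode to raw bytes in the same way, so this is not microdot's actual code:

```python
from urllib.parse import unquote_to_bytes

# Form body as reported above: '+' encodes a space, %E4 is the single byte 0xE4.
body = b'test-input=test+%E4&test-button='

# Take the value of the first field and percent-decode it to raw bytes.
value = body.split(b'&')[0].split(b'=', 1)[1]
raw = unquote_to_bytes(value.replace(b'+', b' '))

# Decoding those bytes as UTF-8 fails, as in the stack trace.
try:
    raw.decode('utf-8')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The same byte is a valid "ä" in Latin-1 / Windows-1252.
latin1_text = raw.decode('latin-1')
print(repr(error), latin1_text)
```

This suggests the browser encoded the form in a legacy single-byte encoding rather than UTF-8.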

@maxi07 Okay, here is the solution:

<!DOCTYPE html>
<html>
    <head>
        <title>Microdot Example Page</title>
        <meta charset="UTF-8">          <!-- add this line -->
    </head>
    <body>
        <div>
            <h1>Microdot Example Page</h1>
            <p>Hello from Microdot!</p>
            <p><a href="/shutdown">Click to shutdown the server</a></p>
            <form action="/" method="post">
                <input name="test-input" id="test-input"/>
                <button name="test-button" id="test-button" type="submit">Submit</button>
            </form>
        </div>
    </body>
</html>

This tells the browser that the page and any form submissions originating from it should be encoded in UTF-8.
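
To illustrate the difference, here is a rough sketch that uses `urllib.parse.urlencode` to approximate what a browser submits under each encoding. The choice of `windows-1252` as the legacy fallback is an assumption based on the `%E4` byte reported above:

```python
from urllib.parse import urlencode, parse_qs

fields = {'test-input': 'test ä', 'test-button': ''}

# Approximate form body when the page is declared as UTF-8:
# "ä" becomes the two-byte sequence %C3%A4.
utf8_body = urlencode(fields, encoding='utf-8')

# Approximate form body under a legacy single-byte encoding (assumed here):
# "ä" becomes the single byte %E4, which is not valid UTF-8 on its own.
legacy_body = urlencode(fields, encoding='windows-1252')

# A server that decodes as UTF-8 can only recover the text from the first body.
decoded = parse_qs(utf8_body, keep_blank_values=True, encoding='utf-8')

print(utf8_body)
print(legacy_body)
print(decoded['test-input'][0])
```

With the `<meta charset="UTF-8">` line in place, the submitted body should look like the first form, which decodes cleanly as UTF-8.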