Kludex / python-multipart

A streaming multipart parser for Python.

Home Page: https://kludex.github.io/python-multipart/

Parser fails on data with mixed encodings

enigmathix opened this issue

When a multipart body contains a base64-encoded part followed by a plain-text part, the parser tries to decode the plain-text part as if it were base64 too, which raises an error. For example:

from io import BytesIO
import multipart

def on_field(field):
    print('field', field)

def on_file(file):
    print('file', file)

data = b'--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=field1\r\nContent-Transfer-Encoding: base64\r\n\r\nw6k=\r\n--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=field2\r\n\r\nsome text\r\n\r\n--foo--'

headers = {'Content-Type': 'multipart/form-data; boundary="foo"', 'Content-Length': str(len(data))}
multipart.parse_form(headers, BytesIO(data), on_field, on_file)

Output:

field Field(field_name=b'field1', value=b'\xc3\xa9')
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/decoders.py", line 60, in write
    decoded = base64.b64decode(val)
              ^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/base64.py", line 88, in b64decode
    return binascii.a2b_base64(s, strict_mode=validate)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
binascii.Error: Incorrect padding

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/christophe/xxxx.py", line 13, in <module>
    multipart.parse_form(headers, BytesIO(data), on_field, on_file)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/multipart.py", line 1884, in parse_form
    parser.write(buff)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/multipart.py", line 1776, in write
    return self.parser.write(data)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/multipart.py", line 1058, in write
    l = self._internal_write(data, data_len)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/multipart.py", line 1327, in _internal_write
    data_callback('part_data')
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/multipart.py", line 1104, in data_callback
    self.callback(name, data, marked_index, i)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/multipart.py", line 584, in callback
    func(data, start, end)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/multipart.py", line 1665, in on_part_data
    bytes_processed = vars.writer.write(data[start:end])
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/decoders.py", line 62, in write
    raise DecodeError('There was an error raised while decoding '
multipart.exceptions.DecodeError: There was an error raised while decoding base64-encoded data.

The problem is that the parser tries to decode the text "some text" as base64, even though field2 declares no Content-Transfer-Encoding and is plain text.
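
For comparison, here is a minimal sketch of the behavior I would expect, where each part is decoded according to its own Content-Transfer-Encoding header. It uses the standard library's email parser rather than python-multipart, so it only illustrates the expected result and is not a proposed fix. The helper name decode_parts is made up for this example, and wrapping the body in a synthetic Content-Type header is an assumption made so the email parser can find the boundary.

```python
from email import message_from_bytes

# Same body as in the report above, split across lines for readability.
data = (
    b'--foo\r\n'
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b'Content-Disposition: form-data; name=field1\r\n'
    b'Content-Transfer-Encoding: base64\r\n'
    b'\r\n'
    b'w6k=\r\n'
    b'--foo\r\n'
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b'Content-Disposition: form-data; name=field2\r\n'
    b'\r\n'
    b'some text\r\n'
    b'\r\n'
    b'--foo--'
)


def decode_parts(body, boundary):
    """Hypothetical helper: map each field name to its decoded bytes."""
    # Prepend a header so the stdlib parser treats the body as multipart.
    header = (
        b'Content-Type: multipart/form-data; boundary="'
        + boundary
        + b'"\r\n\r\n'
    )
    message = message_from_bytes(header + body)
    result = {}
    for part in message.get_payload():
        name = part.get_param('name', header='content-disposition')
        # decode=True applies this part's own Content-Transfer-Encoding,
        # and leaves the bytes untouched when the part declares none.
        result[name] = part.get_payload(decode=True)
    return result


fields = decode_parts(data, b'foo')
for name, value in fields.items():
    print(name, value)
```

With this approach field1 is base64-decoded and field2 comes back as plain bytes, because the decision is made per part instead of being carried over from the previous part.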