Parser fails on data with mixed encodings
enigmathix opened this issue
enigmatix commented
When a multipart body contains a base64-encoded part followed by a plain-text part, the parser tries to decode the plain-text part as if it were base64, which raises an error. For example:
from io import BytesIO
import multipart
def on_field(field):
    print('field', field)

def on_file(file):
    print('file', file)

data = b'--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=field1\r\nContent-Transfer-Encoding: base64\r\n\r\nw6k=\r\n--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=field2\r\n\r\nsome text\r\n\r\n--foo--'
headers = {'Content-Type': 'multipart/form-data; boundary="foo"', 'Content-Length': str(len(data))}
multipart.parse_form(headers, BytesIO(data), on_field, on_file)
Output:
field Field(field_name=b'field1', value=b'\xc3\xa9')
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/decoders.py", line 60, in write
    decoded = base64.b64decode(val)
              ^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/base64.py", line 88, in b64decode
    return binascii.a2b_base64(s, strict_mode=validate)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
binascii.Error: Incorrect padding

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/christophe/xxxx.py", line 13, in <module>
    multipart.parse_form(headers, BytesIO(data), on_field, on_file)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/multipart.py", line 1884, in parse_form
    parser.write(buff)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/multipart.py", line 1776, in write
    return self.parser.write(data)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/multipart.py", line 1058, in write
    l = self._internal_write(data, data_len)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/multipart.py", line 1327, in _internal_write
    data_callback('part_data')
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/multipart.py", line 1104, in data_callback
    self.callback(name, data, marked_index, i)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/multipart.py", line 584, in callback
    func(data, start, end)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/multipart.py", line 1665, in on_part_data
    bytes_processed = vars.writer.write(data[start:end])
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/multipart/decoders.py", line 62, in write
    raise DecodeError('There was an error raised while decoding '
multipart.exceptions.DecodeError: There was an error raised while decoding base64-encoded data.
The problem is that the parser tries to decode the body of field2, "some text", as base64, even though that part has no Content-Transfer-Encoding header and is plain text.
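
For reference, here is a minimal sketch of the behavior I would expect: the decoder is chosen from each part's own headers, and a part without a Content-Transfer-Encoding header is passed through unchanged. It uses only the standard library. `decode_part` and the plain-dict header lookup are hypothetical names for illustration, not the library's API, and I have not checked how the library selects its decoder internally.

```python
import base64
import binascii


def decode_part(headers, body):
    """Decode one part's body using only that part's own headers.

    Hypothetical helper for illustration; not part of the multipart package.
    Assumes `headers` is a plain dict with exactly-cased header names.
    """
    encoding = headers.get("Content-Transfer-Encoding", "7bit").strip().lower()
    if encoding == "base64":
        return base64.b64decode(body)
    if encoding == "quoted-printable":
        return binascii.a2b_qp(body)
    # 7bit, 8bit, binary, or no header at all: leave the bytes untouched.
    return body


# The two parts from the example above, already split at the boundary.
parts = [
    ({"Content-Transfer-Encoding": "base64"}, b"w6k="),
    ({}, b"some text\r\n"),
]

decoded = [decode_part(part_headers, body) for part_headers, body in parts]
print(decoded)  # [b'\xc3\xa9', b'some text\r\n']
```

The important detail is that the encoding is looked up again for every part, so nothing set while handling field1 can affect field2.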