Kludex / python-multipart

A streaming multipart parser for Python.

Home Page: https://kludex.github.io/python-multipart/

Skip header area?

msdemlei opened this issue · comments

RFC 1341 says about multipart payloads:

This [the multipart content-type] indicates that the entity consists of several parts, each itself with a structure that is syntactically identical to an RFC 822 message, except that the header area might be completely empty...

I read that as: "a multipart payload may have a header area". As a matter of fact, when you construct a multipart payload using email.mime.multipart.MIMEMultipart and then call as_string() on it, you get:

MIME-Version: 1.0
Content-Type: multipart/form-data; boundary="========== bounda r y 930"

--========== bounda r y 930
MIME-Version: 1.0
Content-Type: application/octet-stream
...
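For reference, here is a minimal sketch of how such a payload can be produced with the standard library. The boundary string, the field name, and the file name are made up for illustration, and the exact header order may differ between Python versions:

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def build_payload() -> str:
    # Outer container; "example-boundary" is an arbitrary illustrative boundary.
    outer = MIMEMultipart("form-data", boundary="example-boundary")

    # One binary part; the field name and file name are hypothetical.
    part = MIMEApplication(b"hello")
    part.add_header(
        "Content-Disposition", "form-data", name="file", filename="hello.bin"
    )
    outer.attach(part)

    # as_string() serializes the outer headers first, then a blank line,
    # then the boundary-delimited parts.
    return outer.as_string()


payload = build_payload()

# Everything before the first blank line is the header area discussed here.
header_area, _, body = payload.partition("\n\n")
print(header_area)
```

Note that as_string() separates lines with a bare "\n" by default, so the payload would need CRLF line endings before being sent over HTTP.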

The multipart parser from the old cgi module would correctly grok that when I uploaded it (provided I arranged for the content-type to be in the HTTP header). python-multipart (tried 0.0.5) does not and fails when it sees the M of the MIME-Version.

I appreciate the "as browsers send it" part of multipart's rationale; however, being somewhat tolerant of machine-generated multipart messages would, I think, be a friendly gesture, and it might prevent breakage as people move from cgi's FieldStorage (used, e.g., in twisted) to python-multipart.

What I think should be done: in MultipartParser's _internal_write, we should blindly skip everything until the MIME header area is consumed, i.e., until we have found a CRLFCRLF sequence. Would you consider such a PR?
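To illustrate the idea, here is a rough, non-streaming sketch of the skipping step. It is not python-multipart's code: strip_header_area is a hypothetical helper that works on a complete bytes payload, whereas the real change would have to live in the parser's incremental state machine. It also deviates from "blindly" skipping in one respect, which is my own assumption: if the payload already starts with the boundary delimiter, nothing is skipped, since otherwise the first part's headers of an ordinary browser upload would be thrown away.

```python
def strip_header_area(data: bytes, boundary: bytes) -> bytes:
    """Drop a leading MIME header area, if any, from a multipart payload.

    Hypothetical helper for illustration only; operates on a complete
    payload rather than on streamed chunks.
    """
    delimiter = b"--" + boundary

    # Browser-style payloads start directly with the delimiter: keep as-is.
    if data.startswith(delimiter):
        return data

    # Otherwise treat everything up to the first CRLFCRLF as the header area.
    end = data.find(b"\r\n\r\n")
    if end == -1:
        # No complete header area found; leave the data untouched.
        return data
    return data[end + 4:]


machine_generated = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/form-data; boundary="b"\r\n'
    b"\r\n"
    b"--b\r\n"
    b'Content-Disposition: form-data; name="f"\r\n'
    b"\r\n"
    b"value\r\n"
    b"--b--\r\n"
)
browser_style = machine_generated.split(b"\r\n\r\n", 1)[1]

stripped = strip_header_area(machine_generated, b"b")
untouched = strip_header_area(browser_style, b"b")
```

A streaming version would additionally need to remember partial CRLFCRLF matches across chunk boundaries.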