Skip header area?
msdemlei opened this issue · comments
RFC 1341 says about multipart payloads:
This [the multipart content-type] indicates that the entity consists of several parts, each itself with a structure that is syntactically identical to an RFC 822 message, except that the header area might be completely empty...
I read that as: "a multipart payload may have a header area". As a matter of fact, when you construct a multipart payload using email.mime.multipart.MIMEMultipart and then calling as_string(), you get:
MIME-Version: 1.0
Content-Type: multipart/form-data; boundary="========== bounda r y 930"
--========== bounda r y 930\nMIME-Version: 1.0
Content-Type: application/octet-stream
...
The multipart parser from the old cgi module would correctly grok that when I uploaded it (provided I arranged for the content-type to be in the HTTP header). python-multipart (tried 0.0.5) does not and fails when it sees the M of the MIME-Version.
I appreciate the "as browsers send it" part of multipart's rationale; however, being somewhat friendly to machine-generated multipart messages would, I think, be a friendly gesture, and it might prevent breakage as people move from cgi's FieldStorage (used, e.g., in twisted) to python-multipart.
What I think should be done: in MultipartParser's _internal_write, we should blindly skip everything until the MIME header area is consumed, i.e., until we have found a CRLFCRLF sequence. Would you consider such a PR?