Content-Type: message/* not parsed
daogan opened this issue · comments
What I did:
Tested a sample email taken from RFC 1521 Appendix C with enmime
and compared the results with the Python email parser and the results from the Gmail APIs.
What I expected:
Results should not differ much, though they need not be exactly the same.
What I got:
Nested part (Content-Type: message/*) not parsed but treated as single part.
Release or branch I am using:
Master
Sample MIME I'm using:
MIME-Version: 1.0
From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Subject: A multipart example
Content-Type: multipart/mixed;
    boundary=unique-boundary-1

This is the preamble area of a multipart message.
--unique-boundary-1

...Some text appears here...
--unique-boundary-1
Content-Type: message/rfc822

From: (mailbox in US-ASCII)
To: (address in US-ASCII)
Subject: (subject in US-ASCII)
Content-Type: Text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: Quoted-printable

... Additional text in ISO-8859-1 goes here ...
--unique-boundary-1--
Results (the message/rfc822 part) parsed by Gmail APIs:
{
  "partId": "4",
  "mimeType": "message/rfc822",
  "filename": "",
  "headers": [
    {
      "name": "Content-Type",
      "value": "message/rfc822"
    }
  ],
  "body": {
    "size": 0
  },
  "parts": [
    {
      "partId": "4.0",
      "mimeType": "text/plain",
      "filename": "",
      "headers": [
        {
          "name": "From",
          "value": "(mailbox in US-ASCII)"
        },
        {
          "name": "To",
          "value": "(address in US-ASCII)"
        },
        {
          "name": "Subject",
          "value": "(subject in US-ASCII)"
        },
        {
          "name": "Content-Type",
          "value": "Text/plain; charset=ISO-8859-1"
        },
        {
          "name": "Content-Transfer-Encoding",
          "value": "Quoted-printable"
        }
      ],
      "body": {
        "size": 52,
        "data": "ICAgLi4uIEFkZGl0aW9uYWwgdGV4dCBpbiBJU08tODg1OS0xIGdvZXMgaGVyZSAuLi4NCg=="
      }
    }
  ]
}
Results parsed by enmime:
{
  "part_id": "2",
  "mime_type": "message/rfc822",
  "body": {
    "data": "RnJvbTogKG1haWxib3ggaW4gVVMtQVNDSUkpClRvOiAoYWRkcmVzcyBpbiBVUy1BU0NJSSkKU3ViamVjdDogKHN1YmplY3QgaW4gVVMtQVNDSUkpCkNvbnRlbnQtVHlwZTogVGV4dC9wbGFpbjsgY2hhcnNldD1JU08tODg1OS0xCkNvbnRlbnQtVHJhbnNmZXItRW5jb2Rpbmc6IFF1b3RlZC1wcmludGFibGUKCiAgIC4uLiBBZGRpdGlvbmFsIHRleHQgaW4gSVNPLTg4NTktMSBnb2VzIGhlcmUgLi4uCg==",
    "size": 226
  },
  "headers": [
    {
      "name": "Content-Type",
      "value": "message/rfc822"
    }
  ]
}
Nested parts of Content-Type: message/* do not seem to be parsed. I found that the Python email parser handles this type separately, before handling the multipart type: link, link.
This is expected, as enmime does not recursively parse nested message/rfc822 parts. To parse this nested part, use the part content to generate a new envelope. As a side note, the textproto library renders headers as a map[string][]string and not as []struct{name string, value string}.
I agree with Neil: this is working as intended right now. I'm going to treat this as a feature request; we could add an option to enable it after #90 is implemented.
I currently work around this by generating a new envelope from the part content and changing the enclosing PartIDs. It would be useful to have this enhancement included in a future version. Thanks for the help.
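For anyone landing here with the same question, the workaround can be sketched roughly as follows. This is only an illustration of the idea (treat the content of a message/rfc822 part as a complete message and parse it again), written against the Go standard library rather than the enmime API so that it stays self-contained. The `Node` type, the function names `ParseMessageString`, `parseEntity`, and `childID`, and the dotted zero-based PartID scheme are all assumptions made for this sketch; none of them come from enmime or from the Gmail API.

```go
package main

import (
	"io"
	"mime"
	"mime/multipart"
	"net/mail"
	"net/textproto"
	"strconv"
	"strings"
)

// Node is a hypothetical tree node used only in this sketch; it is not an enmime type.
type Node struct {
	PartID      string               // dotted path such as "1.0" (assumed numbering scheme)
	ContentType string               // lowercased media type without parameters
	Header      textproto.MIMEHeader // a map[string][]string, as noted in the thread
	Body        []byte               // leaf body bytes as read; no charset conversion
	Children    []*Node
}

// childID appends a zero-based index to the parent's ID.
func childID(parent string, i int) string {
	if parent == "" {
		return strconv.Itoa(i)
	}
	return parent + "." + strconv.Itoa(i)
}

// parseEntity builds a Node for one MIME entity and recurses into
// multipart/* bodies and message/rfc822 bodies.
func parseEntity(h textproto.MIMEHeader, body io.Reader, id string) (*Node, error) {
	ctype := h.Get("Content-Type")
	if ctype == "" {
		ctype = "text/plain" // default when no Content-Type header is present
	}
	mediaType, params, err := mime.ParseMediaType(ctype)
	if err != nil {
		return nil, err
	}
	n := &Node{PartID: id, ContentType: mediaType, Header: h}
	switch {
	case strings.HasPrefix(mediaType, "multipart/"):
		mr := multipart.NewReader(body, params["boundary"])
		for i := 0; ; i++ {
			p, err := mr.NextPart()
			if err == io.EOF {
				break // leaves the for loop: no more parts
			}
			if err != nil {
				return nil, err
			}
			child, err := parseEntity(p.Header, p, childID(id, i))
			if err != nil {
				return nil, err
			}
			n.Children = append(n.Children, child)
		}
	case mediaType == "message/rfc822":
		// The body of this part is itself a full message: parse it again.
		msg, err := mail.ReadMessage(body)
		if err != nil {
			return nil, err
		}
		child, err := parseEntity(textproto.MIMEHeader(msg.Header), msg.Body, childID(id, 0))
		if err != nil {
			return nil, err
		}
		n.Children = append(n.Children, child)
	default:
		if n.Body, err = io.ReadAll(body); err != nil {
			return nil, err
		}
	}
	return n, nil
}

// ParseMessageString parses a raw message held in a string (hypothetical helper).
func ParseMessageString(raw string) (*Node, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return nil, err
	}
	return parseEntity(textproto.MIMEHeader(msg.Header), msg.Body, "")
}
```

Calling `ParseMessageString` on the sample above should give a root node whose second child has the content type message/rfc822 and a single text/plain child carrying the inner From, To, and Subject headers, which is close to the shape of the Gmail result. With enmime itself the same idea applies: feed the content of the message/rfc822 part to a new envelope and prefix the resulting PartIDs with the ID of the enclosing part.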