nodemailer / mailparser

Decode mime formatted e-mails

If the subject is in Chinese, garbled text appears after parsing

xigirl opened this issue

commented

If the subject in the email source is in Chinese, garbled text may appear after parsing, such as 锟斤拷

Subject:发信方已撤回邮件:测试测试
X-QQ-mid: tyyjxt-xx11d002-yh15wt16855880
Date:Thu, 1 Jun 2023 10:53:30 +0800
Content-Type: multipart/mixed;
boundary="----=_NextPart_45518C5C_082708D8_5A03221D"

The encoding of your input is probably wrong. Are you using UTF-16 or something else? If the input is parsed as a Unicode string, everything works just fine:

const mail = `Subject:发信方已撤回邮件:测试测试
X-QQ-mid: tyyjxt-xx11d002-yh15wt16855880
Date:Thu, 1 Jun 2023 10:53:30 +0800
Content-Type: multipart/mixed;
boundary="----=_NextPart_45518C5C_082708D8_5A03221D"`;

const simpleParser = require('mailparser').simpleParser;
simpleParser(mail, (err, data) => {
    console.log(data.subject);
});

// prints as expected:
// 发信方已撤回邮件:测试测试
commented

I'm not sure what encoding the sender uses. The subject in the email source retrieved over POP3 contains Chinese characters. While debugging, I found that after these two steps in the processHeaders function the value becomes garbled and cannot be recovered:

let value = ((this.libmime.decodeHeader(line.line) || {}).value || '').toString().trim();
 value = Buffer.from(value, 'binary').toString();

Mailparser defaults the charset to utf8 for header values if no encoding is specified, so make sure your strings are either regular Unicode strings or use Buffer values with utf8 bytes. If the charset encoding is something else, then the parsing fails.
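
To illustrate why non-UTF-8 bytes cannot be recovered, here is a minimal sketch that mimics the byte round trip in the two lines quoted above. It is not Mailparser's actual code path: `decodeAsUtf8` is a hypothetical helper, and the GBK bytes are a hand-written sample for the two characters 测试.

```javascript
// Hypothetical helper: mimics the "binary string -> Buffer -> UTF-8 string"
// step from the snippet quoted above. This is an illustration only.
function decodeAsUtf8(rawBytes) {
    // View each byte as one character (latin1, i.e. Node's 'binary' encoding).
    const binaryString = rawBytes.toString('binary');
    // Re-pack those characters into bytes and interpret them as UTF-8,
    // which is the default when no charset is specified.
    return Buffer.from(binaryString, 'binary').toString();
}

// "测试" as UTF-8 bytes
const utf8Bytes = Buffer.from('测试', 'utf8');
// "测试" as GBK bytes (assumed sample input from a non-UTF-8 source)
const gbkBytes = Buffer.from([0xb2, 0xe2, 0xca, 0xd4]);

const fromUtf8 = decodeAsUtf8(utf8Bytes);
const fromGbk = decodeAsUtf8(gbkBytes);

console.log(fromUtf8 === '测试'); // true: UTF-8 bytes survive the round trip
console.log(fromGbk.includes('\uFFFD')); // true: GBK bytes turn into U+FFFD replacement characters
```

Once the bytes have been replaced with U+FFFD the original characters are lost, which is why the conversion has to happen before parsing rather than after.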

If you know that the file uses a charset other than UTF-8, then use a converter module like iconv-lite to convert your input to Unicode before passing it to Mailparser.

Example 1: input as a regular Unicode string

const mail = `Subject:发信方已撤回邮件:测试测试
X-QQ-mid: tyyjxt-xx11d002-yh15wt16855880
Date:Thu, 1 Jun 2023 10:53:30 +0800
Content-Type: multipart/mixed;
boundary="----=_NextPart_45518C5C_082708D8_5A03221D"`;

const simpleParser = require('mailparser').simpleParser;
simpleParser(mail, (err, data) => {
    console.log(data.subject);
});
// output: 发信方已撤回邮件:测试测试

Example 2: input as a Buffer of a UTF-8 encoded string

// input as a Buffer of UTF8 encoded string
const mail = Buffer.from(`Subject:发信方已撤回邮件:测试测试
X-QQ-mid: tyyjxt-xx11d002-yh15wt16855880
Date:Thu, 1 Jun 2023 10:53:30 +0800
Content-Type: multipart/mixed;
boundary="----=_NextPart_45518C5C_082708D8_5A03221D"`, 'utf8');

const simpleParser = require('mailparser').simpleParser;
simpleParser(mail, (err, data) => {
    console.log(data.subject);
});
// output: 发信方已撤回邮件:测试测试

For any input type other than Unicode strings or UTF-8 encoded buffers, you need to convert the file from the source encoding to UTF-8.
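
A minimal sketch of such a conversion, assuming the raw message is known to be GBK throughout. It uses Node's built-in `TextDecoder` as a stand-in for a converter module such as iconv-lite, and it assumes a Node build with full ICU, which is needed for the `gbk` label. `toUtf8Buffer` and the sample bytes are hypothetical.

```javascript
// Hypothetical helper: re-encode raw message bytes from a known source
// charset to UTF-8 so they can be handed to the parser.
function toUtf8Buffer(rawBytes, sourceCharset) {
    // fatal: true makes decode() throw instead of silently inserting U+FFFD
    // when the bytes are not valid in the given charset.
    const text = new TextDecoder(sourceCharset, { fatal: true }).decode(rawBytes);
    return Buffer.from(text, 'utf8');
}

// Assumed sample: a header-only message whose subject is "测试" in GBK
const gbkMail = Buffer.concat([
    Buffer.from('Subject:', 'ascii'),
    Buffer.from([0xb2, 0xe2, 0xca, 0xd4]),
    Buffer.from('\r\n\r\n', 'ascii')
]);

const utf8Mail = toUtf8Buffer(gbkMail, 'gbk');
console.log(utf8Mail.toString('utf8').trim()); // Subject:测试
```

The resulting `utf8Mail` Buffer is the kind of input shown in Example 2. Converting the whole source like this only makes sense when the entire message is in one charset. If the sender's charset is unknown, it has to be determined first.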