eneam / mboxviewer

A small but powerful app for viewing MBOX files

partially garbled

cm3 opened this issue · comments

In the message window, Japanese messages are often partially garbled.

For example,

◎Windows8版ダウンロードURL

is rendered as

◎Windows8版ダウ瘢雹ンロ・踉札・・孀娘碣暑

The "View EML" function is not affected by this problem, and I can read the message correctly through it. According to that file, the mail is written in ISO-2022-JP: https://en.wikipedia.org/wiki/ISO/IEC_2022#ISO-2022-JP

The mail header also says

Content-Type: text/plain; charset="iso-2022-jp"

and in the message window, the "JIS(日本語)" encoding is selected. That should be correct. The original hex sequence corresponding to the phrase is

1B 24 42 21 7D 1B 28 42 57 69 6E 64 6F 77 73 38 1B 24 42 48 47 25 40 25 26 25 73 25 6D 21 3C 25 49 1B 28 42 55 52 4C
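As a quick cross-check of the hex dump above, the following is a minimal sketch that decodes the same bytes with Python's built-in `iso2022_jp` codec. It is independent of MBox Viewer and only shows that the bytes themselves are valid ISO-2022-JP; the name `HEX_DUMP` is introduced here for illustration.

```python
# The hex dump quoted in the issue, copied verbatim.
HEX_DUMP = (
    "1B 24 42 21 7D 1B 28 42 57 69 6E 64 6F 77 73 38 "
    "1B 24 42 48 47 25 40 25 26 25 73 25 6D 21 3C 25 49 "
    "1B 28 42 55 52 4C"
)

# bytes.fromhex ignores the spaces between byte pairs.
raw = bytes.fromhex(HEX_DUMP)

# Python's standard "iso2022_jp" codec interprets the ESC $ B / ESC ( B
# escape sequences and switches between JIS X 0208 and ASCII accordingly.
decoded = raw.decode("iso2022_jp")
print(decoded)
```

If the bytes decode to the expected phrase, the garbling is likely introduced at display time rather than stored in the mail.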

commented

The charset="iso-2022-jp" is supported unless there is a bug. To see the list of supported charsets, select "View --> View Page Code Ids".

"View EML" function is not affected by this problem and I can read the message correctly through that. According to that file, the mail is written in ISO-2022-JP"

It sounds like you can export the eml file and read it in a text editor, but you didn't try to open this eml file in MBox Viewer.

Did you try to view the email in a browser by selecting the email, right-clicking, and choosing the "Open in Browser" option?

If the email content is not sensitive, I would appreciate it if you could attach the eml file to the post so I can investigate.

Also, let me know which version of MBox Viewer you are running.

The charset="iso-2022-jp" is supported unless there is a bug. To see the list of supported charsets, select "View --> View Page Code Ids".

Both 50220 and 50222 are named "iso-2022-jp". However, the right-click "encoding" menu does not make that distinction. Is there any way to specify the detailed encoding?

It sounds like you can export the eml file and read it in a text editor, but you didn't try to open this eml file in MBox Viewer.

Did you try to view the email in a browser by selecting the email, right-clicking, and choosing the "Open in Browser" option?

Yes, those results are also affected by the problem.

Also, let me know which version of MBox Viewer you are running.

1.0.3.32 64bit

If the email content is not sensitive, I would appreciate it if you could attach the eml file to the post so I can investigate.

The mail was an advertisement, so I think it is not sensitive. I attached it after changing the extension to .txt (because GitHub didn't accept .eml), anonymizing some mail addresses, and trimming the content. (I've checked that the problem can be reproduced with the attached file.)

mime-message_problem_anonym.txt

commented

Both 50220 and 50222 are named "iso-2022-jp". However, the right-click "encoding" menu does not make that distinction. Is there any way to specify the detailed encoding?

I hadn't noticed this duplication. It sounds like a potential issue. MBox Viewer would use one of these definitions; I will let you know which one.

The attached file looks suspect, and MBox Viewer may not handle ISO-2022 correctly. I will investigate.

There are many ESC(B and ESC$B sequences in the text, which may cause problems for MBox Viewer. What is your email provider?

commented

Right now the display seems to be corrupted. It looks like MBox Viewer needs to translate ISO 2022 to UTF-8 first. ISO 2022 control sequences don't seem to work in an HTML file. Is this a very old email? Otherwise I am surprised it is not already in UTF-8. I will investigate the issue and provide an update. Thanks for raising the issue.

Both 50220 and 50222 are named "iso-2022-jp". However, the right-click "encoding" menu does not make that distinction. Is there any way to specify the detailed encoding?

I hadn't noticed this duplication. It sounds like a potential issue. MBox Viewer would use one of these definitions; I will let you know which one.

I also thought it could be a potential issue, but I'm not sure which one is correct. Maybe 50220.

There are many ESC(B and ESC$B sequences

Yes, that is the complicated point. Those sequences are still in use: the most recent ISO-2022-JP mail in my mailbox is dated 2022/7/28 (it is from a credit card company, so I cannot attach it). Recently, a large number of e-mails in Japan are written in UTF-8, but there are still some in ISO-2022-JP (maybe because some feature phones can only handle this encoding).

What is your email provider?

I downloaded my Gmail mbox from the takeout function and am trying to build an offline archive.

I will investigate the issue and provide an update.

Thank you in advance. I really appreciate it.

commented

The charset="iso-2022-jp" might be just a generic introducer. Within the content, there are multiple character set introducers, each placed before a block of characters.


   Escape Sequence   Character Set
   ---------------------------------------------------------------
   ESC ( B           ASCII
   ESC ( J           JIS X0201-1976 (left-hand part)
   ESC $ @           JIS X0208-1978
   ESC $ ( 0         User-defined characters (this range of
                     characters is proprietary to Compaq)
   ESC $ B           JIS X0208-1983
   ---------------------------------------------------------------
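
To make the role of these introducers concrete, here is a small sketch that splits a byte string into runs, each labelled with the character set selected by the preceding escape sequence. It covers only the four standard sequences from the table (the Compaq-specific one is left out), and the names `ESCAPES` and `split_runs` are hypothetical helpers, not anything from MBox Viewer.

```python
import re

# Escape sequences taken from the table above (ESC is byte 0x1B).
ESCAPES = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X0201-1976 (left-hand part)",
    b"\x1b$@": "JIS X0208-1978",
    b"\x1b$B": "JIS X0208-1983",
}
PATTERN = re.compile(b"|".join(re.escape(seq) for seq in ESCAPES))


def split_runs(data: bytes) -> list[tuple[str, bytes]]:
    """Return (character set, raw bytes) runs, with the escapes removed."""
    runs = []
    charset = "ASCII"  # assumption: text starts in ASCII until an escape says otherwise
    pos = 0
    for match in PATTERN.finditer(data):
        if match.start() > pos:
            runs.append((charset, data[pos:match.start()]))
        charset = ESCAPES[match.group()]
        pos = match.end()
    if pos < len(data):
        runs.append((charset, data[pos:]))
    return runs


# The byte sequence reported in the issue.
sample = bytes.fromhex(
    "1B 24 42 21 7D 1B 28 42 57 69 6E 64 6F 77 73 38 "
    "1B 24 42 48 47 25 40 25 26 25 73 25 6D 21 3C 25 49 "
    "1B 28 42 55 52 4C"
)
runs = split_runs(sample)
for charset, chunk in runs:
    print(charset, chunk.hex(" "))
```

On the sample from the issue, the runs alternate between JIS X0208-1983 and ASCII, which matches the many ESC$B and ESC(B sequences seen in the file.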

What I am not sure about is why I am seeing ESC(B on the first line:

ESC(BFrom 1417901534999354757@xxx Tue Nov 06 15:37:57 +0000 2012

which seems to suggest that your entire archive file is in ISO 2022 format?

It sounds like you downloaded a new mail archive from Gmail; I think that should fix this issue.

commented

It is possible that the ESC(B on the first line is due to a problem in MBox Viewer. Stay tuned :-)

commented

It looks like I need to translate ISO-2022-JP to UTF-8 first in order for this to work. I did a quick prototype and it worked in the message window. I need to make corresponding changes to "Print to ..". I am in the middle of making some other changes, so I will need to complete the pending changes and create a new release.
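
The idea of translating to UTF-8 before display can be sketched in Python as follows. This is not MBox Viewer's implementation, only an illustration using the standard `email` package: the message's declared charset drives the decoding, and the result is re-encoded as UTF-8. The message in `RAW_MESSAGE` is a made-up minimal example built around the byte sequence from the issue, and `body_as_utf8` is a hypothetical helper name.

```python
import email
import email.policy

# Body bytes as reported in the issue (ISO-2022-JP with escape sequences).
BODY = bytes.fromhex(
    "1B 24 42 21 7D 1B 28 42 57 69 6E 64 6F 77 73 38 "
    "1B 24 42 48 47 25 40 25 26 25 73 25 6D 21 3C 25 49 "
    "1B 28 42 55 52 4C"
)

# Assumption: a minimal single-part message with the header quoted in the issue.
RAW_MESSAGE = (
    b'Content-Type: text/plain; charset="iso-2022-jp"\r\n'
    b"Content-Transfer-Encoding: 7bit\r\n"
    b"\r\n" + BODY + b"\r\n"
)


def body_as_utf8(raw_message: bytes) -> bytes:
    """Decode the text body using its declared charset, then re-encode as UTF-8."""
    msg = email.message_from_bytes(raw_message, policy=email.policy.default)
    text = msg.get_content()  # str, decoded according to the charset parameter
    return text.encode("utf-8")


utf8_body = body_as_utf8(RAW_MESSAGE)
print(utf8_body.decode("utf-8"))
```

After this step the text contains no ESC bytes, so an HTML renderer that does not understand ISO 2022 control sequences only ever sees plain UTF-8.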

Regarding the ESC(B on the first line of the eml file: if you could provide an mbox mail archive file with 3 emails, that would help me investigate the issue. If that is not possible, I will try to create such an mbox file from the eml file you provided, but it may not be the same as the original. The mbox file should contain the problem email you provided (or a similar one with the same issue), plus one email before and one email after it.

You would create the mbox archive file as follows:

  1. While in "All Mails", select these 3 emails. Right-click on the selection and select "Copy Selected into User Selected mails"
  2. Select "User Selected mails"
  3. Right-click on any of the 3 emails and select "Save as Mbox Mail Archive file"
  4. Attach the created mbox mail archive file to the post

Will provide updates on the progress.

Sorry for my late reply. I'll prepare the sample tonight (about 12 hours from now).

commented

I have completed all code changes and am basically ready to create the new v1.0.3.33 release. I will try to figure out why there is an extra ESC(B on the first line. I will let you know if I figure this out; otherwise your sample might help to solve the issue.

commented

Ignore my request for sample data. I need raw data, which I will not get the way I described. I may have to add a special option to dump raw data. I will let you know what I decide.

commented

I released v1.0.3.33 to resolve support for iso-2022-* charsets such as iso-2022-jp. Testing shows no difference whether the page code is set to 50220 or 50222 when applied to the provided sample. I hope the Japanese text is displayed correctly now.

I also added a new option, "File --> Development Options --> Create Mail Archive File", to help investigate the ESC(B on the first line of the sample eml file. Follow the steps below to create a sample mbox mail archive file:

  1. Select the mail archive file and select the problem mail (the text should now be displayed correctly)
  2. Select "File --> Development Options --> Create Mail Archive File"
  3. A dialog will appear. Select OK. The new mbox file will be created in the same directory as the investigated mbox file.
  4. From the GUI, select the folder housing the investigated mbox file
  5. Right-click on the folder and select "Refresh Folder". The newly created mbox file should appear.
  6. Select the new mail archive file and examine the content of all mails. Note that the order of mails in the original and new mail archives may differ.
  7. Make sure that none of the emails contains sensitive content. Otherwise you may need to redo the steps to create the mail archive with a different subset of mails. You may reduce the leading and/or trailing mail counts. In the worst case, you may have to select a different Japanese mail.
  8. Provide the new mbox mail archive file.
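
For readers who want to see what "a problem mail plus its leading and trailing neighbours, as raw data" could look like outside the GUI, here is a rough sketch that works directly on the raw bytes of an mbox file. It is not how the "Create Mail Archive File" option is implemented. It assumes that every message starts with a line beginning with `From ` and that such lines inside message bodies are already escaped, as is usual for mbox files. The helper names and the tiny sample data are made up for illustration.

```python
def split_mbox_raw(data: bytes) -> list[bytes]:
    """Split raw mbox bytes into messages at lines that start with b'From '."""
    messages = []
    current = []
    for line in data.splitlines(keepends=True):
        # A new "From " separator line closes the message collected so far.
        if line.startswith(b"From ") and current:
            messages.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        messages.append(b"".join(current))
    return messages


def extract_with_neighbours(data: bytes, index: int,
                            before: int = 1, after: int = 1) -> bytes:
    """Return the message at `index` plus `before`/`after` neighbours, bytes untouched."""
    messages = split_mbox_raw(data)
    start = max(0, index - before)
    return b"".join(messages[start:index + after + 1])


# Hypothetical four-message archive; the third message holds ISO-2022-JP bytes.
sample_mbox = (
    b"From a@example.com Tue Nov 06 15:37:57 2012\nSubject: one\n\nbody 1\n\n"
    b"From b@example.com Tue Nov 06 15:38:57 2012\nSubject: two\n\nbody 2\n\n"
    b"From c@example.com Tue Nov 06 15:39:57 2012\nSubject: three\n\n"
    b"\x1b$B!}\x1b(BWindows8\n\n"
    b"From d@example.com Tue Nov 06 15:40:57 2012\nSubject: four\n\nbody 4\n\n"
)
subset = extract_with_neighbours(sample_mbox, index=2)
print(len(split_mbox_raw(subset)))
```

Because nothing is decoded or re-encoded, any stray escape sequence in the original file, such as an ESC(B before a `From ` line, would survive into the extracted subset. Note that in that case the separator line would no longer start with `From `, so this simple splitter would not detect it as a message boundary.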

Thank you for the v1.0.3.33 release. Mails in ISO-2022-JP are correctly rendered now :) I want to send some emails for testing, but for various reasons I can't make them openly available on GitHub (even though they are not sensitive). Could you contact me via the email address written in my profile? I'll send the archive file via email, as you suggested.

commented

Since I have already downloaded the sample, you should be able to edit your old post and delete the attached sample mime-message_problem_anonym.txt

commented

Release v1.0.3.33 resolved the reported issue.