eneam / mboxviewer

A small but powerful app for viewing MBOX files

Feature request

bcrod opened this issue · comments

Support latin characters (examples: ´ ` ^ ~ ç á à ã â)

commented

MBox Viewer has some limitations in its support for character sets, including the characters you reported. The Latin characters you listed, and others, should be supported in the mail message and in the mail summary, i.e., in the mail header fields. Support in file names and labels is basically limited to ASCII characters. MBox Viewer needs to be ported to Unicode to address these limitations.

Can you provide more details about where you noticed the lack of support for Latin characters? That will allow me to give you a better response to the issue you raised.

I can provide a screenshot to help.

[Screenshot: Captura de tela 2022-06-03 134058]

commented

Thanks. That looks similar to an issue reported two years ago, which I resolved. I had to relax some MIME mail decoding rules in order to resolve it. The presentation is obviously totally incorrect. If you could attach the problem mail itself to this ticket, assuming the mail doesn't contain sensitive information, it would help me understand the issue and hopefully resolve it.

You can extract a single email as follows:

Select "File -> Options -> Export EML -> YES". Select the email exhibiting the issue. Double left-click on the email to open the folder with the files related to this email. Attach the mime-message.eml file (or just message.eml in older versions of MBox Viewer) to this ticket.

commented

Thanks. I will provide an update soon.

Maybe it was Google itself that did this? I used the Google Takeout tool to make a Gmail backup.

commented

I don't think it was Google. The message doesn't seem to be formatted according to the MIME mail protocol specifications. I need to think about which MIME mail rules I would need to relax to resolve the issue; I am not sure yet. I opened the provided file in the Thunderbird mail viewer and I see the same issue. I will let you know if I figure something out.

commented

The file might be corrupted in addition to missing declarations. The file content says the encoding is charset=iso-8859-1, which is Western European ISO. The encoding specification is missing from the Content-Type: field in the mail header. I updated the file manually (see the Content-Type line below) but it still doesn't work. I tried different character sets but none has worked so far.

About the backup you have: are the downloaded mails still on Gmail?

Content-Type: text/html; charset=iso-8859-1

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
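As an illustration of why the declared charset matters, here is a minimal Python sketch. It is not MBox Viewer code, and the sample word is an assumption chosen only for the demo. It shows what happens when the bytes and the declared charset disagree:

```python
# Sample text with Latin characters (hypothetical; not taken from the mail in this ticket).
text = "ação"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

# Declared charset matches the bytes: the text round-trips cleanly.
assert latin1_bytes.decode("iso-8859-1") == text

# UTF-8 bytes read as ISO-8859-1: each accented letter becomes two wrong characters.
mojibake = utf8_bytes.decode("iso-8859-1")
print(mojibake)


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ISO-8859-1 bytes with accented letters are usually not valid UTF-8.
print(decodes_as_utf8(utf8_bytes), decodes_as_utf8(latin1_bytes))
```

If neither charset produces readable text for a given file, that points toward the bytes themselves being damaged rather than just mislabeled.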

About the backup you have: are the downloaded mails still on Gmail?

Yes

Here is the same email EML file downloaded directly from Gmail.
[exempl.zip]

In Gmail the format is OK, but in Thunderbird it isn't. Thunderbird shows the same problem.

I think Gmail automatically converts to UTF-8 because my Gmail is set to PT-BR. Maybe I am wrong?

commented

See the snapshot from MBox Viewer with the latest zip file. I had to rename the mail file to an ASCII name since the original name is likely in Unicode. The mail content is different now. Can you attach a snapshot from Gmail? Do you think the content is correct now?

The subject is corrupted. The Subject field in the mail doesn't have an encoding specified; I will check my assumption in this case.

Yes, the email content is correct, but as you said, the subject is shown wrong. The same happens in Thunderbird. I don't know how Gmail reads the subject correctly.

commented

The first mail backup from Gmail is definitely corrupted; I can see that when I compare the first file with the second. I updated the content of the first file to specify UTF-8 but that didn't work. MBox Viewer can't resolve the apparent corruption. You may want to download the mails from Gmail again and see if that resolves the issue. I doubt it, but it is worth trying.

I will try to resolve the incorrect subject issue. It appears that it might be safe to assume that the subject character set is the same as in the message body if the character set is not specified in the mail header. Newer mail clients are usually very specific about encoding, to avoid guessing and problems.

I will be releasing a new version soon and will attempt to address the subject issue in that release. I will let you know.
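The fallback described above could look roughly like the following Python sketch. This is an illustration only, not the MBox Viewer implementation; the function name `decode_subject` and its parameters are hypothetical.

```python
from email.header import decode_header
from typing import Optional


def decode_subject(raw_subject: bytes, body_charset: Optional[str]) -> str:
    """Decode a Subject value, using the body charset when the header declares none."""
    fallback = body_charset or "ascii"
    try:
        header_text = raw_subject.decode("ascii")
    except UnicodeDecodeError:
        # Raw 8-bit bytes without an RFC 2047 encoded-word:
        # assume the subject uses the same charset as the message body.
        return raw_subject.decode(fallback, errors="replace")

    parts = []
    for chunk, charset in decode_header(header_text):
        if isinstance(chunk, bytes):
            # Encoded-words carry their own charset; plain chunks fall back.
            parts.append(chunk.decode(charset or fallback, errors="replace"))
        else:
            parts.append(chunk)
    return "".join(parts)


# Undeclared 8-bit subject, body declared as iso-8859-1 (sample data is made up).
print(decode_subject("Promoção".encode("iso-8859-1"), "iso-8859-1"))
# Properly declared subject: the header's own charset wins over the fallback.
print(decode_subject(b"=?utf-8?q?Promo=C3=A7=C3=A3o?=", "iso-8859-1"))
```

The assumption only helps when the body charset is declared and correct; if the body metadata is missing too, there is nothing to fall back on.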

commented

MBox Viewer will not be able to fix corrupted mail content. The example of a non-corrupted eml file you provided doesn't follow the standard MIME mail specification in a couple of places. This is not uncommon for older mail client applications. Fixing the incorrect subject would require heavy guessing about the character set of the subject and other mail header fields. MBox Viewer would need to examine the mail content because there is no information in the metadata at all. I am not sure MBox Viewer should attempt such heavy guessing. I have added simple guessing based on metadata but not based on the content. Simple guessing would not resolve your case. I think there is some risk with guessing based on the content: it may result in greater corruption of the subject on the screen.

There is a lot of discussion on the web about character set guessing; see one example below:

https://bugzilla.mozilla.org/show_bug.cgi?id=90584

Let me know whether the incorrect characters in the mail header appear in a large number of mails or whether this was an isolated case. I am still thinking about the case.

Also, did you manage to resolve the Google corruption?
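To make the risk of content-based guessing concrete, here is a small Python sketch. It is a deliberately naive guesser written for this illustration (the name `guess_decode` is hypothetical), not what MBox Viewer does.

```python
from typing import Tuple


def guess_decode(raw: bytes) -> Tuple[str, str]:
    """Return (text, charset_used), guessing from the bytes alone."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to some character, so this never fails.
        # That is exactly why a wrong guess goes unnoticed.
        return raw.decode("iso-8859-1"), "iso-8859-1"


# The guess works when the sender really used UTF-8 or ISO-8859-1.
print(guess_decode("ação".encode("utf-8")))
print(guess_decode("ação".encode("iso-8859-1")))

# It silently produces wrong text for any other 8-bit charset,
# for example a subject written in ISO-8859-2 (sample word is made up).
wrong_text, used = guess_decode("Łódź".encode("iso-8859-2"))
print(wrong_text, used)
```

The last call reports iso-8859-1 and returns readable-looking but incorrect characters, with no error to signal that the guess failed.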

I think the problem was with the sender. The same email from the same sender, months later, is structurally different from before.
Before (see the eml file I posted earlier) there was no character set specified for the subject in the eml file, and the email body had no strange characters (wrongly displayed Latin characters). Now it is the opposite (see the eml file I attached now). The subject character set is specified and the body contains the "wrong" characters. I don't know why, but this email shows the correct Latin characters in the subject and the wrong ones in the body text. It looks like the sender is the guilty one here, but maybe the last change in the software inverted the problem?

[file]

commented

The latest file is correct. It explicitly encodes the subject as UTF-8, which is required when Content-Transfer-Encoding: is set to 8bit. Also note that the charset is present in the Content-Type field, which is required. The content itself is not corrupted. I can see that this email was downloaded from Google because there is an X-Gmail-Labels: field at the beginning. The suspect email lacks all of the proper metadata. Check these fields in the previous email and you will see the difference.
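For comparison, the metadata fields mentioned above can be checked with Python's standard `email` package. The message below is a made-up sample shaped like a well-formed mail, not the file attached to this ticket.

```python
from email import message_from_bytes
from email.header import decode_header, make_header

# Hypothetical sample with all the metadata declared.
raw = (
    b"X-Gmail-Labels: Inbox\r\n"
    b"Subject: =?UTF-8?Q?Promo=C3=A7=C3=A3o?=\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + "<p>ação</p>".encode("utf-8")
)

msg = message_from_bytes(raw)

# The subject carries its own charset, so no guessing is needed.
subject = str(make_header(decode_header(msg["Subject"])))

# The body charset comes from the Content-Type field.
charset = msg.get_content_charset()
body = msg.get_payload(decode=True).decode(charset)

first_field = msg.keys()[0]
print(first_field, subject, charset, body)
```

Running the same checks on the suspect file should show which of these fields are missing: no charset in Content-Type and no encoded-word in Subject leave a viewer with nothing to go on.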

I saved the correct file directly from Gmail, and the wrong one came from the Takeout tool. Google Takeout is an official and supported tool from Google. I never imagined I would get a corrupted or badly formed file. Google sent me an MBOX file with a backup of all my emails. Then I searched for an mbox viewer and found your app.

commented

Your latest, correct file starts with the X-Gmail-Labels field.

When you say "The correct file i saved direct from Gmail", do you mean as in the attached screenshot?
If that is what you mean, I don't see the Labels field when I download from Gmail. I only see Labels when I download from Google. Did you reconfigure your Gmail account? Let me know how, and please double check.

I didn't experience corruption from Google, but you never know; every case is different. I use the Google Takeout tool as well.

Hey, zigm, is there a chance to hide the sensitive information, like my email for example? I think I made a really bad move here. I like the discussion, but I feel uncomfortable about this. I really, really appreciate your comments, thanks.
I edited all my previous comments and hid all sensitive data from the files. Can you do the same for me and edit your previous comments (the quoted one and the image you posted - just a blur is enough), please? I would really appreciate it. I value my privacy and I made a mistake here. I'm sorry, it was my fault, not yours.

When you say "The correct file i saved direct from Gmail", do you mean as in the attached screenshot?

Yes, exactly.

commented

I removed the screenshots; my apology for posting. Please review what remains and let me know what else to remove.

my apology for posting

You don't have to apologize; it was my mistake, and thank you. If you could also edit the earlier comments (they contain sensitive information too), thanks.

commented

Removed more text.

Thanx man, feeling more comfortable now.

I don't understand why your eml file is different when downloaded from Gmail, but my Gmail is in Portuguese-BR. Maybe Google changed something?

commented

I will browse my settings and let you know. Right now they are all defaults.

My IMAP service is activated and the language is PT-BR. The rest is default.

I downloaded some other emails (eml files) directly from Gmail and some of them don't start with "X-Gmail-Labels". They start with "Delivered-To: email@email.com". I don't know why.

commented

I enabled IMAP but didn't enable clients. It didn't make a difference. Anyway, it is good to know that some users can have Labels included in the download. I think for now I will make the next release without character set guessing. I have one feature ready for release. I will evaluate the risk and benefit of guessing, but that will take time.

Let me know if you have questions, need enhancements or observe issues.

commented

I downloaded your zip files. You can remove them as well.

OK. Thank you.

commented

Not able to understand and resolve the corruption of the received Gmail files.