rgrove / larch

:skull: Larch copies messages from one IMAP server to another. No longer maintained.

Inserting missing Message-Id

hennegwath opened this issue

Hi,

I'm struggling with mails that have no Message-Id header. Because I have to modify the content on the fly (see #56), I urgently need this header to match the original mails on my source server with the modified ones on the target server.
I can get the Message-Id from the message envelope. This works, but the messages are still copied repeatedly. So where exactly does Larch look up and compare the Message-Id, and where could I put code that takes the Message-Id from the envelope?

regards,
Henning

PS: I nearly got Larch working with my FirstClass server, which has a really broken IMAP interface. The only remaining problem is that badly encoded envelopes (e.g. with some special characters) cannot be parsed by Ruby's IMAP library, so those mails are skipped. The rest works with just one additional "repair function". I'll share it soon, so someone who is into Ruby can look it over :)

If you're asking how to have Larch recognize Message-Ids that are generated on the fly from something other than the actual Message-Id header, then you'll probably need to modify the create_guid() method here: https://github.com/rgrove/larch/blob/master/lib/larch/imap/mailbox.rb#L334-L349

Whatever value that method returns will be used as a unique identifier for the message, and will be compared with the guids for messages on the destination server to determine whether the message is a duplicate.
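To make the idea concrete, here is a rough standalone sketch of a GUID routine that prefers a Message-Id from the header, then one from the envelope, and only then falls back to INTERNALDATE. It is not Larch's actual create_guid(): the name `sketch_guid`, the hash keys, and the choice of MD5 are assumptions made for illustration, and how the attributes get fetched is left out entirely.

```ruby
require 'digest/md5'
require 'time'

# Hypothetical stand-in for a GUID routine; NOT Larch's real create_guid.
# attrs is a plain Hash of already-fetched message attributes:
#   :header_message_id   - Message-Id taken from the header, or nil
#   :envelope_message_id - Message-Id taken from the IMAP envelope, or nil
#   :internaldate        - INTERNALDATE string as returned by the server
def sketch_guid(attrs)
  # First non-blank Message-Id wins, header before envelope.
  id = [attrs[:header_message_id], attrs[:envelope_message_id]]
         .compact.map(&:strip).find { |v| !v.empty? }

  # Without any Message-Id, fall back to the arrival timestamp in seconds.
  key = id || Time.parse(attrs.fetch(:internaldate)).to_i.to_s
  Digest::MD5.hexdigest(key)
end
```

The same routine has to produce the same value for a message on the source and on the destination, so anything that the on-the-fly modification changes should stay out of the key.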

Hi,

Because the Message-Id is sometimes only present in the mail envelope, I fetched the envelope during the first stage so that it is available when create_guid is called. But sometimes the envelope is broken and cannot be parsed (Net::IMAP exceptions). Previously, Larch skipped such a message and continued with the next (probably not broken) one. Fetching the envelope during this first stage breaks that behavior and I get stuck in an "exception loop"... So I decided to use only INTERNALDATE and hope that no two messages with identical values are present in the same folder.
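One way to keep the old skip-and-continue behavior might be to wrap the envelope lookup per message, so a single unparsable envelope yields nil instead of aborting the whole scan. The sketch below is only an illustration of that pattern: `envelope_id_or_nil` and `fake_fetch` are made-up names, and the lambda stands in for whatever real call fetches the envelope.

```ruby
# fetch_envelope_id is a hypothetical callable that returns the envelope's
# Message-Id for one message, or raises if the envelope cannot be parsed.
def envelope_id_or_nil(uid, fetch_envelope_id)
  fetch_envelope_id.call(uid)
rescue StandardError => e
  # Net::IMAP's parse errors descend from StandardError, so they land here.
  warn "envelope for UID #{uid} unusable: #{e.class}: #{e.message}"
  nil
end

# Stand-in fetcher: UID 2 simulates a badly encoded envelope.
fake_fetch = lambda do |uid|
  raise ArgumentError, 'unparsable envelope' if uid == 2
  "<msg-#{uid}@example.org>"
end

# The broken message contributes nil; the others are still processed.
ids = [1, 2, 3].map { |uid| envelope_id_or_nil(uid, fake_fetch) }
```

A nil result can then be treated as "no Message-Id available", which lets the GUID code fall back to another attribute for just that message.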

Would you propose to add another attribute (not size, because I modify the messages on-the-fly)?
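For what it's worth, one possible direction is to mix a few header fields into the key alongside INTERNALDATE. This is purely a sketch under the assumption that the on-the-fly modification leaves From, Date and Subject untouched; `composite_guid` and the choice of fields are hypothetical and not something Larch does.

```ruby
require 'digest/md5'

# Hypothetical helper: combines INTERNALDATE with headers that are ASSUMED
# to survive the on-the-fly modification unchanged.
def composite_guid(internaldate, headers)
  parts = [internaldate, headers['From'], headers['Date'], headers['Subject']]
  # Collapse whitespace so folded or re-wrapped headers hash identically.
  normalized = parts.map { |p| p.to_s.strip.gsub(/\s+/, ' ') }
  # Join with a NUL byte so adjacent fields cannot run into each other.
  Digest::MD5.hexdigest(normalized.join("\0"))
end
```

Two messages with the same INTERNALDATE would then only collide if sender, date and subject also match, which is much less likely than a timestamp clash alone.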

Thanks and best regards,
Henning

Just as a side-note: With the above approach I managed to migrate nearly all mail (90%) from the broken FirstClass Server to a new cyrus-imapd.

regards,
LaClaro a.k.a. hennegwath