bbottema / outlook-message-parser

A Java parser for Outlook messages (.msg files)

SimpleRTF2HTMLConverter inserts too many <br/> tags

fadeyev opened this issue

I'm not sure what the purpose of line 118 in SimpleRTF2HTMLConverter#fetchHtmlSection() is:
html = html.replaceAll("[\\n\\r]+", " <br/> ");
It results in a whole lot of extra <br/> tags, and an email sent with such HTML looks awful, with lots of extra blank lines. However, when I replaced <br/> back with a newline (\n) and sent the email, it looked just like the original.
I tried this on about 10 different emails of varying complexity. Replacing newlines with <br/> broke all of them completely, while removing this line made them look just like the originals.
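
For illustration, here is a minimal standalone sketch of the effect described above. It is not the library's code: the class name BrInsertionDemo and the methods withBrTags and undoBrTags are hypothetical, and only the replaceAll call is taken from the line quoted in this issue. It assumes the extracted HTML already contains block-level markup, so the line breaks between tags are only source formatting.

```java
public class BrInsertionDemo {

    // Same replacement as the line quoted in the issue: every run of
    // CR/LF characters becomes a single " <br/> ".
    static String withBrTags(String html) {
        return html.replaceAll("[\\n\\r]+", " <br/> ");
    }

    // The workaround described in the issue: turn the inserted tags
    // back into plain newlines, which HTML renderers treat as whitespace.
    static String undoBrTags(String html) {
        return html.replace(" <br/> ", "\n");
    }

    public static void main(String[] args) {
        // Hypothetical input: two paragraphs separated by a line break
        // that exists only to format the HTML source.
        String html = "<p>Hello</p>\r\n<p>World</p>";

        String converted = withBrTags(html);
        System.out.println(converted);

        String restored = undoBrTags(converted);
        System.out.println(restored);
    }
}
```

In this sketch the <br/> ends up between two paragraphs that are already separated by their own margins, which would match the extra blank lines seen in the sent emails.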

It's been a while since I looked at the RTF spec, but aren't newlines in RTF encoded with \n\r? That would mean they should be HTML newlines (br's) as well. I see this was in the original sources as well.

/edit Removing that line doesn't seem to cause issues for me in the unit test, but I remember having formatting issues with a Chinese email. I can't recall the details, though. I think I'll remove it, since you have more evidence to the contrary.

Released v1.3.0