danfickle / openhtmltopdf

An HTML to PDF library for the JVM. Based on Flying Saucer and Apache PDF-BOX 2. With SVG image support. Now also with accessible PDF support (WCAG, Section 508, PDF/UA)!

Home Page: https://danfickle.github.io/pdf-templates/index.html

Surrogate characters are decoded wrongly in makeJustificationArray

EmanuelCozariz opened this issue

Given the string 𧙗 (U+27657), Java encodes it as the surrogate pair '\uD85D\uDE57'.

The font CODE2002.ttf accepts this string:

PDFont font = PDType0Font.load(doc, new File("CODE2002.ttf"));
cs.showText("\uD85D\uDE57");

But it is decoded incorrectly.

The makeJustificationArray method of PdfBoxFastOutputDevice uses Character.toString(c) to add each char to the data array, which splits the surrogate pair into two separate strings:

\uD85D => Character.toString(c) yields a lone high surrogate, which decodes as �
\uDE57 => Character.toString(c) yields a lone low surrogate, which decodes as �
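
The behaviour described above can be reproduced without PDFBox. The following is only a minimal sketch: the class SurrogateSplitDemo and the method splitByChar are hypothetical names, not code from openhtmltopdf, and the loop merely imitates a char-by-char use of Character.toString(c).

```java
import java.util.ArrayList;
import java.util.List;

public class SurrogateSplitDemo {
    // Hypothetical helper: splits a string per UTF-16 code unit,
    // the way a char-by-char loop with Character.toString(c) would.
    static List<String> splitByChar(String s) {
        List<String> out = new ArrayList<>();
        for (char c : s.toCharArray()) {
            out.add(Character.toString(c));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "\uD85D\uDE57"; // one code point, U+27657

        System.out.println(s.length());                      // 2 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // 1 code point

        // Each element is a lone surrogate, which no font can map to a glyph.
        for (String part : splitByChar(s)) {
            char c = part.charAt(0);
            System.out.println(Integer.toHexString(c) + " surrogate=" + Character.isSurrogate(c));
        }
    }
}
```

Each of the two resulting strings holds half of the pair, so neither is a valid character on its own.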

Hi @EmanuelCozariz,

You're right, the justification code was not surrogate-pair aware. I have made the fix but have not added a test, as fonts with surrogate pair coverage tend to be too large to add to the repository.
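
For readers unfamiliar with the term, surrogate-pair-aware iteration generally means stepping through a string by code point rather than by char. The sketch below illustrates that general technique only; it is an assumption about the shape of such a fix, not the actual change made in openhtmltopdf, and CodePointSplit and splitByCodePoint are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointSplit {
    // Hypothetical helper: splits a string into one String per code point,
    // keeping each surrogate pair together.
    static List<String> splitByCodePoint(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);                   // reads both halves of a pair
            out.add(new String(Character.toChars(cp)));  // 1 or 2 chars
            i += Character.charCount(cp);                // advance by 1 or 2 code units
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> parts = splitByCodePoint("a\uD85D\uDE57b");
        System.out.println(parts.size());          // 3
        System.out.println(parts.get(1).length()); // 2, the pair stays intact
    }
}
```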

Hopefully, time permitting, you could download the repository and test with your use case before the next release, which should be soon?

Anyway, thanks again for reporting and debugging this issue.

I think this is fixed. Please feel free to re-open if required. Release soon.

Do you have a release timeline? We are blocked by this issue at the moment. Thank you.

Hopefully on the weekend (Sunday 29 Nov) if no blocking issues come up.