danfickle / openhtmltopdf

An HTML to PDF library for the JVM. Based on Flying Saucer and Apache PDF-BOX 2. With SVG image support. Now also with accessible PDF support (WCAG, Section 508, PDF/UA)!

Home Page: https://danfickle.github.io/pdf-templates/index.html

rendering a w3cdom document : infinite loop creation of TableCellBox

AlexisCothenet opened this issue

Hello,

I hit an OOM but cannot understand the reason. It seems a cascade of TableCellBox instances is created when rendering this HTML (I tried to keep it small, but it seems the number of td elements inside the first tr is required, as are the two other tr elements):

String bodyhtml=
                "<table style=\"border-collapse:separate;border:none;padding:0;margin:0;table-layout:fixed;width:711px\" width=\"711\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n" +
                        "<tbody>\n" +
                        "<tr style=\"height:1px\">"+
                        "<td style=\"border:none;padding:0\" width=\"91\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"45\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"1\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"1\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"75\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"20\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"52\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"17\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"55\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"17\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"74\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"15\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"2\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"74\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"17\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"21\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"87\"></td>"+
                        "</tr>" +
                        "<tr style=\"height:4px\">" +
                        "<td style=\"font-style:normal;font-family:Arial;font-size:1px;color:#000000;background-color:#ffffff;text-align:Left;vertical-align:Top;word-wrap:break-word;overflow:hidden;border-collapse:separate;border:none;padding-left:2px;padding-right:2px;padding-top:1px;padding-bottom:1px\" colspan=\"2\" rowspan=\"2\"> </td>" +
                        "</tr>" +
                        "<tr style=\"height:34px\">" +
                        "<td style=\"font-style:normal;font-family:Arial;font-size:1px;color:#000000;background-color:#ffffff;text-align:Left;vertical-align:Top;word-wrap:break-word;overflow:hidden;border-collapse:separate;border:none;padding-left:2px;padding-right:2px;padding-top:1px;padding-bottom:1px\"> </td>" +
                        "</tr>" +
                        "</tbody></table>";
Document doc = Jsoup.parse(htmContent);
PdfRendererBuilder builder = new PdfRendererBuilder();
builder.useFastMode();
builder.withW3cDocument(new W3CDom().fromJsoup(doc), "");
builder.toStream(outStream);
builder.run();

The openhtmltopdf version used is 1.0.2 (with jsoup 1.13.1).

Here is a snapshot of the profiler's heap dump analysis.
[Image: OOM_htmltopdf]

Hi @AlexisCothenet,

This bug is very concerning as it involves text breaking. I was able, after much trial and error, to reduce your test case to the following (no Jsoup needed):

<table style="width: 3px;table-layout: fixed;">
<tr>
 <td colspan="2"></td>
 <td style="word-wrap: break-word;">ABC</td>
</tr>
</table>

Now that I have narrowed it down to fixed table layout combined with colspan (or rowspan) and break-word, I'll try to find the root cause and fix it.

As always, thanks for reporting.

Hi @danfickle, it keeps looping inside https://github.com/danfickle/openhtmltopdf/blob/open-dev-v1/openhtmltopdf-core/src/main/java/com/openhtmltopdf/layout/InlineBoxing.java#L160 .

It keeps trying to handle the "ABC" string.

lbContext.isFinished() never returns true, and lbContext.getStartSubstring().length() never reaches 0.
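
To illustrate the symptom, here is a toy model of a line-filling loop that makes no progress when the available width is zero. This is only a sketch and not the openhtmltopdf code: the class name, the method `breakLines`, the fixed `CHAR_WIDTH`, and the `maxIterations` cut-off are all assumptions made for the example. Each iteration emits one "line" (standing in for a box), so a loop that never consumes a character keeps allocating until memory runs out.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a break-word line-filling loop; NOT the openhtmltopdf implementation. */
class BreakWordLoopSketch {

    /** Assumption for this sketch: every character is exactly 1 unit wide. */
    static final int CHAR_WIDTH = 1;

    /**
     * Splits text into lines no wider than availableWidth, breaking inside words.
     * With forceProgress == false and a zero-width box, no character is consumed
     * per iteration, so the loop only stops because of the maxIterations cut-off.
     */
    static List<String> breakLines(String text, int availableWidth,
                                   boolean forceProgress, int maxIterations) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int iterations = 0;
        while (start < text.length()) {            // "not finished yet"
            if (++iterations > maxIterations) {
                throw new IllegalStateException(
                        "no progress after " + maxIterations + " iterations");
            }
            // Number of characters that fit in the remaining width.
            int fits = Math.min(Math.max(availableWidth, 0) / CHAR_WIDTH,
                                text.length() - start);
            if (fits == 0 && forceProgress) {
                fits = 1;                          // guard: always consume at least one character
            }
            lines.add(text.substring(start, start + fits)); // one new line (box) per iteration
            start += fits;
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(breakLines("ABC", 2, false, 100)); // normal case
        System.out.println(breakLines("ABC", 0, true, 100));  // zero width, with guard
        try {
            breakLines("ABC", 0, false, 100);                 // zero width, no guard
        } catch (IllegalStateException e) {
            System.out.println("stuck: " + e.getMessage());
        }
    }
}
```

In this model, the guard that forces at least one character per line is what restores termination. Whether the real fix takes that form is not stated in this thread.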

Well, this is embarrassing...

It turns out that replicating it is as simple as:

<div style="width: 0; word-wrap: break-word;">ABC</div>

I.e., any zero-width box with content and break-word will trigger it. This is a significant bug, so I'll try to do a release with the fix soon. In the meantime, avoid break-word, or make sure you do not have any boxes with zero width (calculated or explicit), such as in tables.
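
As a stop-gap for the "avoid break-word" advice, one crude option is to strip the declaration from the HTML string before handing it to the renderer. The sketch below is an assumption-laden example, not a library feature: the class `BreakWordWorkaroundSketch` and the method `stripBreakWord` are hypothetical names, the regex only targets `word-wrap: break-word` declarations, and it would also rewrite any visible text that happens to contain that exact phrase. Long words will then overflow instead of wrapping.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-processing helper; not part of openhtmltopdf. */
class BreakWordWorkaroundSketch {

    // Matches "word-wrap: break-word" with optional whitespace and an optional trailing semicolon.
    private static final Pattern BREAK_WORD =
            Pattern.compile("word-wrap\\s*:\\s*break-word\\s*;?", Pattern.CASE_INSENSITIVE);

    /** Returns the HTML with every word-wrap: break-word declaration removed. */
    static String stripBreakWord(String html) {
        return BREAK_WORD.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<div style=\"width: 0; word-wrap: break-word;\">ABC</div>";
        System.out.println(stripBreakWord(html));
    }
}
```

The cleaned string can then be parsed and passed to the builder as in the original report. Stylesheets loaded from external files would need the same treatment separately.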

And yes, I should have tested this edge case when implementing break-word.

Thanks everyone.

Hello @danfickle ,

Is a release planned soon for this problem?
Thank you.