danfickle / openhtmltopdf

An HTML to PDF library for the JVM. Based on Flying Saucer and Apache PDF-BOX 2. With SVG image support. Now also with accessible PDF support (WCAG, Section 508, PDF/UA)!

Home Page:https://danfickle.github.io/pdf-templates/index.html

Thread is stuck during HTML to PDF conversion with nested tables for fairly large HTML

swillis12 opened this issue

Update 9/23/20: The first two HTML files I attached are misleading and work as expected. I found that the problem is actually occurring only when the height/max-height of the div enclosing the table is set to "auto":
image

@syjer Try the new HTML file attached; it should reproduce the issue for you as well. Sorry about the confusion with the previous test cases. FYI, rendering this HTML file in my minimal test application took 90,087 ms, which is consistent with the results I see inside my program.
smaller_test_auto_height.txt

Original issue/info below. The stack trace and thread dump info apply to the "smaller_test_auto_height.txt" test case, since my program was using this "auto" height all along (which I wasn't aware of 🤦 ):

Currently our HTML target is email, so we are using a lot of HTML table elements, nested and everything is inline styles. I suspect that I have a thread hanging due to the nested tables. Unfortunately I'm not sure what I can do to work around this issue. I've tried "table-layout: fixed" and assigning column widths as well.

Update: made sure that it is valid XHTML as well using https://validator.w3.org/: problem_html1.txt

Here is also a smaller test html that is still painfully slow (~1.5-2 minutes). It is the same content, just reduced the number of table rows to 71 for easier visibility:
smaller_test.txt

If I enable logging I see a seemingly endless stream of messages like the following:

com.openhtmltopdf.cascade FINEST:: min-height, relative= 0.0 (0), absolute= 0.0 using base=460.0
com.openhtmltopdf.cascade FINEST:: text-indent, relative= 0.0 (0), absolute= 0.0 using base=0.0
com.openhtmltopdf.cascade FINEST:: min-width, relative= 0.0 (0), absolute= 0.0 using base=0.0
com.openhtmltopdf.cascade FINEST:: min-width, relative= 0.0 (0), absolute= 0.0 using base=0.0
com.openhtmltopdf.cascade FINEST:: height, relative= 23.0 (23px), absolute= 460.0 using base=0.0
com.openhtmltopdf.cascade FINEST:: min-height, relative= 0.0 (0), absolute= 0.0 using base=460.0
com.openhtmltopdf.cascade FINEST:: text-indent, relative= 0.0 (0), absolute= 0.0 using base=0.0
com.openhtmltopdf.cascade FINEST:: min-width, relative= 0.0 (0), absolute= 0.0 using base=0.0
com.openhtmltopdf.cascade FINEST:: min-width, relative= 0.0 (0), absolute= 0.0 using base=0.0
com.openhtmltopdf.cascade FINEST:: height, relative= 23.0 (23px), absolute= 460.0 using base=0.0

Thread dump shows this stack:

"http-bio-8080-exec-10" #204 daemon prio=5 os_prio=31 tid=0x00007fa152c59000 nid=0x13803 runnable [0x0000700014582000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.juli.ClassLoaderLogManager.getLogger(ClassLoaderLogManager.java:229)
        - locked <0x00000006c01af2f0> (a org.apache.juli.ClassLoaderLogManager)
        at java.util.logging.LogManager.demandLogger(LogManager.java:551)
        at java.util.logging.Logger.demandLogger(Logger.java:455)
        at java.util.logging.Logger.getLogger(Logger.java:502)
        at com.openhtmltopdf.util.JDKXRLogger.getLogger(JDKXRLogger.java:103)
        at com.openhtmltopdf.util.JDKXRLogger.isLogLevelEnabled(JDKXRLogger.java:75)
        at com.openhtmltopdf.util.XRLog.log(XRLog.java:122)
        at com.openhtmltopdf.util.XRLog.log(XRLog.java:113)
        at com.openhtmltopdf.css.style.derived.LengthValue.calcFloatProportionalValue(LengthValue.java:204)
        at com.openhtmltopdf.css.style.derived.LengthValue.getFloatProportionalTo(LengthValue.java:80)
        at com.openhtmltopdf.css.style.CalculatedStyle.getFloatPropertyProportionalTo(CalculatedStyle.java:437)
        at com.openhtmltopdf.css.style.CalculatedStyle.getMinHeight(CalculatedStyle.java:1174)
        at com.openhtmltopdf.render.BlockBox.getCSSMinHeight(BlockBox.java:1628)
        at com.openhtmltopdf.render.BlockBox.applyCSSMinMaxHeight(BlockBox.java:1172)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1074)
        at com.openhtmltopdf.newtable.TableRowBox.layoutCell(TableRowBox.java:452)
        at com.openhtmltopdf.newtable.TableRowBox.layoutChildren(TableRowBox.java:206)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layout(TableRowBox.java:95)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableSectionBox.layoutChildren(TableSectionBox.java:137)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableSectionBox.layout(TableSectionBox.java:278)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableBox.layoutChildren(TableBox.java:319)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.newtable.TableBox.layoutTable(TableBox.java:284)
        at com.openhtmltopdf.newtable.TableBox.layout(TableBox.java:243)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:109)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:109)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layoutCell(TableRowBox.java:452)
        at com.openhtmltopdf.newtable.TableRowBox.layoutChildren(TableRowBox.java:206)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layout(TableRowBox.java:95)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableSectionBox.layoutChildren(TableSectionBox.java:137)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableSectionBox.layout(TableSectionBox.java:278)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableBox.layoutChildren(TableBox.java:319)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.newtable.TableBox.layoutTable(TableBox.java:284)
        at com.openhtmltopdf.newtable.TableBox.layout(TableBox.java:243)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layoutCell(TableRowBox.java:452)
        at com.openhtmltopdf.newtable.TableRowBox.layoutChildren(TableRowBox.java:206)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layout(TableRowBox.java:95)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableSectionBox.layoutChildren(TableSectionBox.java:137)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableSectionBox.layout(TableSectionBox.java:278)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:109)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableBox.layoutChildren(TableBox.java:319)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.newtable.TableBox.layoutTable(TableBox.java:284)
        at com.openhtmltopdf.newtable.TableBox.layout(TableBox.java:243)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:109)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layoutCell(TableRowBox.java:452)
        at com.openhtmltopdf.newtable.TableRowBox.layoutChildren(TableRowBox.java:206)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layout(TableRowBox.java:95)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:109)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableSectionBox.layoutChildren(TableSectionBox.java:137)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableSectionBox.layout(TableSectionBox.java:278)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableBox.layoutChildren(TableBox.java:319)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.newtable.TableBox.layoutTable(TableBox.java:284)
        at com.openhtmltopdf.newtable.TableBox.layout(TableBox.java:243)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layoutCell(TableRowBox.java:452)
        at com.openhtmltopdf.newtable.TableRowBox.layoutChildren(TableRowBox.java:206)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layout(TableRowBox.java:95)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableSectionBox.layoutChildren(TableSectionBox.java:137)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableSectionBox.layout(TableSectionBox.java:278)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableBox.layoutChildren(TableBox.java:319)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.newtable.TableBox.layoutTable(TableBox.java:284)
        at com.openhtmltopdf.newtable.TableBox.layout(TableBox.java:243)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layoutCell(TableRowBox.java:452)
        at com.openhtmltopdf.newtable.TableRowBox.layoutChildren(TableRowBox.java:206)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layout(TableRowBox.java:95)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableSectionBox.layoutChildren(TableSectionBox.java:137)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableSectionBox.layout(TableSectionBox.java:278)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableBox.layoutChildren(TableBox.java:319)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.newtable.TableBox.layoutTable(TableBox.java:284)
        at com.openhtmltopdf.newtable.TableBox.layout(TableBox.java:243)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.pdfboxout.PdfBoxRenderer.layout(PdfBoxRenderer.java:344)
        at com.openhtmltopdf.pdfboxout.PdfRendererBuilder.run(PdfRendererBuilder.java:41)
...

Question is -- what exactly is the issue and what can I do to work around this if I am pretty much stuck with the current layout? It may not be feasible to get rid of the table nesting.

hi @swillis12 ,

I'll have a look using your example (thank you!) in the profiler, maybe there is some obvious fix that can be done.

Which version are you using?

With 1.0.4, with the smaller_test.txt example, it takes 1.2 seconds to run.

Using the following code:

package test;

import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class App {
    public static void main(String[] args) throws Exception {

        long start = System.currentTimeMillis();
        try (OutputStream os = new FileOutputStream("test.pdf")) {
            PdfRendererBuilder builder = new PdfRendererBuilder();

            builder.useFastMode();
            builder.withFile(new File("smaller_test.html"));

            builder.toStream(os);
            builder.run();
        }
        System.err.println((System.currentTimeMillis() - start) + "ms");
    }
}

I got 1218ms

Thanks @syjer. Yes I am on 1.0.4. That is strange.. it is much slower on mine. The only difference is that I am reading the HTML from a String in memory rather than from a file like you are doing.

Update: I tried it with your code and am seeing the same results. It is very fast (I got 2685ms on the larger HTML file)! This is good news, but now I have to figure out what is going on in my program. This is the code I'm using by the way:

PdfRendererBuilder builder = new PdfRendererBuilder();
builder.useFastMode();
builder.withHtmlContent(html, "");
builder.toStream(out);
builder.run();

It may be:

  • memory related, too much pressure on the GC and thus a lot of pauses
  • something logging related? Are you using the slf4j support? I think we can still improve this part to reduce the amount of generated "garbage" if the log is not used.

To be noted, I think we can still improve the performance :).
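One way to test the logging suspicion is sketched below. It assumes the library logs through java.util.logging under the `com.openhtmltopdf` name prefix, as the stack trace and log lines in this thread suggest. The `QuietRenderLogging` class and its methods are hypothetical names. Whether raising the level removes the overhead depends on how the library looks up its loggers, so treat this as an experiment rather than a fix.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class QuietRenderLogging {
    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an otherwise unreferenced logger can be lost after GC.
    private static final Logger LIB_ROOT = Logger.getLogger("com.openhtmltopdf");

    /** Raises the threshold so records below WARNING are rejected by the logger. */
    static void quiet() {
        LIB_ROOT.setLevel(Level.WARNING);
    }

    /** Reports whether the named logger would currently accept a record at the given level. */
    static boolean wouldLog(String loggerName, Level level) {
        return Logger.getLogger(loggerName).isLoggable(level);
    }

    public static void main(String[] args) {
        quiet();
        // Child loggers without their own level inherit the parent's threshold.
        System.out.println(wouldLog("com.openhtmltopdf.cascade", Level.FINEST));
        System.out.println(wouldLog("com.openhtmltopdf.cascade", Level.WARNING));
    }
}
```

Calling `quiet()` once before the render and comparing timings with and without it should show whether the log output itself contributes to the slowdown.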

> It may be:
>
>   • memory related, too much pressure on the GC and thus a lot of pauses
>   • something logging related? Are you using the slf4j support? I think we can still improve this part to reduce the amount of generated "garbage" if the log is not used.

This is very possible. Yes I currently have the SLF4J jar added -- How do you recommend testing what you mention about reducing the amount of "garbage" generated by the logging? Should I just exclude the SLF4J jar? Note: it doesn't seem to be making that many log statements, just what's below:

com.openhtmltopdf.load INFO:: SAX XMLReader in use (parser): com.sun.org.apache.xerces.internal.parsers.SAXParser
com.openhtmltopdf.load INFO:: SAX XMLReader in use (parser): com.sun.org.apache.xerces.internal.parsers.SAXParser
com.openhtmltopdf.load.xml-entities INFO:: Entity public: -//W3C//DTD XHTML 1.1//EN, no local mapping. Returning empty entity to avoid pulling from network.
com.openhtmltopdf.load INFO:: Loaded document in ~45ms
com.openhtmltopdf.load INFO:: TIME: parse stylesheets 59ms
com.openhtmltopdf.match INFO:: media = print
com.openhtmltopdf.match INFO:: Matcher created with 162 selectors
com.openhtmltopdf.css-parse WARNING:: () text-overflow is an unrecognized CSS property at line 0. Ignoring declaration.
com.openhtmltopdf.css-parse WARNING:: () text-overflow is an unrecognized CSS property at line 0. Ignoring declaration.
com.openhtmltopdf.css-parse WARNING:: () Ident auto is an invalid or unsupported value for overflow at line 1. Skipping declaration.
com.openhtmltopdf.css-parse WARNING:: () text-overflow is an unrecognized CSS property at line 0. Ignoring declaration.
com.openhtmltopdf.css-parse WARNING:: () text-overflow is an unrecognized CSS property at line 0. Ignoring declaration.
com.openhtmltopdf.css-parse WARNING:: () text-overflow is an unrecognized CSS property at line 0. Ignoring declaration.
com.openhtmltopdf.css-parse WARNING:: () text-overflow is an unrecognized CSS property at line 0. Ignoring declaration.
com.openhtmltopdf.css-parse WARNING:: () Ident auto is an invalid or unsupported value for max-height at line 1. Skipping declaration.
com.openhtmltopdf.css-parse WARNING:: () Ident auto is an invalid or unsupported value for overflow at line 1. Skipping declaration.
com.openhtmltopdf.css-parse WARNING:: () Value for padding must be a length or percentage at line 1. Skipping declaration.

> How do you recommend testing

As a first step, I would add one of the following flags to the JVM: -XX:+PrintGC or -XX:+PrintGCDetails, so you can see whether the issue is the GC.

You will see during the execution some lines like:

[GC (Allocation Failure) [PSYoungGen: 64512K->10746K(75264K)] 64512K->12319K(247296K), 0.0072593 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]

If they appear very often and each one takes a long time, then the GC could be the issue: maybe the Java process is not given enough memory, or some kind of memory leak is happening.

Alternatively you can use visualvm, for visualizing the gc activity. With it you can also do a first profiling of the application (cpu or memory) and try to pin down the main root cause.
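GC time can also be sampled from inside the program, as a complement to the flags and tools above. This is a minimal sketch using the standard `java.lang.management` API. The `GcProbe` class and the placeholder workload are hypothetical; in the real program the `Runnable` would wrap the `builder.run()` call.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcProbe {
    /** Sums accumulated collection time (ms) over all collectors; undefined values (-1) are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Runs the task and returns {wall-clock ms, GC ms accumulated while it ran}. */
    static long[] measure(Runnable task) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        task.run();
        long wall = (System.nanoTime() - start) / 1_000_000;
        return new long[] { wall, totalGcMillis() - gcBefore };
    }

    public static void main(String[] args) {
        // Placeholder workload that only allocates short-lived arrays.
        long[] r = measure(() -> {
            for (int i = 0; i < 200_000; i++) {
                byte[] junk = new byte[1024];
            }
        });
        System.out.println("wall=" + r[0] + "ms gc=" + r[1] + "ms");
    }
}
```

If the GC figure is a small fraction of the wall-clock figure, memory pressure is unlikely to explain a slowdown of minutes.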

edit: for reducing the amount of garbage: well, first we need to identify what could be the issue in this library :)

I'm reading the HTML content from a string in memory as well as writing the outputstream to memory (browser Response).. So when I get a chance I'll do a quick test and first write the HTML content to a file and then have the library write it to a file as well.

Thanks for the help so far @syjer I'll try this out once I get a little more time. What do you recommend I do with this issue for now?

@swillis12 , you can keep the issue open, maybe somebody else has/had the same issue and could provide additional feedback.

I'm still wondering how there can be this much difference in execution time (1-2 minutes vs. a few seconds). Even with a GC issue, I don't think it would be that bad.

To be noted, we have another issue of slow generation time: #506 but this one seems to be more font related

Thanks again @syjer. I ran a quick test below and got the same slow results.. I'll have to do some GC profiling as you described. I wish I could help you reproduce it :).

Yes I had scoured the issue tracker for mentions of slowness and did come across that one. I also want to note that I'm seeing the same behavior using FlyingSaucer (I swapped this library for FS to run a quick test). It seems to be equivalently slow.

// Write the in-memory HTML to a temp file (UTF-8 assumed).
Path f = Files.createTempFile("temp", ".html");
Files.write(f, html.getBytes(StandardCharsets.UTF_8));
File f2 = f.toFile();
f2.deleteOnExit();

// Render to a second temp file, closing the stream before copying it out.
File f1 = File.createTempFile("output", ".pdf");
try (OutputStream fos = new FileOutputStream(f1)) {
    PdfRendererBuilder builder = new PdfRendererBuilder();
    builder.useFastMode();
    builder.withFile(f2);
    builder.toStream(fos);
    builder.run();
}
Files.copy(f1.toPath(), out);
out.flush();

btw, beware when using deleteOnExit, as it may slowly leak memory because the JVM needs to keep track of which files must be removed on exit. See this sonar rule: https://rules.sonarsource.com/java/RSPEC-2308 :)

Better to delete in a finally block :)

(obviously, if it's a short running process, it's not an issue ;))
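The "delete in a finally block" advice could look roughly like this. It is a minimal sketch using only `java.nio.file`; the `TempHtml` class, the `withTempHtml` method and the UTF-8 encoding are assumptions for illustration, and the action passed in stands for the rendering step.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

final class TempHtml {
    /**
     * Writes html to a temp file, runs the action on it, and deletes the file
     * afterwards whether or not the action succeeded.
     * The path is returned only so callers can verify that the file is gone.
     */
    static Path withTempHtml(String html, Consumer<Path> action) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("temp", ".html");
            Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));
            action.accept(tmp);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // Best effort: a failed delete should not mask the original error.
                }
            }
        }
    }

    public static void main(String[] args) {
        Path gone = withTempHtml("<html><body>hi</body></html>",
                p -> System.out.println("rendering from " + p.getFileName()));
        System.out.println("still exists: " + Files.exists(gone));
    }
}
```

Unlike `deleteOnExit`, nothing is registered with the JVM, so a long-running server does not accumulate one entry per request.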

@syjer I was away last week but got another chance to look at this. My test case that I originally attached is not reproducing the actual issue. If you look at my update, the issue is seen only when the enclosing div has auto height. You can try it out and hopefully see what I am talking about.

P.S. I tested with the latest code including your change #552, and it doesn't seem to have improved things much. It may be related to a memory leak in the CSS calculations when the table height is auto. Notice the constant GC (purple line), since the new generation space is filling up over and over again:

Screen Shot 2020-09-23 at 2 15 09 PM

hi @swillis12 , thank you for updating the example 👍 .

I'll have a look most likely friday.

hi @swillis12 , with the new template, I'm able to reproduce the reported issue.

Now, for finding the main issue, it will not be easy :).

It does not seem to be a memory issue, but rather something in the layout algorithm that causes it to spend quite a lot of time.

the main culprit seems to be:

tbody {
    page-break-inside: avoid;
}

Removing this CSS rule generates the PDF in ~1.4 seconds on my PC for the file smaller_test_auto_height.txt.

edit: most likely the page-break-avoiding algorithm is suboptimal and spirals out of control when you have a lot of child elements.

Looking at https://github.com/danfickle/openhtmltopdf/blob/open-dev-v1/openhtmltopdf-core/src/main/java/com/openhtmltopdf/layout/BlockBoxing.java#L40 , I've got the impression the rule is dropped too late, maybe due to the peculiarity of this page (deep hierarchy).

Most likely @danfickle has a better idea than me :), I'll still try on my side to find a solution though.

Even more interesting, the issue appears only when the first tbody has the CSS rule applied. If you apply page-break-inside: avoid only to the innermost tbody, it works without problems :).

I think I'll be able to condense the issue in a more compact html file.

Good catch! Thanks for the workaround @syjer. Hopefully this helps others that may run into this too.

@syjer do you think #506 is perhaps the same issue that we're seeing here? I noticed that test case also has "page-break-inside: avoid" rule applied to the body element (long table inside this like mine):

image

@swillis12 it may have a role, but I think in the #506 issue, the main cause is more inside pdfbox.

I've cut a little bit the problematic file:

issue-551-page-break-inside-avoid.txt

It's quite clear we have an O(n²) algorithm, or maybe even an exponential one, as it depends on how deep it needs to check.
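To see how nesting depth alone could produce exponential cost, here is a purely illustrative toy model. It is not the library's algorithm. It assumes that a box which fails to fit on the page is laid out a second time, and that this retry happens independently at every nesting level. `RelayoutModel` and `leafLayouts` are hypothetical names.

```java
final class RelayoutModel {
    /**
     * Counts how often the innermost box is laid out when each of the `depth`
     * enclosing levels lays out its content `attempts` times
     * (1 = no retry, 2 = one retry after a failed fit).
     */
    static long leafLayouts(int depth, int attempts) {
        if (depth == 0) {
            return 1;
        }
        return attempts * leafLayouts(depth - 1, attempts);
    }

    public static void main(String[] args) {
        for (int depth = 1; depth <= 6; depth++) {
            System.out.println("depth " + depth
                    + ": no retry = " + leafLayouts(depth, 1)
                    + ", one retry per level = " + leafLayouts(depth, 2));
        }
    }
}
```

Under these assumptions the work doubles with every extra level of nesting, which fits the observation that moving the rule to the innermost tbody avoids the problem.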

I'm struggling with PDF generation where it goes out of control and eventually locks things up using this:

.page {
   page-break-after: always;
}

Would that be the same culprit as the avoid setting? Any news on a fix? I'm using v1.0.9.

Hi @chubbard,

I recently refactored the page-break related code. However, it is not released and is part of the footnotes work in #711. You could try building and using that branch to see if it fixes your issue.

P.S. There was an n squared algorithm in BlockBoxing that I replaced with the use of a TreeMap.
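The thread does not show the actual change, so the following is only a generic illustration of the technique: replacing a repeated linear scan with a `TreeMap` lookup. It assumes a hypothetical task of finding the page that contains a given vertical offset, with pages described by their ascending top offsets. `PageLookup` and its methods are invented names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PageLookup {
    /** O(n) per query: index of the last page whose top is <= y, or -1 if none. */
    static int findLinear(List<Integer> pageTops, int y) {
        int found = -1;
        for (int i = 0; i < pageTops.size(); i++) {
            if (pageTops.get(i) <= y) {
                found = i;
            } else {
                break; // tops are sorted ascending, no later page can match
            }
        }
        return found;
    }

    /** Builds a sorted index from page top offset to page number. */
    static TreeMap<Integer, Integer> index(List<Integer> pageTops) {
        TreeMap<Integer, Integer> idx = new TreeMap<>();
        for (int i = 0; i < pageTops.size(); i++) {
            idx.put(pageTops.get(i), i);
        }
        return idx;
    }

    /** O(log n) per query using floorEntry: greatest top <= y. */
    static int findIndexed(TreeMap<Integer, Integer> idx, int y) {
        Map.Entry<Integer, Integer> e = idx.floorEntry(y);
        return e == null ? -1 : e.getValue();
    }

    public static void main(String[] args) {
        List<Integer> tops = Arrays.asList(0, 1000, 2000);
        TreeMap<Integer, Integer> idx = index(tops);
        System.out.println(findLinear(tops, 1500) + " " + findIndexed(idx, 1500));
    }
}
```

When the lookup is called once per box, the linear version costs O(n) per call and O(n²) overall, while the indexed version stays at O(n log n).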

hi @danfickle , I've tried the issue-551-page-break-inside-avoid.txt file with the current issue_364_footnotes branch, but it still triggers the issue (I waited more than 2 minutes and it still had not finished).