FPDFRecorder frequent usage makes document generation slow

Question

FPDFRecorder frequent usage makes document generation slow

dmail00 opened this issue 6 months ago · comments

It is not immediately apparent to me why the rewind on FPDFRecorder copies the singleton FPDF state, was it imagined that this would be rewound multiple times?

From looking at the commit history of this file, it was introduced around version 2.3; the initial commit [1] does not contain this deepcopy yet strangely the next commit to this file is modifying an already present deepcopy [2].

The reason for asking this, is that I am generating 30 or so documents at a time with a similar format, each with five heading and this is slow, it is copying the singleton state twice for each header which seems like madness. The speed can be drastically improved by removing the second deepcopy or at least using a boolean flag to copy or not; however it can be even better using some logic that at least fits my needs. This is:

cache the current font
set the heading font as the current
get the width of the text for the heading
if this greater than the width minus the margins* then hand off to start_section
else get the height of the font line* and add the height of four lines using the previous font**
** if this new y would exceed the page break trigger create a page
** render the marked sequence
reset the old font

* margins includes any possible heading borders.
** Anything less than four lines, to me at least, does not make sense in putting a header on this page

[1] ea834ec
[2] 9fb2c3b

Lucas Cimon · Answer 1 · Mon Jan 15 2024 16:51:59 GMT+0800 (China Standard Time)

Hi @dmail00

It is not immediately apparent to me why the rewind on FPDFRecorder copies the singleton FPDF state, was it imagined that this would be rewound multiple times?

Yes indeed, there is even a unit test for that: https://github.com/py-pdf/fpdf2/blob/2.7.7/test/test_recorder.py#L30

The reason for asking this, is that I am generating 30 or so documents at a time with a similar format, each with five heading and this is slow, it is copying the singleton state twice for each header which seems like madness. The speed can be drastically improved by removing the second deepcopy or at least using a boolean flag to copy or not; however it can be even better using some logic that at least fits my needs.

Great if you found some solution that fits your needs 🙂

I think it'd be great to improve fpdf2 performances for as many use cases as possible, as long as we do not lose functionalities.

It is not immediately clear to me what is your use case there: could you provide some minimal script that reproduces the performance issue you mentioned, please? This would help us to understand through which methods calls the FPDFRecorder() is intensively used in your situation (e.g. is it reached through FPDF.offset_rendering() or FPDF.unbreakable()? I'd guess it's the former one from FPDF.start_section()), and also provide a "baseline" use case that we could work on, to improve its speed.

Also, if you already found a solution for your problem, what is your intent in opening this issue?
Would you like to contribute on improving things in fpdf2? 🙂

Lucas Cimon · Answer 2 · Mon Jan 15 2024 17:00:25 GMT+0800 (China Standard Time)

If performances are better, I think that instead of calling .offset_rendering() in .start_section() we could copy the approach used in Table._get_row_layout_info(), that is to call .multi_cell() a first time with output=MethodReturnValue.PAGE_BREAK inside a pdf._disable_writing() context-block.

dmail00 · Answer 3 · Tue Jan 16 2024 02:22:48 GMT+0800 (China Standard Time)

If performances are better, I think that instead of calling .offset_rendering() in .start_section() we could copy the approach used in Table._get_row_layout_info(), that is to call .multi_cell() a first time with output=MethodReturnValue.PAGE_BREAK inside a pdf._disable_writing() context-block.

This works really well, thanks and is just as fast as what I was doing. I did change it to return the height output=MethodReturnValue.HEIGHT as I still think having spaces for lines after the heading makes sense.

It is not immediately clear to me what is your use case there: could you provide some minimal script that reproduces the performance issue you mentioned, please?

It would have to be a contrived example, as what is currently written is pulled in from an external source and contains personal information.

Also, if you already found a solution for your problem, what is your intent in opening this issue?

It was solely to make you aware of the problem and provide a potential solution.

Lucas Cimon · Answer 4 · Wed Feb 28 2024 18:38:28 GMT+0800 (China Standard Time)

I opened PR #1127 to improve this