py-pdf / fpdf2

Simple PDF generation for Python

Home Page: https://py-pdf.github.io/fpdf2/

Write_html does not apply line spacing within lists

pechiyappan opened this issue

write_html() does not space lines properly in lists, even when line height is specified explicitly. The spacing is wrong both between list items and between the wrapped lines of a single list item.

The earlier version 2.7.5 did not have this issue.

Example:

from fpdf import FPDF

pdf = FPDF()
pdf.add_page()
pdf.write_html("""<p line-height=1.5>
<ul>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</li>
</ul>
</p>
""")

pdf.output("issue_1116.pdf")

Hi @pechiyappan

Thank you for reporting this.

I'm not 100% sure I understand the problem you describe.
I can confirm that the HTML line-height property is ignored, when used in the same way as your example.
However, you wrote that "write_html() does not space lines properly in lists, even when line height is specified explicitly".
So that means that there is ANOTHER issue, even when line-height is not used?
If that's true, could you please give a few more details about this: what kind of spacing logic do you expect?

"So that means that there is ANOTHER issue, even when line-height is not used?"

I understand @pechiyappan to mean that the vertical distance between <li> items does not use line-height.
I'm not entirely sure if it should (technically this is a distance between paragraphs), but it's possible that it does so in HTML.
If so, then maybe we should look at the question in a more general sense: Does line-height affect the distance between other types of paragraphs in HTML as well?

Note that we use a rather simple HTML parser, which ignores many other attributes and may not respect inherited properties, among other complications. Maybe paragraph spacing can be improved across the board at this opportunity, but figuring that out will take some more research.

My expectation is that, when specified, line-height should apply to spaces between different list elements and within list elements (when they are long and run for multiple lines). This used to be the case as of version 2.7.5.

When line-height is not specified, spacing can revert to the default expected behavior.

"My expectation is that, when specified, line-height should apply to spaces between different list elements and within list elements (when they are long and run for multiple lines). This used to be the case as of version 2.7.5."

Even if this used to be the case in a previous version of fpdf2, this does not conform to the HTML standard at all.
If you render the HTML snippet you provided in a browser, you'll observe that a line-height attribute on a parent <p> tag is ignored.
First, because in HTML a paragraph cannot contain a list: the <ul> start tag implicitly closes the open <p>.
Second, a line-height attribute placed directly on a <ul> tag is also ignored.

It is, however, valid in both HTML & fpdf2 to insert <br> elements between <li> list items to add vertical spacing.
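A minimal sketch of that workaround, assuming the HTML is assembled in Python before it is handed to write_html(). The helper name list_with_breaks and its breaks parameter are hypothetical and not part of fpdf2; the helper only builds the string.

```python
from html import escape


def list_with_breaks(items, breaks=1):
    """Build a <ul> whose <li> items are separated by `breaks` <br> elements.

    Hypothetical helper for illustration; it is not an fpdf2 API.
    """
    separator = "<br>" * breaks
    # Escape the item text so that characters like "<" do not end up as markup.
    body = separator.join(f"<li>{escape(item)}</li>" for item in items)
    return f"<ul>{body}</ul>"


html = list_with_breaks(["First item", "Second item", "Third item"])
print(html)
```

The resulting string can be passed to pdf.write_html(html) exactly as in the example at the top of this issue. How much vertical space each <br> produces may depend on the fpdf2 version in use.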

Does that solution satisfy your requirement @pechiyappan?

Thank you @Lucas-C

Unfortunately, it does not. I have tried adding line break elements for vertical spacing, but the result does not look as elegant as with the line-height option. As a user, I would prefer the line-height option to be available in paragraph or list elements. Please feel free to close this if it cannot be implemented due to HTML standards.

"This used to be the case as of version 2.7.5."

This happens to be incorrect.
Up to 2.7.5, list items simply always had an empty line in between, independently of any line-height settings.

The only tag that ever honored a line-height attribute was <p>, and it still does, which is really weird, because no such HTML attribute can be found anywhere in the standards. line-height is only valid as a CSS property, and we don't support CSS at all.

"Please feel free to close this if it cannot be implemented due to HTML standards."

Indeed. Sorry for that.

"This happens to be incorrect."

I can verify that it is not incorrect. I am able to generate documents with specified line heights within lists using the old version.

"I can verify that it is not incorrect. I am able to generate documents with specified line heights within lists using the old version."

I agree with @pechiyappan on that.
I tested his code snippet with fpdf2 v2.7.5 and the line spacing (inside AND outside <li> elements) was clearly dependent on the line-height attribute: https://github.com/py-pdf/fpdf2/blob/2.7.5/fpdf/html.py#L379

"I tested his code snippet with fpdf2 v2.7.5 and the line spacing (inside AND outside <li> elements) was clearly dependent on the line-height attribute"

According to the linked code, that could only happen when the <p> element with the line-height attribute was not explicitly closed before the <ul>/<li>. In other words, it was a bug where the properties of the paragraph could leak outside of its scope.
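The leak described above can be reproduced with a toy model. This is a sketch under stated assumptions, not the fpdf2 implementation: LeakyLineHeightParser and render are hypothetical names, and the only behavior modeled is a single parser-wide line-height value that is set on <p> and reset on </p>.

```python
from html.parser import HTMLParser


class LeakyLineHeightParser(HTMLParser):
    """Toy linear parser: line-height is one global value, reset only on </p>."""

    def __init__(self):
        super().__init__()
        self.line_height = None  # None stands for "use the default spacing"
        self.rendered = []       # (text, line_height in effect) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            value = dict(attrs).get("line-height")
            self.line_height = float(value) if value else None

    def handle_endtag(self, tag):
        if tag == "p":
            self.line_height = None

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Whatever value is current gets applied, regardless of the tag.
            self.rendered.append((text, self.line_height))


def render(html):
    parser = LeakyLineHeightParser()
    parser.feed(html)
    parser.close()
    return parser.rendered


# <p> still open when the list starts: its line-height leaks into the <li>.
print(render("<p line-height=1.5><ul><li>item</li></ul></p>"))  # [('item', 1.5)]
# <p> explicitly closed before the list: nothing leaks.
print(render("<p line-height=1.5></p><ul><li>item</li></ul>"))  # [('item', None)]
```

In this model the list text picks up the paragraph setting only because the parser never notices that <ul> ends the paragraph, which matches the explanation given above.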

But then, accepting a line-height attribute for any HTML element is already incorrect. There is no such attribute in HTML that I can find in the specification, and we should not invent our own HTML variety. In other words, the current support in <p> tags should also go away.

If anyone wants to create a PR that parses inline CSS, then that would be the appropriate way to solve this issue (we already do that in a very rudimentary form in the SVG parser). Doing this correctly will be non-trivial, though. You'll have to decide how to handle (or possibly ignore) inherited values. Another challenge will be to recognize when tags with no explicit end tag go out of scope. I'm not sure if our current linear parsing model is adequate for that. But if you manage to get it working anyway, more power to you!
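To make the two challenges concrete, here is a minimal sketch built on the standard library's html.parser. It is not fpdf2 code and not a proposed design: parse_inline_style, ScopedStyleParser, VOID_TAGS and CLOSES_P are hypothetical names, every property is treated as inherited (a simplification), and only a small subset of the implicit end-tag rules is covered.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img"}  # elements that never have an end tag
# Start tags that implicitly close an open <p> (a subset of the HTML rules).
CLOSES_P = {"p", "ul", "ol", "div", "table", "pre", "blockquote",
            "h1", "h2", "h3", "h4", "h5", "h6"}


def parse_inline_style(style):
    """Split a style attribute value into a {property: value} dict.

    Naive: splitting on ";" breaks for values that contain a semicolon.
    """
    declarations = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if sep and name.strip() and value.strip():
            declarations[name.strip().lower()] = value.strip()
    return declarations


class ScopedStyleParser(HTMLParser):
    """Keeps a stack of open tags, each with the styles in effect inside it."""

    def __init__(self):
        super().__init__()
        self.stack = [("#root", {})]
        self.rendered = []  # (text, line-height in effect or None) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Tags without an explicit end tag go out of scope here.
        if tag in CLOSES_P and self.stack[-1][0] == "p":
            self.stack.pop()
        if tag == "li" and self.stack[-1][0] == "li":
            self.stack.pop()
        # Simplification: the child inherits every property of its parent.
        styles = dict(self.stack[-1][1])
        styles.update(parse_inline_style(dict(attrs).get("style") or ""))
        self.stack.append((tag, styles))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.rendered.append((text, self.stack[-1][1].get("line-height")))


def scoped(html):
    parser = ScopedStyleParser()
    parser.feed(html)
    parser.close()
    return parser.rendered


print(scoped('<p style="line-height: 1.5">para<ul><li>item</li></ul>'))
print(scoped('<ul style="line-height: 2"><li>a<li style="line-height: 1">b</ul>'))
```

In the first call the paragraph's line-height applies to "para" but not to the list item, because the <ul> start tag pops the open <p>. In the second, both items are inside the <ul> scope, so the first inherits its value and the second overrides it. A real implementation would also need per-property inheritance rules and unit handling, which this sketch leaves out.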