py-pdf / fpdf2

Simple PDF generation for Python

Home Page:https://py-pdf.github.io/fpdf2/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Get rid of FPDF class/instance attributes getting passed to HTML2PDF.

gmischler opened this issue · comments

Problem
While writing tests for #1170 , @lcgeneralprojects has stumbled across a rather obscure and dubious "feature" of write_html().
Apparently since release 2.4.2, added in this commit, it is possible to feed write_html() keyword arguments by adding class or instance attributes to FPDF() itself before calling the method.
At a time where write_html() only had two or three arguments, this may have looked like a good idea, but polluting the namespace of class A in order to feed initialization arguments to class B doesn't seem desirable to me under any circumstances. I'm also unable to imagine a use case where this would be more practical than simply passing those arguments directly, or to keep them in a dictionary if they need to be collected first.

Solution
Let's get rid of this "feature".
Fpdf2 is getting increasingly complex, and we really need to try hard to keep the namespaces of our various modules as clean as possible. Encouraging users to cross-pollute those namespaces is simply bad design. Without any documentation (and even if it was documented), there's also a risk that a user subclassing FPDF() and adding their own attributes might inadvertedly cause changes in the behaviour of write_html().

Additional context
Other than the commit message "made <li> bullets & indentation configurable through class attributes, instance attributes or optional method arguments", I can find no documentation at all about how this is supposed to be used, and why it would be useful.

Apparently since release 2.4.2, added in this commit, it is possible to feed write_html() keyword arguments by adding class or instance attributes to FPDF() itself before calling the method.

Yes, I made that commit! 😅

I'm also unable to imagine a use case where this would be more practical than simply passing those arguments directly, or to keep them in a dictionary if they need to be collected first.

The intent was to avoid repeatedly passing the same arguments to .write_html() when the end user application relies mostly on this method, with fixed values for some arguments.

Encouraging users to cross-pollute those namespaces is simply bad design.

In restropect, you are probably right 🙂

Other thoughts:

  • while I agree with the intent to get rid of this "feature", we must be very aware that removing it may break some end user code. In this case, IMHO, I think this is worth the risk.

  • as an alternative to this soon-to-be-deprecated approach, overriding write_html() in a subclass is another simple solution to avoid repeating arguments passed to .write_html(), and we probably use this opportunity to document it in docs/HTML.md:

class PDF(FPDF):
    def write_html(self, text, *args, **kwargs):
        super().write_html(text, *args, table_line_separators=True, warn_on_tags_not_matching=True, **kwargs)

The intent was to avoid repeatedly passing the same arguments to .write_html() when the end user application relies mostly on this method, with fixed values for some arguments.

Packing them in a dict and supplying that with **kwargs is much simpler and less prone to surprises. Overriding write_html() or replacing FPDF.HTML2FPDF_CLASS with a curried version of HTML2PDF() are two other, but slightly more involved ways.
Of course, pointing out those options in the documentation a bit more explicitly would certainly be helpful.

we must be very aware that removing it may break some end user code.

This is not actually documented in any way, not even in the docstring. I doubt that anyone other than you (and now @lcgeneralprojects) has ever used it. End users relying on entirely undocumented "features" live risky to begin with.

Agreed!

We should just not forget to mention this change in the CHANGELOG.md 🙂

Whilst I do not disagree with the sentiment.

I doubt that anyone other than you (and now @lcgeneralprojects) has ever used it.

I would not be suprised, I at least have been using this.

class MyHTML2FPDF(HTML2FPDF):

    def __init__(self, *args, **kwargs):
        kwargs["ul_bullet_char"] = "\u2022"

        ...
        
        kwargs["table_line_separators"] = True

        super().__init__(*args, **kwargs)
        
        ...

I would not be suprised, I at least have been using this.

What you're doing there is one of the suggested alternatives, and perfectly fine.

The "feature" we're about to eliminate won't affect you at all.