jorisschellekens / borb

borb is a library for reading, creating and manipulating PDF files in Python.

Home Page: https://borbpdf.com/

BUG - Issues converting HTML to PDF

benninkcorien opened this issue · comments

Describe the bug

I'm trying to convert a local HTML file to PDF. The file is a yearly planner (a messy work in progress: Python-generated HTML with CSS for styling and layout).

I'm getting all kinds of errors:
SVG unsupported. If I delete all SVGs I think there's a problem with the
Is this something that should work with borb? I haven't yet found a good solution/package that can convert HTML to PDF with all the layout and styling in place.

To Reproduce
Steps to reproduce the behaviour:

Save the attached .txt as .html in a "testfiles" folder. Create a "borb" folder on the same level. Run this Python script:

import os
import glob

from borb.pdf import Document
from borb.pdf import PDF
from borb.toolkit.export.html_to_pdf.html_to_pdf import HTMLToPDF


html_dir = "testfiles"
pdf_dir = "borb"

os.makedirs(pdf_dir, exist_ok=True)


for html_file in glob.glob(os.path.join(html_dir, "*.html")):
    # splitext keeps dots inside the file name (e.g. "planner.v2.html" -> "planner.v2")
    base_name = os.path.splitext(os.path.basename(html_file))[0]
    pdf_file = os.path.join(pdf_dir, f"{base_name}.pdf")
    print(f"Working on {base_name}")

    # read the whole HTML file into a string
    with open(html_file, "r", encoding="utf-8") as html_file_handle:
        html_str: str = html_file_handle.read()

    doc: Document = HTMLToPDF.convert_html_to_pdf(html_str)
    assert doc is not None

    with open(pdf_file, "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)

Expected behaviour
A PDF file from the HTML file, preferably one that takes all the CSS into account as well (and maybe SVG support, though I can easily replace those with JPG/PNG or something)
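
The manual SVG removal mentioned above could be scripted as a pre-processing step before the HTML is handed to the converter. This is only a minimal sketch using the standard library: the helper name strip_svg is made up for illustration, and the regex is a crude approach that assumes SVG elements are not nested and always have an explicit closing tag. It does not make the rest of the HTML or CSS convertible.

```python
import re

# Matches an opening <svg ...> tag up to the nearest closing </svg>.
# Assumption: no nested <svg> elements and no self-closing <svg/> tags.
_SVG_BLOCK = re.compile(r"<svg\b.*?</svg\s*>", re.DOTALL | re.IGNORECASE)


def strip_svg(html: str) -> str:
    """Return the HTML string with every inline <svg>...</svg> block removed."""
    return _SVG_BLOCK.sub("", html)
```

In the script above, this would be applied to html_str right after reading the file, for example html_str = strip_svg(html_str).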

Desktop:

  • OS: Windows 11
  • Python version: 3.12
  • borb version: 2.1.20

planner_full_result.txt

Hi,

The HTMLToPDF class simply isn't capable of handling complex HTML, and it doesn't handle CSS either.

In your use case I would opt for cutting out the middleman: simply generate the daily planner page from scratch in borb.

Kind regards,
Joris Schellekens
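
To illustrate the suggestion above, here is a rough sketch of building a one-week planner page directly with borb instead of going through HTML. It is not the maintainer's code. The function names week_rows and write_week_planner, the three-column layout, and the padding values are all assumptions made for the example, and it targets the borb 2.1.x layout API (Document, Page, SingleColumnLayout, FixedColumnWidthTable, Paragraph, PDF.dumps).

```python
import datetime
from decimal import Decimal

_DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"]


def week_rows(start: datetime.date) -> list[tuple[str, str]]:
    """Return (ISO date, weekday name) pairs for the 7 days starting at `start`."""
    days = [start + datetime.timedelta(days=i) for i in range(7)]
    return [(d.isoformat(), _DAY_NAMES[d.weekday()]) for d in days]


def write_week_planner(start: datetime.date, path: str) -> None:
    """Write a single-page PDF with one table row per day (hypothetical layout)."""
    # borb is imported here so week_rows() stays usable without borb installed
    from borb.pdf import Document, Page, PDF, Paragraph, SingleColumnLayout
    from borb.pdf import FixedColumnWidthTable

    doc = Document()
    page = Page()
    doc.add_page(page)
    layout = SingleColumnLayout(page)
    layout.add(Paragraph(f"Week of {start.isoformat()}", font="Helvetica-Bold"))

    rows = week_rows(start)
    # one header row plus one row per day; columns: date, weekday, notes
    table = FixedColumnWidthTable(number_of_rows=len(rows) + 1, number_of_columns=3)
    for header in ("Date", "Day", "Notes"):
        table.add(Paragraph(header, font="Helvetica-Bold"))
    for iso_date, day_name in rows:
        table.add(Paragraph(iso_date))
        table.add(Paragraph(day_name))
        table.add(Paragraph(" "))  # blank cell to write notes in
    table.set_padding_on_all_cells(Decimal(4), Decimal(4), Decimal(4), Decimal(4))
    layout.add(table)

    with open(path, "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)
```

Calling write_week_planner(datetime.date(2024, 1, 1), "planner.pdf") would then produce the page. Keeping the date logic in week_rows separate from the rendering means the same rows can feed a different layout later.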