baruchel / txt2pdf

Text to PDF converter with Unicode support

ENHANCEMENT idea - support file objects, not just filenames

clach04 opened this issue

Would depend on #23 being implemented.

Support file objects for:

  • input text file, filename param
  • output PDF file, output param - NOTE reportlab Canvas already supports file objects

NOTE even with #25 (a quick fix for #23), more refactoring is required to allow file objects to be passed in.

One quick hack, which would not require a major refactoring, would be to instantiate a PDFCreator object and then set the input file object and/or the output file object on it before calling generate().
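A minimal sketch of that hack follows. It uses a hypothetical `ToyCreator` stand-in rather than txt2pdf's real `PDFCreator`, and the attribute name `input_file` is an assumption, not an existing txt2pdf attribute. The idea: generate() reads from an injected file object if one was set after construction, and otherwise opens the filename exactly as before.

```python
import io


class ToyCreator:
    """Hypothetical stand-in for PDFCreator; not actual txt2pdf code."""

    def __init__(self, filename):
        self.filename = filename
        # Hypothetical hook: callers may set this after construction.
        self.input_file = None

    def _consume(self, data):
        # Stand-in for the real rendering: just collect the decoded lines.
        return [line.decode("utf-8").rstrip("\r\n") for line in data]

    def generate(self):
        if self.input_file is not None:
            # Use the injected file object; the caller owns it, so don't close it.
            return self._consume(self.input_file)
        # Existing behaviour: open (and close) the named file.
        with open(self.filename, "rb") as f:
            return self._consume(f)


creator = ToyCreator("unused.txt")
creator.input_file = io.BytesIO(b"first line\nsecond line\n")
print(creator.generate())  # ['first line', 'second line']
```

Because the file object is only consulted inside generate(), the constructor signature stays unchanged, which is what keeps this a small change rather than a refactoring.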

Writing the output PDF to a file(-like) object is already possible:

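from io import BytesIO

import txt2pdf
from txt2pdf import PDFCreator, Margins
# .... (elided: defines source_file_name and source_encoding)
# Placeholder output name; args.output is replaced with a file object below
pdf_file_name = ':memory:'
args = txt2pdf.parser.parse_args([source_file_name, '--encoding=' + source_encoding, '--output=' + pdf_file_name, '--media=letter', '--author=me', '--quiet'])
# Any writable binary file-like object works, since reportlab's Canvas accepts one
pdf_file_object = BytesIO()
args.output = pdf_file_object
PDFCreator(args, Margins(
    args.margin_right,
    args.margin_left,
    args.margin_top,
    args.margin_bottom)).generate()
pdf_bytes = pdf_file_object.getvalue()  # the generated PDF as bytes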

The input file needs a workaround in _readDocument() and _process().
The main change needed is in _process(): today its data parameter must be a real file, because it looks up the file number (fileno()) and then the file size via os.fstat().
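To illustrate why the fileno() lookup blocks in-memory inputs, and one possible fallback, here is a sketch with a hypothetical helper `stream_length` (not part of txt2pdf). It keeps the os.fstat() path for real files and, for streams without a file descriptor such as io.BytesIO, seeks to the end to read the size and then restores the position. It assumes a binary stream; for io.StringIO the tell() position is not a byte count, which is one reason to let callers pass the length explicitly instead.

```python
import io
import os


def stream_length(f):
    """Return the size in bytes of a binary file or in-memory binary stream.

    Real files: use os.fstat() on the file descriptor (the current approach).
    Streams without a descriptor (e.g. io.BytesIO): seek to the end, read the
    position, then restore the original position.
    """
    try:
        return os.fstat(f.fileno()).st_size
    except (AttributeError, io.UnsupportedOperation):
        # io.BytesIO.fileno() raises io.UnsupportedOperation
        pos = f.tell()
        end = f.seek(0, io.SEEK_END)
        f.seek(pos)
        return end


buf = io.BytesIO(b"hello\nworld\n")
buf.readline()              # move off the start
print(stream_length(buf))   # 12
print(buf.tell())           # 6, position restored
```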

Proposal

Instead of:

def _process(self, data):
    flen = os.fstat(data.fileno()).st_size

allow the file length to be passed in, and fall back to the existing file-length check if it is omitted:

def _process(self, data, flen=None):
    if flen is None:  # an explicit flen=0 (empty input) must not trigger the fstat() fallback
        flen = os.fstat(data.fileno()).st_size
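Checking `flen is None` rather than writing `flen or ...` matters for empty input: `or` treats an explicit `flen=0` as missing and falls back to os.fstat(), which raises for an in-memory stream. A small sketch of the difference, using two hypothetical helper functions (not txt2pdf code):

```python
import io
import os


def length_with_or(data, flen=None):
    # `or` falls back whenever flen is falsy, including an explicit 0.
    return flen or os.fstat(data.fileno()).st_size


def length_with_is_none(data, flen=None):
    # Only fall back when the caller passed no length at all.
    if flen is None:
        flen = os.fstat(data.fileno()).st_size
    return flen


empty = io.BytesIO(b"")
print(length_with_is_none(empty, flen=0))  # 0
try:
    length_with_or(empty, flen=0)
except io.UnsupportedOperation:
    print("flen=0 ignored; fstat() fails on BytesIO")
```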

@baruchel any strong (negative ;-)) thoughts on this?

Not urgent; this came up while investigating a bug in the file I/O code.