ENHANCEMENT idea - refactor so it can be used as a library as well as from the command line



clach04 opened this issue 2 years ago

I.e., so that "import txt2pdf" is an option.
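A common Python way to support both uses is to keep the real work in plain functions and confine the argument parsing to a main(argv=None) function that only runs when the module is executed as a script. The sketch below is hypothetical: convert_text(), build_parser() and main() are illustrative names, not txt2pdf's actual API, and the "conversion" is a placeholder (tab expansion) rather than PDF generation.

```python
import argparse
import sys


def convert_text(text, tab_size=4):
    """Hypothetical library entry point (placeholder for the real PDF work).

    Library callers import the module and call this directly; no argv involved.
    """
    return text.expandtabs(tab_size)


def build_parser():
    # Hypothetical parser; txt2pdf has its own module-level parser.
    parser = argparse.ArgumentParser(description="Convert a text file.")
    parser.add_argument("filename")
    parser.add_argument("--tab-size", type=int, default=4)
    return parser


def main(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:], so the CLI is unchanged,
    # while tests and other scripts can pass an explicit list of arguments.
    args = build_parser().parse_args(argv)
    with open(args.filename, encoding="utf-8") as f:
        sys.stdout.write(convert_text(f.read(), tab_size=args.tab_size))
    return 0


# In the real module, the command-line entry point would be:
# if __name__ == "__main__":
#     sys.exit(main())
```

With this split, another script can call convert_text() directly, or call main(["notes.txt", "--tab-size=8"]) (hypothetical filename) to reuse the command-line behaviour without touching sys.argv.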

clach04 · Sat Apr 08 2023 07:52:18 GMT+0800 (China Standard Time)

I already do this today (and have for the last year or so, I think) with code like the following:

try:
    #raise ImportError  # uncomment to exercise the fallback imports below
    from io import BytesIO as FakeFile  # Python 3
except ImportError:
    try:
        from cStringIO import StringIO as FakeFile  # Python 2; NOTE only use when not using .write() method - does not support Unicode
    except ImportError:
        from StringIO import StringIO as FakeFile  # Python 2, pure-Python fallback


import txt2pdf  # from https://github.com/baruchel/txt2pdf - NOTE requires https://github.com/baruchel/txt2pdf/pull/25
from txt2pdf import parser, Margins, PDFCreator
....
    args = txt2pdf.parser.parse_args([
                    source_file_name, '--encoding=' + source_encoding,
                    '--output=' + intermediate_pdf_file_name,
                    '--media=letter',   # default 2cm margins
                    '--author=Actian',  # NOTE this ends up in intermediate pdf file, later stage can/will change this
                    '--quiet',
                    '--tab-size=4',

                    #'--font=\\Downloads\\courier-prime-code\\ttf\\CourierPrimeCode-FF.ttf',
                    #'--font=\\Downloads\\courier-prime-code\\ttf\\CourierPrimeCode-symbol-Form-Feed.ttf',

                    #'--break-on-blanks',  # THIS DOES NOT WORK - proven with EA 2.1 code. I'm not convinced that NOT using it works properly either; the logic for page-break support and --minimum-page-length looks mixed up (and I don't think it works correctly)
                    # break-on-blanks also not working. TODO prefix input text file with line numbers and visually inspect with both options
                    # setting --break-on-blanks alone emits 17 pages!?
                    #'--break-on-blanks',  # do not force a page break on form-feed character ## DEBUG getting 17 pages not 2976 for EA 2.1 code!?

                    # '--minimum-page-length' defaults to 10

                    #u'--tab-replacement=\N{RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK}',  # DEBUG
                    #u'--tab-replacement=\N{RIGHTWARDS ARROW}',  # DEBUG  # requires https://github.com/baruchel/txt2pdf/pull/27
                    ])
    # TODO title
    if intermediate_pdf_file_name == ':memory:':
        # special case, don't write to filesystem
        pdf_file_object = FakeFile()
        args.output = pdf_file_object

    print('Reading plain text file %s, using encoding %s' % (source_file_name, source_encoding))
    print('Generating PDF of all pages, %s' % intermediate_pdf_file_name)
    pdf_creator = PDFCreator(args, Margins(
        args.margin_right,
        args.margin_left,
        args.margin_top,
        args.margin_bottom))
    # hard coded character replacement, could instead use command line flags and a filename
    # NOTE requires https://github.com/baruchel/txt2pdf/pull/29
    pdf_creator.character_replacement = {
        "12": ""  # form-feed/0x0c/\f replaced with empty string (i.e. remove form-feeds)
    }
    try:
        pdf_creator.generate()
    except ......
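The ':memory:' special case above works by handing PDFCreator an in-memory BytesIO object in place of a filename, which relies on the output step accepting either kind of target. A minimal sketch of that pattern, assuming a hypothetical write_output() helper (not part of txt2pdf) and placeholder bytes instead of a real PDF:

```python
import io
import os
import tempfile


def write_output(data, output):
    """Write bytes to a filename or to an open binary file-like object.

    write_output is a hypothetical helper illustrating the pattern; it is not txt2pdf code.
    """
    if hasattr(output, "write"):
        # Already file-like (e.g. io.BytesIO): write directly and leave it open.
        output.write(data)
    else:
        # Anything else is treated as a filesystem path.
        with open(output, "wb") as f:
            f.write(data)


placeholder_pdf = b"%PDF-1.4 placeholder"  # stand-in bytes, not a valid PDF

# In-memory target: nothing touches the filesystem.
buffer = io.BytesIO()
write_output(placeholder_pdf, buffer)
print(buffer.getvalue()[:8])  # → b'%PDF-1.4'

# Filesystem target, written into a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.pdf")
    write_output(placeholder_pdf, path)
    print(os.path.getsize(path))
```

Leaving the BytesIO open after writing matters: the caller still needs to read buffer.getvalue() afterwards.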

I cheat slightly: rather than building the options myself, I reuse txt2pdf's existing command-line argument parser to produce the args object that gets passed to PDFCreator().
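Reusing the parser works because argparse's parse_args() accepts an explicit list of strings instead of reading sys.argv, and returns an ordinary Namespace whose attributes can be changed afterwards (as the snippet does with args.output). The sketch below uses a small stand-in parser that mirrors a few of the flags above; its defaults are invented for illustration and are not txt2pdf's real defaults.

```python
import argparse

# Hypothetical stand-in for txt2pdf's module-level parser; only a few flags mirrored.
parser = argparse.ArgumentParser(prog="txt2pdf-sketch")
parser.add_argument("filename")
parser.add_argument("--encoding", default="utf-8")
parser.add_argument("--output", default="output.pdf")
parser.add_argument("--tab-size", type=int, default=8)
parser.add_argument("--quiet", action="store_true")

# parse_args() is given an explicit list, so sys.argv is never consulted.
args = parser.parse_args(["input.txt", "--encoding=cp1252", "--tab-size=4", "--quiet"])

# Options not listed keep their defaults; "--tab-size" becomes args.tab_size.
print(args.filename, args.encoding, args.output, args.tab_size, args.quiet)
# → input.txt cp1252 output.pdf 4 True

# Attributes can be overridden after parsing, like args.output in the snippet.
args.output = "other.pdf"
```

Going through parse_args() rather than building an argparse.Namespace by hand means every option the caller leaves out still gets the parser's default, so a caller only has to list the options it cares about.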