jalan / pdftotext

Simple PDF text extraction

Not exactly an issue

sheikgit opened this issue

Hi! Is there any documentation on how to pass the command-line parameters (if that is possible) to "pdf = pdftotext.PDF(f)"?

I could not find any documentation, and even from reading the code it is not easy to tell whether parameters can be passed...

Thanks in advance,

Adrián

There are docs attached to the module, available in the usual way. The available parameters are listed there:

$ python
>>> import pdftotext
>>> help(pdftotext.PDF)

Help on class PDF in module pdftotext:

class PDF(builtins.object)
 |  PDF(pdf_file, password="", raw=False, physical=False)
 |  
 |  Args:
 |      pdf_file: A file opened for reading in binary mode.
 |      password: Unlocks the document, if required. Either the owner
 |          password or the user password works.
 |      raw: If True, page text is output in the order it appears in the
 |          content stream.
 |      physical: If True, page text is output in the order it appears on
 |          the page, regardless of columns or other layout features.
 |  
 |      Usually, the most readable output is achieved by using the default
 |      mode, rather than raw or physical.
 |  
 |  Example:
 |      with open("doc.pdf", "rb") as f:
 |          pdf = PDF(f)
 |      for page in pdf:
 |          print(page)
 |  
 |  Methods defined here:
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __len__(self, /)
 |      Return len(self).
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
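
Going by the signature in the help text above, the options are passed as extra arguments to the PDF constructor. Here is a minimal sketch of how that might look. The helper name read_pdf_pages is made up for illustration and is not part of the module; only pdftotext.PDF and its password, raw and physical arguments come from the docs above.

```python
def read_pdf_pages(path, password="", raw=False, physical=False):
    """Return the text of every page in the PDF at `path` as a list of strings.

    Hypothetical helper: it only forwards its arguments to pdftotext.PDF,
    following the signature shown by help(pdftotext.PDF).
    """
    # Imported inside the function so this sketch can be defined even where
    # the third-party pdftotext module is not installed.
    import pdftotext

    with open(path, "rb") as f:
        # password is the second positional argument; raw and physical are
        # passed by keyword, matching the documented parameter names.
        pdf = pdftotext.PDF(f, password, raw=raw, physical=physical)

    # A PDF object supports len() and indexing, so it can be iterated
    # page by page, as in the example from the help text.
    return [page for page in pdf]
```

For example, read_pdf_pages("doc.pdf", raw=True) would ask for the text in content-stream order, and read_pdf_pages("doc.pdf", password="secret") would unlock a protected file. As the help text notes, the default mode usually gives the most readable output.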