Return all the text on a page not working with `page.text()`
joelostblom opened this issue · comments
Joel Ostblom commented
Thanks for maintaining this library! I have been looking for a way to extract text and annotations from PDFs and this might be it!
I have run into a problem when trying to extract the text from a page. According to the QtPoppler manual, `page.text()` should return the entire page's text when invoked without arguments (at least I believe that is what "If rect is null, all text on the page is given" means). However, when I try this I get an error:
import popplerqt5
doc = popplerqt5.Poppler.Document.load('./test1.pdf')
page = doc.page(0)
page.text()
Out:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Page.text(): arguments did not match any overloaded call:
  overload 1: not enough arguments
  overload 2: not enough arguments
This is the test file test1.pdf. How can I return all the text on a page?
Joel Ostblom commented
Apparently, an empty `QRectF` is what the docs refer to with "if rect is null". The following returns all the text on a page:
import popplerqt5
from PyQt5 import QtCore
doc = popplerqt5.Poppler.Document.load('./test1.pdf')
page = doc.page(0)
print(page.text(QtCore.QRectF()))
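For anyone who wants the text of every page rather than just the first, the same trick can probably be wrapped in a small helper. This is only a sketch: `all_page_texts` and `print_all_text` are names I made up for illustration, not part of popplerqt5, and I am assuming that `Document.numPages()` reports the page count and that `Document.load()` returns `None` when the file cannot be opened.

```python
def all_page_texts(doc, null_rect):
    """Return a list holding the full text of each page in doc.

    doc is expected to behave like a popplerqt5.Poppler.Document
    (it must offer numPages() and page(i)). null_rect should be a
    null rectangle such as QtCore.QRectF(), which Page.text()
    treats as "the whole page".
    """
    return [doc.page(i).text(null_rect) for i in range(doc.numPages())]


def print_all_text(path):
    """Hypothetical convenience wrapper: load a PDF and print every page."""
    # Imported here so the helper above can be used without Qt installed.
    import popplerqt5
    from PyQt5 import QtCore

    doc = popplerqt5.Poppler.Document.load(path)
    if doc is None:  # assumption: a failed load yields None
        raise IOError("could not load " + path)

    texts = all_page_texts(doc, QtCore.QRectF())
    for number, text in enumerate(texts, start=1):
        print("--- page {} ---".format(number))
        print(text)
```

Calling `print_all_text('./test1.pdf')` should then print each page in turn. Passing the null rectangle in as an argument keeps `all_page_texts` independent of Qt, which makes it easy to try out with stand-in objects.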