frescobaldi / python-poppler-qt5

Python binding to libpoppler-qt5

Return all the text on a page not working with `page.text()`

joelostblom opened this issue

Thanks for maintaining this library! I have been looking for a way to extract text and annotations from PDFs and this might be it!

I have run into a problem when trying to extract the text from a page. According to the QtPoppler manual, page.text() should return the entire page's text when invoked without arguments (at least, I believe that is what "If rect is null, all text on the page is given" means). However, when I try this, I get an error:

import popplerqt5

doc = popplerqt5.Poppler.Document.load('./test1.pdf')
page = doc.page(0)
page.text()

Out:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Page.text(): arguments did not match any overloaded call:
  overload 1: not enough arguments
  overload 2: not enough arguments

This is the test file test1.pdf. How can I return all the text on a page?

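Apparently, an empty (default-constructed) QRectF is what the docs refer to with "if rect is null". As the traceback shows, the rect argument cannot be omitted, so it has to be passed explicitly. The following returns all the text on a page: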

import popplerqt5
from PyQt5 import QtCore

doc = popplerqt5.Poppler.Document.load('./test1.pdf')
page = doc.page(0)
print(page.text(QtCore.QRectF()))
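
For anyone who wants the text of every page rather than just the first, the same trick can be wrapped in a small loop. This is only a sketch: `all_page_texts` is a hypothetical helper name (not part of python-poppler-qt5), and it assumes the document object provides `numPages()` and `page(i)` as in the Poppler Qt5 API. The null rect is passed in by the caller so the helper itself does not import PyQt5.

```python
def all_page_texts(doc, null_rect):
    """Return a list with the text of every page in `doc`.

    Hypothetical helper, not part of python-poppler-qt5.
    `doc` is assumed to behave like a Poppler.Document
    (numPages() and page(i)); `null_rect` should be a null
    QRectF, e.g. QtCore.QRectF().
    """
    texts = []
    for index in range(doc.numPages()):
        page = doc.page(index)
        # A null rect asks for all the text on the page.
        texts.append(page.text(null_rect))
    return texts


# Usage with the real library (requires popplerqt5 and PyQt5):
#
#   import popplerqt5
#   from PyQt5 import QtCore
#
#   doc = popplerqt5.Poppler.Document.load('./test1.pdf')
#   if doc is not None:  # load() gives None if the file cannot be opened
#       for text in all_page_texts(doc, QtCore.QRectF()):
#           print(text)
```

Passing the rect in as an argument is just a convenience for the sketch; calling `QtCore.QRectF()` inside the loop works equally well.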