poppler/error: Failed to parse XRef entry [11].poppler/error: Top-level pages object is wrong type (null)
juanfrilla opened this issue · comments
Receiving this error on this url:
poppler/error: Failed to parse XRef entry [11].
poppler/error: Top-level pages object is wrong type (null)
https://www.asamblea.gob.sv/sites/default/files/documents/decretos/6BD1CFE2-9948-4D32-A45D-92FF50D15C0A.pdf
And with this code:
import io
import requests
import pdftotext
url = "https://www.asamblea.gob.sv/sites/default/files/documents/decretos/6BD1CFE2-9948-4D32-A45D-92FF50D15C0A.pdf"
content = requests.get(url).content
pdf = pdftotext.PDF(io.BytesIO(content))
I'm using poppler-utils-0.26.5-43.el7.1.x86_64
pdftotext version 0.26.5
on a CentOS server. I don't know if I need to upgrade poppler. Is there anything I can do without upgrading poppler?
Or is there a way of catching this poppler error and skipping the PDF that causes it?
> Is there a way of catching this poppler error and skip the PDF that gives that error
Sure, you can include exception handling:
import io
import requests
import pdftotext
url = "https://www.asamblea.gob.sv/sites/default/files/documents/decretos/6BD1CFE2-9948-4D32-A45D-92FF50D15C0A.pdf"
content = requests.get(url).content
try:
    pdf = pdftotext.PDF(io.BytesIO(content))
except pdftotext.Error as exception:
    # Do whatever you want here
    print(f"I couldn't open that PDF: {exception}")
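If you are processing many PDFs and just want to skip the ones poppler can't read, the same try/except can go inside a loop. Here is a minimal sketch of that idea. The helper name `extract_readable` and its parameters are made up for illustration and are not part of pdftotext; the opener and the exception type are passed in so the loop itself doesn't depend on any particular library.

```python
import io
from typing import Callable, Dict, Iterable, Tuple, Type


def extract_readable(
    documents: Iterable[Tuple[str, bytes]],
    open_pdf: Callable[[io.BytesIO], Iterable[str]],
    error_type: Type[Exception],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Hypothetical helper: return (texts, skipped) for (name, raw bytes) pairs.

    open_pdf is expected to return an iterable of page strings and to raise
    error_type when the document cannot be parsed.
    """
    texts: Dict[str, str] = {}
    skipped: Dict[str, str] = {}
    for name, raw in documents:
        try:
            pages = open_pdf(io.BytesIO(raw))
        except error_type as exc:
            # Record the reason and move on to the next document.
            skipped[name] = str(exc)
            continue
        texts[name] = "\n\n".join(pages)
    return texts, skipped
```

With pdftotext this would presumably be called as `extract_readable(docs, pdftotext.PDF, pdftotext.Error)`, where `docs` is a list of `(url, requests.get(url).content)` pairs. The `skipped` dict then tells you which PDFs failed and why, so you can retry them later (for example, after upgrading poppler).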