The rotation configuration is set to IndirectObject, which prevents the PDF from being uploaded.
zzhangyun opened this issue
Describe the bug
The rotation configuration for the PDF file is set to IndirectObject(12, 0, 4419697344). Uploading this file produces the error below:
2024-07-19 14:40:21,146 - werkzeug - INFO - 10.131.0.1 - - [19/Jul/2024 14:40:21] "POST /api/v1/project/6698c33997df8cbbfd8770c0/document/?collection=6698c40f97df8cbbfd8770c1 HTTP/1.0" 500 -
Traceback (most recent call last):
  File "/app/server/service/pdf_svc.py", line 767, in get_doc_pages_raw_data
    for num_page_index in range(0, len(pdf.pages)):
  File "/usr/local/lib/python3.9/site-packages/pdfplumber/pdf.py", line 142, in pages
    p = Page(self, page, page_number=page_number, initial_doctop=doctop)
  File "/usr/local/lib/python3.9/site-packages/pdfplumber/page.py", line 226, in __init__
    self.rotation = _rotation % 360
TypeError: unsupported operand type(s) for %: 'NoneType' and 'int'
Have you tried repairing the PDF?
No
Hmmmm, as far as I'm aware, IndirectObject(12, 0, 4419697344) is not a valid value for a PDF's rotation. Given that the PDF loads without problems when using pdfplumber.open(path, repair=True), I'm closing this issue, but feel free to continue the discussion here.
Generally any value in a PDF can be given either by a direct or indirect object with some exceptions explicitly mentioned in the spec.
For the page rotation no restriction is mentioned in the spec, so it may be indirect.
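To illustrate the point about indirect values, here is a minimal sketch of how a reader could resolve a rotation entry before applying `% 360`. It is not pdfplumber's actual fix (see the commit for that); `IndirectRef`, `resolve`, `page_rotation`, and the `objects` table are hypothetical names made up for this example, and the page dictionary is modeled as a plain Python dict. Inheritance of the rotation entry from parent page-tree nodes is not handled here.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class IndirectRef:
    """Hypothetical stand-in for a parser's indirect-object reference."""
    objid: int
    generation: int = 0


def resolve(value: Any, objects: Dict[int, Any], max_depth: int = 16) -> Any:
    """Follow indirect references until a direct value is reached.

    `objects` is an assumed object table mapping object IDs to values.
    A reference to an undefined object resolves to None (the null object).
    """
    depth = 0
    while isinstance(value, IndirectRef):
        if depth >= max_depth:
            raise ValueError("indirect reference chain too deep or circular")
        value = objects.get(value.objid)
        depth += 1
    return value


def page_rotation(page_dict: Dict[str, Any], objects: Dict[int, Any]) -> int:
    """Return the page rotation normalized to the range 0..359."""
    raw = resolve(page_dict.get("Rotate", 0), objects)
    if raw is None:
        # Treat a missing or null rotation as 0 instead of
        # letting `None % 360` raise a TypeError.
        raw = 0
    return int(raw) % 360
```

With an object table such as `{12: 90}`, `page_rotation({"Rotate": IndirectRef(12)}, {12: 90})` resolves the reference first and only then normalizes the number, so the modulo never sees an unresolved reference or `None`.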
Thank you, @mkl-public and my apologies @zzhangyun; I misunderstood the issue, thinking that the value was literally set to IndirectObject(12, 0, 4419697344). This should now be fixed in c20cd3b.