pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Home Page: https://pymupdf.readthedocs.io

Missing letters in saved pdf document.

wz93672 opened this issue

Description of the bug

I have two problems with PDF documents that I receive from outside sources, so I have no influence on how they are created. This one is about letters missing from a PDF after it has been opened and saved with PyMuPDF. Replication was somewhat inconsistent, but it looks like the clean=True option of the save method triggers the bug. In the sample, it is visible in the red headline.

default options_first save.pdf
test1_clean-True_first save.pdf

How to reproduce the bug

import fitz  # PyMuPDF

with fitz.open('default options_first save.pdf') as doc:
    doc.save('test1_clean-True_first save.pdf', clean=True)

PyMuPDF version

1.24.1

Operating system

Windows

Python version

3.11

This is an error in the base library, MuPDF.
It can also be reproduced with MuPDF's CLI tool:

mutool clean -s default.options_first.save.pdf
warning: dropping unclosed PDF processor
warning: ... repeated 2 times...

I have submitted a report to MuPDF's issue tracker with a link back to this issue.
MuPDF bug report: https://bugs.ghostscript.com/show_bug.cgi?id=707727

This has been fixed in v1.24.2.
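
For anyone who cannot upgrade right away, here is a minimal sketch of one way to guard against this: only pass clean=True when the installed PyMuPDF is at least 1.24.2. The helper names (parse_version, clean_is_safe, resave) are made up for illustration and are not part of PyMuPDF. The sketch assumes that fitz.VersionBind holds the PyMuPDF version string and that saving without clean=True avoids the problem, as the original report suggests.

```python
import re

# First PyMuPDF release reported in this thread to contain the fix.
FIXED_IN = (1, 24, 2)


def parse_version(version: str) -> tuple:
    """Turn a version string such as '1.24.1' into a tuple of ints, e.g. (1, 24, 1)."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits of each part (so '0rc1' counts as 0).
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def clean_is_safe(version: str) -> bool:
    """Return True if the given PyMuPDF version is at or after the fixed release."""
    return parse_version(version) >= FIXED_IN


def resave(src: str, dst: str) -> None:
    """Hypothetical helper: re-save a PDF, using clean=True only on fixed versions."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    # Assumption: fitz.VersionBind is the PyMuPDF version string, e.g. '1.24.1'.
    use_clean = clean_is_safe(fitz.VersionBind)
    with fitz.open(src) as doc:
        doc.save(dst, clean=use_clean)
```

Calling resave('default options_first save.pdf', 'resaved.pdf') would then skip the clean step on 1.24.1 and apply it on 1.24.2 or later. Upgrading is still the simpler option where possible.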