py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

Home Page: https://pypdf.readthedocs.io/en/latest/

Broken image extraction if no filters and CMYK colorspace

stefan6419846 opened this issue

Image extraction is broken when isinstance(lfilters, NullObject) and mode == "CMYK" in

pypdf/pypdf/filters.py

Lines 818 to 826 in 0106904

else:
    if mode == "":
        raise PdfReadError(f"ColorSpace field not found in {x_object_obj}")
    img, image_format, extension, invert_color = (
        Image.frombytes(mode, size, data),
        "PNG",
        ".png",
        False,
    )
This branch always selects PNG as the output format, but PNG does not support CMYK, so saving the image fails.
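The failure can probably be reproduced with Pillow alone, without any PDF involved. The following is a minimal sketch under that assumption: the 2x2 pixel data and the `try_save` helper are made up for illustration and are not part of pypdf.

```python
import io

from PIL import Image

# Made-up 2x2 CMYK image: 4 bytes (C, M, Y, K) per pixel.
size = (2, 2)
data = bytes([0, 128, 255, 0]) * (size[0] * size[1])

# Same call as in the pypdf branch quoted above.
img = Image.frombytes("CMYK", size, data)


def try_save(image, image_format):
    """Hypothetical helper: return None on success, else the error message."""
    buffer = io.BytesIO()
    try:
        image.save(buffer, format=image_format)
    except OSError as exc:
        return str(exc)
    return None


png_error = try_save(img, "PNG")
print(png_error)
```

With the Pillow version listed under Environment below, the printed message should match the final `OSError` line of the traceback.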

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.14.21-150400.24.100-default-x86_64-with-glibc2.31

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.1.0, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=10.2.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader


reader = PdfReader('file.pdf')
for page in reader.pages:
    print(page)
    for key in page.images.keys():
        print(key)
        print(page.images[key])

An anonymized version of the file is out3.pdf.

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/PngImagePlugin.py", line 1279, in _save
    rawmode, mode = _OUTMODES[mode]
KeyError: 'CMYK'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/stefan/tmp/venv/lib/python3.9/site-packages/pypdf/filters.py", line 876, in _xobj_to_image
    img.save(img_byte_arr, format=image_format)
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/Image.py", line 2439, in save
    save_handler(self, fp, filename)
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/PngImagePlugin.py", line 1282, in _save
    raise OSError(msg) from e
OSError: cannot write mode CMYK as PNG

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/PngImagePlugin.py", line 1279, in _save
    rawmode, mode = _OUTMODES[mode]
KeyError: 'CMYK'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/stefan/tmp/run.py", line 9, in <module>
    print(page.images[key])
  File "/home/stefan/tmp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 2420, in __getitem__
    return self.get_function(index)
  File "/home/stefan/tmp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 501, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
  File "/home/stefan/tmp/venv/lib/python3.9/site-packages/pypdf/filters.py", line 880, in _xobj_to_image
    img.save(img_byte_arr, format=image_format)
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/Image.py", line 2439, in save
    save_handler(self, fp, filename)
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/PngImagePlugin.py", line 1282, in _save
    raise OSError(msg) from e
OSError: cannot write mode CMYK as PNG
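The traceback points at `img.save(img_byte_arr, format=image_format)` with `image_format` set to "PNG". One possible direction, sketched here as an assumption and not necessarily the fix pypdf adopts, is to pick the output format from the image mode and fall back to TIFF for CMYK data. The names `choose_format` and `encode_raw_image` are hypothetical and do not exist in pypdf.

```python
import io

from PIL import Image


def choose_format(mode):
    """Hypothetical helper: pick an output format that can store the mode."""
    if mode == "CMYK":
        # PNG cannot store CMYK; TIFF can, without converting the colors.
        return "TIFF", ".tif"
    return "PNG", ".png"


def encode_raw_image(mode, size, data):
    """Build an image from raw samples and encode it in a suitable format."""
    img = Image.frombytes(mode, size, data)
    image_format, extension = choose_format(mode)
    buffer = io.BytesIO()
    img.save(buffer, format=image_format)
    return buffer.getvalue(), extension


# Made-up 2x2 CMYK samples, 4 bytes per pixel.
encoded, extension = encode_raw_image("CMYK", (2, 2), bytes([0, 128, 255, 0]) * 4)

# Read the encoded bytes back to confirm the mode survived.
reloaded = Image.open(io.BytesIO(encoded))
print(extension, reloaded.mode, reloaded.size)
```

Converting to RGB before saving as PNG would be another option, but that changes the color data, which is why this sketch keeps CMYK and switches the container format instead.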