pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Home Page: https://pymupdf.readthedocs.io

Incorrect handling of the link zoom parameter in link insertions

ikseek opened this issue

Description of the bug

When I merge two PDF files with the reproducer code provided below, some links in the resulting merged.pdf no longer work.
To reproduce the bug, download basic-link-1.pdf and attachment-sample-1.pdf, then run the provided script in the directory that contains them. Open the produced merged.pdf.

Expected result:

  • merged pdf has all links working

Observed result:

  • links "Linking to an ID" and "Linking to a page number (page 2) and setting the display ratio (200%)" do not work in merged.pdf.

Checked in the macOS Preview and Chromium PDF viewers.

basic-link-1.pdf
attachment-sample-1.pdf
merged.pdf

How to reproduce the bug

import fitz

out = fitz.open()
for file in "basic-link-1.pdf", "attachment-sample-1.pdf":
    out.insert_pdf(fitz.open(file))
out.save(filename="merged.pdf")

PyMuPDF version

1.24.1

Operating system

MacOS

Python version

3.12

File 'basic-link-1.pdf' contains a names dictionary (a structure in the PDF catalog). Document-wide information like the names dictionary is not copied to the target PDF by method .insert_pdf(), because this is a page-based method.
Named links in the source dictionary therefore cannot be copied: there is no internal link-kind conversion like LINK_NAMED ==> LINK_GOTO.
So "Linking to an ID" is bound to fail.

The remaining issue is therefore the incorrect handling of the zoom value.
I am taking the liberty of changing the issue title accordingly.

Thanks @JorjMcKie!
Is there a way to convert these links manually via PyMuPDF?

You can walk through the named links of a page. Their dictionary items should contain all the information you need to turn them into LINK_GOTO items. The one named item here is

link0= {'kind': 4,
  'xref': 24,
  'from': Rect(56.69292068481445, 215.346435546875, 123.62651062011719, 225.346435546875),
  'page': 1,
  'to': Point(0.0, 813.54336),
  'zoom': 0.0,
  'nameddest': 'Link-01',
  'id': ''}

So you could define

link1= {'kind': fitz.LINK_GOTO, 'from': link0["from"], 'page': link0["page"], 'to': link0["to"]}
page.delete_link(link0)
page.insert_link(link1)
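
The single-link conversion above can be generalized to a whole document. What follows is only a minimal sketch, not an official PyMuPDF recipe. The helper names named_to_goto and convert_named_links are hypothetical, and the link kinds are written as plain integers (assumed to match fitz.LINK_GOTO and fitz.LINK_NAMED, consistent with the 'kind': 4 shown above) so the block does not depend on PyMuPDF being installed. It also assumes that a named link which could not be resolved carries a negative or missing 'page' value, and it does not touch the zoom value or check whether the 'to' point needs any coordinate adjustment.

```python
# Link kinds as plain integers so the sketch runs without PyMuPDF installed;
# they are assumed to mirror fitz.LINK_GOTO and fitz.LINK_NAMED.
LINK_GOTO = 1
LINK_NAMED = 4


def named_to_goto(link):
    """Build a LINK_GOTO dictionary from a resolved named-link dictionary."""
    return {
        "kind": LINK_GOTO,
        "from": link["from"],
        "page": link["page"],
        "to": link["to"],
    }


def convert_named_links(doc):
    """Replace resolved named links on every page; return the number converted."""
    converted = 0
    for page in doc:
        # get_links() returns a list, so modifying the page inside the loop
        # does not disturb the iteration.
        for link in page.get_links():
            if link["kind"] != LINK_NAMED:
                continue
            # Assumption: an unresolved name has no usable target page.
            if link.get("page", -1) < 0:
                continue
            page.delete_link(link)
            page.insert_link(named_to_goto(link))
            converted += 1
    return converted
```

Because the names dictionary exists only in the source file, this would presumably have to run on each source document (for example src = fitz.open(file), then convert_named_links(src)) before calling out.insert_pdf(src), so that the links are already of kind LINK_GOTO when the pages are copied.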