pymupdf / PyMuPDF

PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Home Page: https://pymupdf.readthedocs.io

drawings.get_drawings() reports incorrect value for width attribute

JustADetailer opened this issue

Description of the bug

For an item of type 'l' (a line), drawings.get_drawings() reports a width value that appears incorrect. For instance, a line with a width of 7 is returned with a width of 1.6799999475479126. It may be that a different unit is being used, but I don't know what it would be.

How to reproduce the bug

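def filter_lines(drawings):
    # `drawings` is the page object; the function name is added here only
    # so that the trailing `return` has an enclosing function.
    filtered_lines = []
    drawing_list = drawings.get_drawings()
    for item in drawing_list:
        try:
            if item['items'][0][0] == 'l':
                line_type, p0, p1 = item['items'][0]
                lx0, ly0 = p0
                lx1, ly1 = p1
                # 'line_length' is not among the keys listed for get_drawings()
                # in the PyMuPDF docs; it is assumed to be set by earlier code.
                line_length = item['line_length']
                #print(item['width'])
                #print(item.width)

                # Filter lines by length (adjust the length threshold as needed)
                if item['width']:
                    if line_length > 36 and item['width'] > 1.5:
                        filtered_lines.append(item)
                else:
                    #print('Skipped')
                    pass

        except ValueError as e:
            print(f"Unexpected data format: {item['items'][0]}, error: {str(e)}")
    return filtered_lines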
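
The snippet above reads a line length from the drawing dictionary. If that value is not available, it can be derived from the two endpoints of the 'l' item. The following is a minimal sketch under that assumption: the helper name line_length and the sample coordinates are hypothetical, and plain tuples stand in for the point objects that PyMuPDF returns.

```python
import math


def line_length(path_item):
    """Return the length of a ('l', p0, p1) item, or None for other item types."""
    if path_item[0] != "l":
        return None
    _, p0, p1 = path_item
    (x0, y0), (x1, y1) = p0, p1
    # Euclidean distance between the two endpoints
    return math.hypot(x1 - x0, y1 - y0)


# Hypothetical drawing dictionary; real entries come from get_drawings().
sample = {"items": [("l", (1078.08, 1333.68), (807.12, 1333.68))], "width": 1.68}
length = line_length(sample["items"][0])

# Same thresholds as in the report: longer than 36 and wider than 1.5.
keep = length is not None and length > 36 and sample["width"] > 1.5
print(keep)
```

The thresholds are the ones used in the report and would need adjusting for other documents.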

PyMuPDF version

1.23.8 or earlier

Operating system

Windows

Python version

3.12

You forgot to supply a file showing the problem!

A Single Line of Width 7.pdf
Sorry my browser was crashing.

You are wrong:

>>> page = doc[0]
>>> page.read_contents()
b'/GS0 gs 0 0 0 RG 1 J 1 j 7 w .24 0 0 .24 0 0 cm\n4492 5557 m 3363 5557 l S\n'

This shows:

  • a width parameter w with value 7
  • a scaling matrix cm with values x=0.24, y=0.24
  • the effective line width is correctly computed as 0.24 * 7 = 1.68 (the reported 1.6799999475479126 is this value stored as a single-precision float).
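
The arithmetic above can be reproduced without PyMuPDF. The following is a minimal sketch, not a description of how MuPDF parses content streams: it splits the stream on whitespace, reads the operand of w and the first four operands of cm, and multiplies the width by the square root of the absolute determinant of the matrix. That reduction of a matrix to a single scale factor is an assumption; for the uniform scaling in this file it is simply 0.24. The helper names effective_width and as_float32 are made up for illustration.

```python
import math
import struct

# Content stream copied from the reply above.
stream = b'/GS0 gs 0 0 0 RG 1 J 1 j 7 w .24 0 0 .24 0 0 cm\n4492 5557 m 3363 5557 l S\n'


def effective_width(content: bytes) -> float:
    """Naive estimate: last 'w' operand times the accumulated 'cm' scale.

    Ignores q/Q nesting, strings, and inline images, so it is only
    suitable for tiny streams like the one in this thread.
    """
    tokens = content.decode("latin-1").split()
    width = 1.0  # PDF default line width
    scale = 1.0
    for i, tok in enumerate(tokens):
        if tok == "w":
            width = float(tokens[i - 1])
        elif tok == "cm":
            # Operands are a b c d e f; only a..d affect scaling.
            a, b, c, d = (float(t) for t in tokens[i - 6:i - 2])
            scale *= math.sqrt(abs(a * d - b * c))
    return width * scale


def as_float32(value: float) -> float:
    """Round a Python float to single precision and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


raw = effective_width(stream)
print(round(raw, 6))
print(as_float32(raw))
```

The second printed value should match the 1.6799999475479126 quoted in the report, which is consistent with the width being held as a single-precision float before it reaches Python.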