pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Home Page: https://pymupdf.readthedocs.io

find_tables doesn't recognize any table in scanned document

RodrigoTomeES opened this issue

Description of the bug

Hi,

We are trying to extract the tables from this scanned document, but find_tables doesn't recognize any of them. We also tried page.find_tables(horizontal_strategy="text", vertical_strategy="text"), as mentioned here.

How to reproduce the bug

import fitz  # PyMuPDF

# Open the scanned document (could also be EPUB, XPS, etc.)
doc = fitz.open("RE2.pdf")

for page in doc:
    # Look for tables on this page and display the table count
    tabs = page.find_tables()

    for table in tabs.tables:
        print(table.to_pandas())
    print(f"{len(tabs.tables)} table(s) on {page}")

# Expected a message like "1 table(s) on page 0 of RE2.pdf",
# but no table is found on any page.

PyMuPDF version

1.24.2

Operating system

Linux

Python version

3.10

As answered on Discord: this is not possible.
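
A likely reason, offered here as an assumption rather than as the maintainers' explanation: a scanned page usually consists of a single image, so it has no text or vector graphics for find_tables to work with. The sketch below is a rough heuristic for checking that. The helper names looks_scanned and report are hypothetical, and a scan that already carries an OCR text layer would not be flagged.

```python
def looks_scanned(page):
    """Heuristic (assumption): a page with images but no text and no
    vector drawings is probably a scan."""
    has_text = bool(page.get_text().strip())   # extractable text layer
    has_drawings = bool(page.get_drawings())   # vector graphics (lines, rectangles)
    has_images = bool(page.get_images())       # embedded raster images
    return has_images and not has_text and not has_drawings


def report(path):
    """Print, for each page of the document, whether it looks like a scan."""
    import fitz  # PyMuPDF, imported here so looks_scanned has no hard dependency

    with fitz.open(path) as doc:
        for page in doc:
            kind = "likely scanned" if looks_scanned(page) else "has text or vector content"
            print(f"page {page.number}: {kind}")
```

Calling report("RE2.pdf") on the document from this issue would show which pages contain nothing but images.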