atlanhq / camelot

Camelot: PDF Table Extraction for Humans

Home Page:https://camelot-py.readthedocs.io

Error min() arg is an empty sequence when giving a table_area to camelot

GraceBouala opened this issue · comments

Hello,
I am trying to extract a table from a PDF file with camelot, using the stream flavor and a specific table area. First, I call camelot's read_pdf method with the lattice flavor to get the table's bounding box. Then I call read_pdf again with the stream flavor, passing the bounding box from the first call as table_areas. However, I get a 'min() arg is an empty sequence' error, even though there is a table in that area and lattice extracts it. Can someone help me fix this issue? Below is my code:

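import camelot

# `pdf` is the path to the PDF file (defined elsewhere)
# First pass: lattice (the default flavor) to locate the table
tables = camelot.read_pdf(pdf, pages='2')
bbox = list(tables[0].__dict__['_bbox'])
bbox = [str(elt) for elt in bbox]
interested_area = ','.join(bbox)

# Second pass: stream, restricted to the area found above
output_camelot = camelot.read_pdf(
    pdf,
    pages='2',
    flavor='stream',
    split_text=True,
    table_areas=[interested_area]
)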

Hi GraceBouala,
Please check the table in the PDF; it might be a scanned (image-based) table.

Regards,
Anand
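
One further thing that may be worth checking, offered as an unconfirmed guess rather than a diagnosis: the Camelot documentation describes each table_areas string as x1,y1,x2,y2 with (x1, y1) the left-top corner and (x2, y2) the right-bottom corner in PDF coordinate space, where y grows upward. If the private _bbox value is stored in a different corner order (for example left, bottom, right, top; this is an assumption about Camelot internals), the area passed to stream could match no text, which would be consistent with an empty-sequence error. The sketch below uses a hypothetical helper, bbox_to_table_area, that normalizes any four-number box into the documented order.

```python
def bbox_to_table_area(bbox):
    """Return a 'left,top,right,bottom' string for a table_areas entry.

    `bbox` is any (x, y, x, y) pair of opposite corners in PDF coordinates
    (origin at the bottom-left, so the top edge has the larger y value).
    Hypothetical helper, not part of Camelot.
    """
    xa, ya, xb, yb = (float(v) for v in bbox)
    left, right = min(xa, xb), max(xa, xb)
    bottom, top = min(ya, yb), max(ya, yb)
    # Documented table_areas order: left-top corner, then right-bottom corner
    return ",".join(str(v) for v in (left, top, right, bottom))


if __name__ == "__main__":
    # A box given as (left, bottom, right, top) is reordered
    print(bbox_to_table_area((50, 100, 500, 700)))
```

With this helper, the second call in the question would pass table_areas=[bbox_to_table_area(bbox)] instead of the joined string, leaving the other arguments unchanged.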