ibm-aur-nlp / PubTabNet

The y coordinate value of cell bbox seems to be inaccurate

qiyuhou opened this issue

Thank you for providing the large-scale dataset.

When converting the HTML annotations into a split-based structure, I found that the y coordinates of some cell bounding boxes seem to be inaccurate.

For example, PMC5842743_009_00 is an 11x6 table:
A03 line: [2, 65, 19, 76], [31, 65, 46, 76], [68, 65, 82, 76], [110, 65, 133, 76], [165, 65, 176, 76], [211, 65, 228, 76]
A04 line: [2, 78, 20, 89], [31, 78, 46, 89], [71, 75, 79, 90], [118, 75, 125, 90], [167, 75, 174, 90], [216, 75, 223, 90]
In the last four columns, y1 of the upper cell (76) is greater than y0 of the lower cell (75), so the boxes of the two rows overlap vertically.
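A minimal sketch of the comparison above, assuming each bbox is [x0, y0, x1, y1] in image coordinates (y grows downward) and that the two rows contain the same number of column-aligned cells. The function name vertical_overlaps is hypothetical and is not part of the PubTabNet tooling.

```python
from typing import List, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1), y grows downward (assumed)


def vertical_overlaps(upper: List[BBox], lower: List[BBox]) -> List[Tuple[int, int, int]]:
    """Return (column, upper_y1, lower_y0) for every column where the bottom
    edge of the upper cell lies below the top edge of the cell beneath it."""
    overlaps = []
    for col, (a, b) in enumerate(zip(upper, lower)):
        if a[3] > b[1]:  # upper y1 > lower y0 means the boxes overlap
            overlaps.append((col, a[3], b[1]))
    return overlaps


# Rows A03 and A04 of PMC5842743_009_00, copied from the annotation above.
a03 = [(2, 65, 19, 76), (31, 65, 46, 76), (68, 65, 82, 76),
       (110, 65, 133, 76), (165, 65, 176, 76), (211, 65, 228, 76)]
a04 = [(2, 78, 20, 89), (31, 78, 46, 89), (71, 75, 79, 90),
       (118, 75, 125, 90), (167, 75, 174, 90), (216, 75, 223, 90)]

print(vertical_overlaps(a03, a04))
# [(2, 76, 75), (3, 76, 75), (4, 76, 75), (5, 76, 75)]
```

The first two columns pass (76 < 78); columns 2 to 5 are flagged.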

I randomly checked 100 tables in the training set and found that 37 of them have this problem.
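A spot check like this could be scripted roughly as below. This is a sketch under assumptions, not the method used for the numbers above. It assumes the PubTabNet jsonl layout (a split field, html.structure.tokens, and html.cells entries with an optional bbox), that html.cells lines up one-to-one with the td tokens, and that empty cells carry no bbox. Instead of the column-wise comparison, it uses a coarser row-level test: a table is flagged when the bottom of some row (max y1) lies below the top of the next non-empty row (min y0). Cells with a rowspan attribute are skipped. The names row_boxes, rows_overlap and count_overlapping are hypothetical.

```python
import json
import random
from typing import Dict, List

BBox = List[int]  # [x0, y0, x1, y1], y grows downward (assumed)


def row_boxes(sample: Dict) -> List[List[BBox]]:
    """Group the cell bboxes of one annotation by table row.

    Assumes html.cells is aligned one-to-one with the <td> tokens in
    html.structure.tokens. Empty cells (no "bbox") and cells carrying a
    rowspan attribute are skipped.
    """
    tokens = sample["html"]["structure"]["tokens"]
    cells = sample["html"]["cells"]
    rows: List[List[BBox]] = []
    cell_idx = -1
    spans_rows = False
    for tok in tokens:
        if tok == "<tr>":
            rows.append([])
        elif tok in ("<td>", "<td"):
            cell_idx += 1  # advance to the matching entry in html.cells
            spans_rows = False
        elif "rowspan" in tok:
            spans_rows = True  # attribute token such as ' rowspan="2"'
        elif tok == "</td>":
            bbox = cells[cell_idx].get("bbox")
            if bbox is not None and not spans_rows:
                rows[-1].append(bbox)
    return rows


def rows_overlap(rows: List[List[BBox]]) -> bool:
    """True if the bottom of some row (max y1) lies below the top of the
    next non-empty row (min y0)."""
    nonempty = [r for r in rows if r]
    for upper, lower in zip(nonempty, nonempty[1:]):
        if max(b[3] for b in upper) > min(b[1] for b in lower):
            return True
    return False


def count_overlapping(path: str, n: int = 100, seed: int = 0) -> int:
    """Draw n training tables by reservoir sampling and count flagged ones."""
    rng = random.Random(seed)
    reservoir: List[Dict] = []
    seen = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            sample = json.loads(line)
            if sample.get("split") != "train":
                continue
            seen += 1
            if len(reservoir) < n:
                reservoir.append(sample)
            else:
                j = rng.randrange(seen)  # keeps each sample with probability n / seen
                if j < n:
                    reservoir[j] = sample
    return sum(rows_overlap(row_boxes(s)) for s in reservoir)
```

For example, count_overlapping("PubTabNet_2.0.0.jsonl") (the file name is an assumption) would return how many of 100 randomly drawn training tables fail the row-level test. Because that test differs from the per-column comparison and the sample is random, its count need not match the 37 reported here.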

Thanks