ibm-aur-nlp / PubTabNet

Some empty cell labels are None, while others are spaces

Antonio-hi opened this issue

I have noticed that empty cells are not labeled consistently.

  • Example:
    • 'PMC5697367_003_01.png' contains tokens like {'tokens': [' ']}, which correspond to the empty cells in the image.
    • In another case, 'PMC5822612_003_00.png', the empty cells are labeled as {'tokens': []}.

I counted over the whole training set and found that the two cases occur in a ratio of 307594 : 1203641.
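A count like this can be reproduced with a short script along the following lines. This is a minimal sketch, not code from the PubTabNet repository: `iter_annotations` and `count_empty_cell_patterns` are hypothetical helpers, and the record layout is an assumption (one JSON object per line with a `split` field and an `html` object whose `cells` list holds dicts with a `tokens` list). It counts only the two exact patterns from the issue, `[' ']` and `[]`.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, Optional


def iter_annotations(path: str) -> Iterator[dict]:
    """Yield one annotation dict per non-blank line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_empty_cell_patterns(annotations: Iterable[dict],
                              split: Optional[str] = "train") -> Counter:
    """Count blank cells labelled [' '] versus [] (assumed record layout)."""
    counts = Counter(space=0, empty=0)
    for ann in annotations:
        # Skip records from other splits; pass split=None to count everything.
        if split is not None and ann.get("split") != split:
            continue
        for cell in ann["html"]["cells"]:
            if cell["tokens"] == [" "]:
                counts["space"] += 1
            elif cell["tokens"] == []:
                counts["empty"] += 1
    return counts
```

For example, `count_empty_cell_patterns(iter_annotations("PubTabNet_2.0.0.jsonl"))` (the file name is an assumption) returns a `Counter` whose `space` and `empty` values can be compared with the ratio reported above.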

I would like to know whether I have misunderstood something, or whether the annotations really have this ambiguity.

From my perspective, the two labels have almost the same visual effect, which I think is why there are two patterns for labeling blank cells. I also counted the cell count (which in fact means the number of chunks) and the structure count (which in fact means the number of cells) and found that they are the same. So if you would like to unify the representation of blank cells, you can simply add a " " token to those {'tokens': []}.
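The consistency check and the suggested fix can be sketched in Python as below. This is a minimal sketch under stated assumptions, not code from the PubTabNet repository: `count_structure_cells` and `normalize_empty_cells` are hypothetical helpers, and the record layout (`html.cells[*].tokens` for the cell contents, `html.structure.tokens` for the HTML structure) is assumed. It also assumes each cell in the structure opens with either a `<td>` token or, for a spanning cell, a `<td` token followed by its span attribute and `>`.

```python
import copy


def count_structure_cells(structure_tokens: list) -> int:
    """Count cells in the HTML structure tokens.

    Assumption: a plain cell opens with "<td>", while a spanning cell is
    split into "<td", a span attribute such as ' colspan="2"', and ">".
    """
    return sum(1 for tok in structure_tokens if tok in ("<td>", "<td"))


def normalize_empty_cells(ann: dict, filler: str = " ") -> dict:
    """Return a copy of one annotation where every tokens == [] becomes [filler]."""
    new_ann = copy.deepcopy(ann)
    cells = new_ann["html"]["cells"]
    n_struct = count_structure_cells(new_ann["html"]["structure"]["tokens"])
    # Filling tokens in place is only safe if each cell entry lines up with
    # exactly one cell in the structure.
    if len(cells) != n_struct:
        raise ValueError(
            f"{len(cells)} cell entries vs {n_struct} structure cells")
    for cell in cells:
        if cell["tokens"] == []:
            cell["tokens"] = [filler]
    return new_ann
```

With the default `filler=" "` this follows the suggestion in the reply; mapping `[' ']` to `[]` instead would unify the labels just as well, so the choice depends on what consumes the tokens downstream. The sketch leaves any `bbox` fields untouched.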

Closing this issue.