c-jordi / pdf2data

A pdf segmentation and annotation tool for archival documents.

Add even more hand-crafted features

lusamino opened this issue

Although we will add more complex models to extract features, we could also add the following extra features to each level of the tree:

  • Pages: percentage of the page covered, normalized page number
  • Textbox: TBD
  • Lines: TBD
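A minimal sketch of the two proposed page-level features, assuming that "covered" means the union area of the page's text-box bounding boxes (so overlapping boxes are not double-counted) and that the normalized page number maps a 0-based page index onto [0, 1]. The names `Box`, `page_coverage`, and `normalized_page_number` are hypothetical and not part of pdf2data.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical axis-aligned bounding box in page coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float


def union_area(boxes: list[Box]) -> float:
    """Area covered by the union of boxes, via coordinate compression."""
    xs = sorted({b.x0 for b in boxes} | {b.x1 for b in boxes})
    ys = sorted({b.y0 for b in boxes} | {b.y1 for b in boxes})
    area = 0.0
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            # A grid cell is covered if its midpoint lies inside any box.
            cx = (xs[i] + xs[i + 1]) / 2
            cy = (ys[j] + ys[j + 1]) / 2
            if any(b.x0 < cx < b.x1 and b.y0 < cy < b.y1 for b in boxes):
                area += (xs[i + 1] - xs[i]) * (ys[j + 1] - ys[j])
    return area


def page_coverage(boxes: list[Box], width: float, height: float) -> float:
    """Percentage of the page area covered by boxes (assumed definition)."""
    # Clip boxes to the page so content outside the page is ignored.
    clipped = [
        Box(max(b.x0, 0.0), max(b.y0, 0.0), min(b.x1, width), min(b.y1, height))
        for b in boxes
    ]
    clipped = [b for b in clipped if b.x1 > b.x0 and b.y1 > b.y0]
    return 100.0 * union_area(clipped) / (width * height)


def normalized_page_number(index: int, n_pages: int) -> float:
    """Map a 0-based page index to [0, 1]; a single page maps to 0."""
    return index / (n_pages - 1) if n_pages > 1 else 0.0


if __name__ == "__main__":
    boxes = [Box(0, 0, 50, 50), Box(25, 25, 75, 75)]
    print(page_coverage(boxes, 100, 100))  # 43.75
    print(normalized_page_number(2, 5))    # 0.5
```

Using the union rather than the sum of box areas keeps the feature bounded by 100% even when the segmentation produces overlapping text boxes; the brute-force grid is fine for the few dozen boxes on a typical page.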