c-jordi / pdf2data

A pdf segmentation and annotation tool for archival documents.

Add all possible font types during feature extraction

lusamino opened this issue

So far, feature extraction considers only a handful of font types (Calibri, Times New Roman, etc.). However, the documents may use many other fonts. We need to implement the following:

  • During feature extraction, identify every font type that occurs in the documents.
  • After all documents have been processed, run a second step over all the fonts encountered to extract features (font occurrence) for every type.

This feature will be part of the feature extraction process, contained in server/application/feature.
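
A minimal sketch of the two steps listed above, assuming each document has already been reduced to a list of font names (one per text span). The function names and the input format are hypothetical and do not come from the pdf2data code base; how font names are read from the PDF is left out.

```python
from collections import Counter


def collect_font_counts(documents):
    """Step 1: count how often each font name occurs in each document.

    `documents` (hypothetical format) maps a document id to an iterable
    of font names, one entry per text span.
    """
    return {doc_id: Counter(fonts) for doc_id, fonts in documents.items()}


def build_font_vocabulary(font_counts):
    """Collect every font seen in any document, in a stable sorted order."""
    vocabulary = set()
    for counts in font_counts.values():
        vocabulary.update(counts)
    return sorted(vocabulary)


def font_occurrence_features(font_counts, vocabulary):
    """Step 2: build one occurrence vector per document over all fonts."""
    return {
        doc_id: [counts.get(font, 0) for font in vocabulary]
        for doc_id, counts in font_counts.items()
    }


# Illustrative data only.
documents = {
    "doc_a": ["Calibri", "Calibri", "Times New Roman"],
    "doc_b": ["Garamond", "Calibri"],
}
counts = collect_font_counts(documents)
vocabulary = build_font_vocabulary(counts)
features = font_occurrence_features(counts, vocabulary)
```

Building the vocabulary only after all documents have been processed means every feature vector has the same length and column order, including for fonts that a given document never uses.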