4pm-nomnom / OCR

Optical Character Recognition project in C



IDEAS - post-processing & pdf

Pierrick-MADE opened this issue

Post-processing:

  • character-by-character analysis
    • lets us detect a " that was recognised as two ' characters
  • word-by-word:
    • upper/lowercase check
    • spell check
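A minimal C sketch of the character and case passes above, assuming the OCR output is a NUL-terminated ASCII string. The function names merge_double_quotes() and fix_word_case() are hypothetical, and the case rule (keep all-caps words such as acronyms, otherwise lower every letter after the first) is only one possible heuristic. Spell checking is not covered.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Character pass (hypothetical helper): replace every pair of
   consecutive ' characters with a single ". Works in place on a
   NUL-terminated string and returns the new length. */
size_t merge_double_quotes(char *text)
{
    assert(text != NULL);
    size_t r = 0, w = 0;
    while (text[r] != '\0') {
        /* text[r + 1] is at worst the terminating NUL, so this is safe */
        if (text[r] == '\'' && text[r + 1] == '\'') {
            text[w++] = '"';
            r += 2;
        } else {
            text[w++] = text[r++];
        }
    }
    text[w] = '\0';
    return w;
}

/* Word pass (hypothetical heuristic): for each whitespace-separated
   word, keep it unchanged if all of its letters are upper case;
   otherwise lower every letter after the first one. */
void fix_word_case(char *text)
{
    assert(text != NULL);
    char *p = text;
    while (*p != '\0') {
        /* skip the whitespace before the next word */
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;
        char *start = p;
        int letters = 0, upper = 0;
        /* scan the word and count its letters and upper-case letters */
        while (*p != '\0' && !isspace((unsigned char)*p)) {
            if (isalpha((unsigned char)*p)) {
                letters++;
                if (isupper((unsigned char)*p))
                    upper++;
            }
            p++;
        }
        if (letters > 0 && upper == letters)
            continue; /* all caps: leave acronyms alone */
        int seen_letter = 0;
        for (char *q = start; q < p; q++) {
            if (isalpha((unsigned char)*q)) {
                if (seen_letter)
                    *q = (char)tolower((unsigned char)*q);
                seen_letter = 1;
            }
        }
    }
}
```

For example, He said ''hi'' becomes He said "hi", and hEllo WoRLD PDF becomes hello World PDF.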

PDF:

  • use "pdftocairo -png file.pdf", available on the school's computers, to convert PDFs to PNG images
  • then use "convert -append in-*.png out.png", an ImageMagick command, to stack the pages top to bottom into a single image
    (issue: at most 5 input PNGs with our command function)
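A rough C sketch of the same two-step pipeline, assuming pdftocairo and ImageMagick's convert are on the PATH. build_pdf_commands() and pdf_to_single_png() are hypothetical names. The sketch passes "in" as the pdftocairo output root so the generated pages match the in-*.png pattern, and it refuses file names containing a double quote, since they would break the shell quoting.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the two shell commands.
   "pdftocairo -png <pdf> <root>" writes one PNG per page named
   <root>-<page>.png; "convert -append" stacks them top to bottom.
   Returns 0 on success, -1 on a bad name or a too-small buffer. */
int build_pdf_commands(const char *pdf, const char *root, const char *out,
                       char *to_png, size_t to_png_len,
                       char *append, size_t append_len)
{
    assert(pdf && root && out && to_png && append);
    /* a " inside a name would end the quoted argument early */
    if (strchr(pdf, '"') || strchr(root, '"') || strchr(out, '"'))
        return -1;
    int n = snprintf(to_png, to_png_len,
                     "pdftocairo -png \"%s\" \"%s\"", pdf, root);
    if (n < 0 || (size_t)n >= to_png_len)
        return -1;
    /* the root is quoted but -*.png is not, so the shell still globs */
    n = snprintf(append, append_len,
                 "convert -append \"%s\"-*.png \"%s\"", root, out);
    if (n < 0 || (size_t)n >= append_len)
        return -1;
    return 0;
}

/* Runs both steps through system(); returns 0 only if both succeed. */
int pdf_to_single_png(const char *pdf, const char *root, const char *out)
{
    char to_png[512], append[512];
    if (build_pdf_commands(pdf, root, out, to_png, sizeof to_png,
                           append, sizeof append) != 0)
        return -1;
    if (system(to_png) != 0)
        return -1;
    return system(append) == 0 ? 0 : -1;
}
```

system() returns the command's exit status, so any non-zero value is treated as a failure. Building the strings separately from running them makes the commands easy to print or check without calling the external tools.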

Implementation done!

Issues:

  • processing time is very long
  • maximum of 5 pages at a time