francescozanoni / text-hodgepodge

Text manipulation scripts, OCR, and so on...

text-hodgepodge

Text manipulation scripts, OCR, and so on...

Required tools:

  • OCRmyPDF: add text to PDF
  • Tesseract OCR: extract text from images
    • TessData: Tesseract language files (1)
  • pdftotext and pdfimages (Xpdf tools): extract text from PDF, extract images from PDF
    • install with Chocolatey: choco install xpdf-utils
  • ImageMagick: manage images
  • PDFtk: merge and split PDFs, edit PDF bookmarks

(1) The language files must be copied to the folder C:\Program Files\Tesseract-OCR\tessdata
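
Before running any of the scripts, it may help to confirm that the tools listed above are reachable from the command line. The following is a minimal sketch for a POSIX shell, not part of the repository itself. The function name missing_tools is hypothetical, and the executable names are assumptions based on how these tools are commonly installed (for example, ImageMagick 7 ships magick, while older versions use convert).

```shell
#!/bin/sh
# Sketch: report which required command-line tools are missing from PATH.
# The executable names used below are assumptions, not taken from this README.

# Print each argument that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumed executable names for the tools in the list above.
# ImageMagick 6 installs "convert" instead of "magick"; adjust as needed.
missing=$(missing_tools ocrmypdf tesseract pdftotext pdfimages magick pdftk)

if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing"
else
    printf 'All required tools found.\n'
fi
```

The check only verifies that each executable is on PATH. It does not verify versions or that the Tesseract language files are in place.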

License: MIT License


Languages

PowerShell 95.3%, Shell 4.7%