nguyenq / VietOCR3

Java GUI frontend for Tesseract OCR engine

VietOCR

A Java GUI frontend for the Tesseract OCR engine. It supports optical character recognition for Vietnamese and the other languages that Tesseract supports.

VietOCR is released and distributed under the Apache License, v2.0.

Features

  • Multi-platform
  • PDF documents and TIFF, JPEG, GIF, PNG, and BMP image formats
  • Multi-page TIFF images
  • Screenshots
  • Selection box
  • File drag-and-drop
  • Paste image from clipboard
  • Text search and replace
  • Postprocessing for Vietnamese to boost accuracy rate
  • Vietnamese input methods
  • Localized user interface for many languages (Localization project)
  • Integrated scanning support
  • Watch folder monitor for batch processing
  • Custom text replacement in postprocessing
  • Spellcheck with Hunspell
  • Download and installation of language data packs and the matching spellcheck dictionaries

Instructions

To launch the program from the command line:

java -jar VietOCR.jar

or, to run it in command-line mode without the GUI:

java -jar VietOCR.jar imagefile outputfile [-l lang] [--psm pagesegmode] [text|hocr|pdf|pdf_textonly|unlv|box|alto|tsv|lstmbox|wordstrbox] [postprocessing] [correctlettercases] [deskew] [removelines] [removelinebreaks]
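The command-line form above can also be driven from another Java program. The following is a minimal sketch, not part of VietOCR itself: the class `VietOcrCommand`, its `build` method, and the file names `scan.tif` and `scan_out` are hypothetical names chosen for illustration. It only assembles the arguments in the order given by the usage line. `vie` is Tesseract's language code for Vietnamese and `3` is Tesseract's fully automatic page segmentation mode; whether VietOCR appends a file extension to the output name is not stated here, so the example leaves it off as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of VietOCR): assembles the CLI invocation
// in the argument order shown in the usage line above.
class VietOcrCommand {

    // image, outputFile: the two positional arguments.
    // lang: value for -l; psm: value for --psm; format: e.g. "text" or "pdf".
    // options: trailing keywords such as "postprocessing" or "deskew".
    static List<String> build(String image, String outputFile, String lang,
                              int psm, String format, String... options) {
        List<String> cmd = new ArrayList<>(List.of(
                "java", "-jar", "VietOCR.jar", image, outputFile,
                "-l", lang, "--psm", String.valueOf(psm), format));
        cmd.addAll(List.of(options));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("scan.tif", "scan_out", "vie", 3, "text",
                "postprocessing", "deskew");
        // Print the command instead of running it, so the sketch has no side effects.
        System.out.println(String.join(" ", cmd));
    }
}
```

To actually launch the process, the resulting list can be passed to `new ProcessBuilder(cmd).inheritIO().start()`, assuming `VietOCR.jar` is in the working directory.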

Dependencies
