nguyenq / VietOCRwpf

.NET WPF GUI frontend for Tesseract OCR engine

VietOCRwpf

A .NET GUI frontend for the Tesseract OCR engine, written using WPF. It supports optical character recognition for Vietnamese and other languages supported by Tesseract.

VietOCRwpf is released and distributed under the Apache License, v2.0.

Features

  • PDF, TIFF, JPEG, GIF, PNG, BMP image formats
  • Multi-page TIFF images
  • Screenshots
  • Selection box
  • File drag-and-drop
  • Paste image from clipboard
  • Postprocessing for Vietnamese to boost accuracy rate
  • Vietnamese input methods
  • Localized user interface for many languages (Localization project)
  • Integrated scanning support
  • Watch folder monitor for batch processing
  • Custom text replacement in postprocessing
  • Spellcheck with Hunspell
  • Support for downloading and installing language data packs and appropriate spell dictionaries

Instructions

The program can run as a Windows GUI application or as a console application.

Command-line usage:

vietocr imagefile outputfile [-l lang] [--psm pagesegmode] [text|hocr|pdf|pdf_textonly|unlv|box|alto|page|tsv|lstmbox|wordstrbox] [postprocessing] [correctlettercases] [deskew] [removelines] [removelinebreaks]
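
For scripted or batch use, the usage line above can be assembled programmatically. The following is a minimal Python sketch, not part of VietOCRwpf itself: the helper name `build_vietocr_command`, the file names, and the assumption that a `vietocr` executable is reachable on the PATH are all hypothetical. The format and option keywords are copied from the usage line; `vie` is Tesseract's language code for Vietnamese.

```python
import subprocess

# Output formats and option keywords, copied from the usage line above.
FORMATS = ("text", "hocr", "pdf", "pdf_textonly", "unlv", "box",
           "alto", "page", "tsv", "lstmbox", "wordstrbox")
OPTIONS = ("postprocessing", "correctlettercases", "deskew",
           "removelines", "removelinebreaks")


def build_vietocr_command(image, output, lang=None, psm=None, fmt=None,
                          options=(), exe="vietocr"):
    """Hypothetical helper: build the argument list in the order
    given by the usage line. `exe` is assumed to be on the PATH."""
    cmd = [exe, image, output]
    if lang is not None:
        cmd += ["-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"unknown output format: {fmt}")
        cmd.append(fmt)
    unknown = [opt for opt in options if opt not in OPTIONS]
    if unknown:
        raise ValueError(f"unknown options: {unknown}")
    # Emit option keywords in the same order as the usage line lists them.
    cmd += [opt for opt in OPTIONS if opt in options]
    return cmd


def run_vietocr(*args, **kwargs):
    """Run the assembled command; raises CalledProcessError on a nonzero exit."""
    return subprocess.run(build_vietocr_command(*args, **kwargs), check=True)
```

For example, `build_vietocr_command("scan.png", "scan_out", lang="vie", fmt="pdf", options=["deskew"])` yields the argument list for `vietocr scan.png scan_out -l vie pdf deskew`. Passing a list to `subprocess.run` rather than a single string avoids shell quoting problems with file names that contain spaces.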

Dependencies

Languages

C# 68.2%, HTML 31.7%, Shell 0.0%