GeReV / hocr-editor-ts

A visual hOCR file editor

hOCR Editor

NOTE: This project has been superseded by a C# rewrite: https://github.com/GeReV/HocrEditor

This is a visual hOCR file editor with built-in Tesseract functionality, built using React.

This project is still an early work in progress.

hOCR Editor Screenshot

Features

  • Drag & drop interface for restructuring or resizing of sentences, paragraphs and blocks.
  • Optional automatic resizing of elements to wrap their contents.
  • Optional automatic cleanup of empty elements.
  • Import and export of hOCR documents.
  • Built-in OCR, using Tesseract, of one or more documents or regions of documents.
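
As an illustration of the automatic resizing and empty-element cleanup features listed above, here is a minimal sketch in TypeScript. It is not taken from this project's source: the `HocrNode` and `BBox` types and the `parseBBox`, `union` and `tidy` functions are hypothetical names chosen for this example. The only detail borrowed from the hOCR format itself is that an element's bounding box is stored in its `title` attribute as `bbox x0 y0 x1 y1`.

```typescript
// Hypothetical types for illustration only; the editor's real data model may differ.
interface BBox {
  x0: number;
  y0: number;
  x1: number;
  y1: number;
}

interface HocrNode {
  kind: "block" | "paragraph" | "line" | "word";
  bbox: BBox;
  text?: string; // in this sketch, only words carry text
  children: HocrNode[];
}

// Read the `bbox x0 y0 x1 y1` property out of an hOCR `title` attribute,
// e.g. "bbox 36 92 580 122; x_wconf 95". Returns null if no bbox is present.
function parseBBox(title: string): BBox | null {
  const m = /\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/.exec(title);
  if (!m) return null;
  const [x0, y0, x1, y1] = m.slice(1).map(Number);
  return { x0, y0, x1, y1 };
}

// Smallest box that contains both a and b.
function union(a: BBox, b: BBox): BBox {
  return {
    x0: Math.min(a.x0, b.x0),
    y0: Math.min(a.y0, b.y0),
    x1: Math.max(a.x1, b.x1),
    y1: Math.max(a.y1, b.y1),
  };
}

// Drop empty elements, then shrink or grow each remaining container so that
// its box exactly wraps its children. Returns null when the node itself
// ends up empty. The input tree is not mutated.
function tidy(node: HocrNode): HocrNode | null {
  if (node.kind === "word") {
    return node.text && node.text.trim() !== "" ? node : null;
  }
  const children = node.children
    .map(tidy)
    .filter((c): c is HocrNode => c !== null);
  if (children.length === 0) return null;
  const bbox = children.map((c) => c.bbox).reduce(union);
  return { ...node, children, bbox };
}
```

Running `tidy` on a line whose words are "foo", a whitespace-only word and "bar" would return a copy of the line with two words and a box that wraps just those two. In the editor both behaviours are optional, so a real implementation would presumably put each step behind its own setting rather than combine them as this sketch does.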

Rationale

While working on a personal project, I went looking for an editor that would let me correct some of the mistakes Tesseract makes in mixed-language magazine scans (Hebrew and English in my case).

After going through the list of applications mentioned in the Tesseract docs, I couldn't find a free, open-source application that was convenient enough for both correcting recognition errors and restructuring the document.

Tesseract occasionally places text lines in a different paragraph from the one where you would expect them, and sometimes fails to recognize letters or words correctly, especially in mixed-language sentences.

This application was designed to help reorganize the structure of documents and edit recognized words, using an easy-to-use visual representation of the document.

Available Scripts

This project was bootstrapped with Create React App.

In the project directory, you can run:

yarn start

Runs the app in development mode.
Open http://localhost:3000 to view it in the browser.

The page will reload if you make edits.
You will also see any lint errors in the console.

yarn build

Builds the app for production to the build folder.
It correctly bundles React in production mode and optimizes the build for the best performance.

The build is minified and the filenames include the hashes.
Your app is ready to be deployed!

See the deployment section of the Create React App documentation for more information.

About

License: MIT License

Languages: TypeScript 95.0%, SCSS 2.3%, CSS 1.5%, HTML 1.2%