HemingwayLee / ocr-box-editor-v2

django ocr python tesseract

ocr-box-editor-v2

This project is a modified version of tesseract-web-box-editor that adds support for the WordStr box file format.
It is a web application for generating Tesseract training data in the following steps:
- Upload images
- Edit the labels (text and bounding box coordinates) for the uploaded images
- Save the images and their labels to the backend
Once enough training data has been collected, it can be used to retrain Tesseract.
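
For readers unfamiliar with the WordStr format, the sketch below shows roughly what one labeled text line looks like in a box file. It is an illustration only, not code from this project: the function name `wordstr_entry` is hypothetical, and the layout (a `WordStr` line followed by a tab "end of line" box a few pixels to the right) follows the Tesseract training documentation as best I understand it, so check it against the Tesseract version you train with.

```python
def wordstr_entry(text, left, bottom, right, top, page=0):
    """Return the two box-file lines for one text line (hypothetical helper).

    Coordinates are in pixels, measured from the bottom-left corner of the
    image, which is the convention Tesseract box files use.
    """
    # One line of text: bounding box, page number, then the text after '#'.
    word_line = f"WordStr {left} {bottom} {right} {top} {page} #{text}"
    # Assumption: a tab box just to the right of the text marks the end of
    # the line. The +1/+5 pixel offsets are illustrative, not required values.
    eol_line = f"\t {right + 1} {bottom} {right + 5} {top} {page}"
    return word_line + "\n" + eol_line + "\n"


if __name__ == "__main__":
    print(wordstr_entry("hello world", 10, 20, 110, 40), end="")
```

Keeping the whole line of text in one entry is what distinguishes WordStr from the older per-character box format, where every character has its own box.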

Prerequisites

Install Tesseract
Install Python 3 and virtualenv

How to install

virtualenv venv
source venv/bin/activate
pip3 install -r requirements.txt

How to run

python3 manage.py migrate
python3 manage.py runserver

These commands create db.sqlite3 and start the development server; the application is then available at http://127.0.0.1:8000

Editor

Upload images and click the Process Image button to generate default labels
Edit the labels (text and bounding box coordinates) for the uploaded images
Save the images and their labels (as .box files) to the backend
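
One detail worth keeping in mind when editing bounding boxes: Tesseract box files measure coordinates from the bottom-left corner of the image, while rectangles drawn in a browser are usually measured from the top-left. The sketch below shows the conversion under that assumption. The function name `rect_to_box` and the (x, y, width, height) input shape are hypothetical and are not taken from this project's code.

```python
def rect_to_box(x, y, width, height, image_height):
    """Convert a top-left-origin rectangle to box-file coordinates.

    Hypothetical helper: assumes (x, y) is the top-left corner of the
    rectangle in pixels, with y growing downward as in browser canvases.
    Returns (left, bottom, right, top) with y growing upward from the
    bottom edge of the image, as Tesseract box files expect.
    """
    left = x
    right = x + width
    # Flip the vertical axis: the rectangle's top edge is y pixels below
    # the image's top edge, so it is image_height - y above the bottom edge.
    top = image_height - y
    bottom = image_height - (y + height)
    return left, bottom, right, top


if __name__ == "__main__":
    # A 30x40 rectangle at (10, 20) in an image that is 100 pixels tall.
    print(rect_to_box(10, 20, 30, 40, image_height=100))
```

If saved boxes appear vertically mirrored when loaded into a training run, a missing flip of this kind is a likely cause.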

Data Viewer

Click the Data tab to see all images that have been uploaded to the backend
