ocr-box-editor-v2
- This project is a modified version of tesseract-web-box-editor that adds support for the WordStr box file format
- This is a web application for generating training data for tesseract in the following steps:
  - Upload images
  - Edit labels (text and bounding box coordinates) for the uploaded images
  - Save images and corresponding labels to the backend
- Once training data has been collected, it can be used to retrain tesseract
Prerequisites
- Install tesseract
- Install python3 and virtualenv
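As a quick sanity check before installing, the prerequisites can be looked up on the PATH. This is only a sketch: the helper name missing_tools is hypothetical and not part of this project, and it assumes the tools are installed under the executable names tesseract, python3 and virtualenv.

```python
import shutil


def missing_tools(tools=("tesseract", "python3", "virtualenv")):
    """Return the names of the given executables that are not on the PATH."""
    # shutil.which returns None when no matching executable is found.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found")
```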
How to install
virtualenv venv
source venv/bin/activate
pip3 install -r requirements.txt
How to run
python3 manage.py migrate
python3 manage.py runserver
The migrate step creates db.sqlite3. Once the server is running, the application is available at http://127.0.0.1:8000
Editor
- Upload images and click the Process Image button to generate default labels
- Edit labels (text and bounding box coordinates) for the uploaded images
- Save images and corresponding labels (with the .box file extension) to the backend
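To illustrate what a saved label might look like, here is a minimal sketch that formats one text line as a WordStr entry. It is not this project's code: the function name to_wordstr_line is hypothetical, and it assumes the editor reports boxes with a top-left origin (as browsers usually do), while tesseract box files measure coordinates from the bottom-left corner of the image. It covers only the WordStr line itself, not any end-of-line marker lines that a complete box file may need.

```python
def to_wordstr_line(text, left, top, right, bottom, image_height, page=0):
    """Format one text line as a WordStr box-file entry.

    Input coordinates are assumed to use a top-left origin. Box files use a
    bottom-left origin, so the y values are flipped against the image height.
    """
    box_bottom = image_height - bottom
    box_top = image_height - top
    # Layout: WordStr <left> <bottom> <right> <top> <page> #<text>
    return f"WordStr {left} {box_bottom} {right} {box_top} {page} #{text}"


if __name__ == "__main__":
    # A 100 px tall image with one line of text between y=20 and y=50.
    print(to_wordstr_line("hello world", 10, 20, 110, 50, image_height=100))
```

Keeping the y-axis flip in one place makes it easier to verify against an image viewer, since a wrong origin usually shows up as boxes mirrored vertically.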