angelodel80 / hOCR-to-ALTO

Convert between Tesseract hOCR and ALTO XML using XSL stylesheets

hOCR-to-ALTO

Convert between Tesseract hOCR and ALTO XML 2.0/2.1/3/4 using XSL stylesheets

The stylesheets use XSLT 2.0 features, so they require an XSLT 2.0-capable processor such as Saxon.
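As a minimal sketch of running one of these stylesheets, the Python helper below builds a Saxon command line and runs it. It assumes a Saxon-HE jar is available locally and that Java is on the PATH. The jar path, the stylesheet name, and the input and output file names are all hypothetical placeholders, not files confirmed by this repository. The `-s:`, `-xsl:` and `-o:` options are Saxon's standard command-line options for source, stylesheet and output.

```python
import subprocess
from typing import List


def build_saxon_command(saxon_jar: str, stylesheet: str,
                        source: str, output: str) -> List[str]:
    """Build the argument list for Saxon's command-line transformer.

    All four paths are supplied by the caller; nothing here is specific
    to this repository.
    """
    return [
        "java", "-jar", saxon_jar,
        f"-s:{source}",        # input document (e.g. an hOCR file)
        f"-xsl:{stylesheet}",  # XSLT 2.0 stylesheet to apply
        f"-o:{output}",        # where to write the result
    ]


def transform(saxon_jar: str, stylesheet: str,
              source: str, output: str) -> None:
    """Run the transformation; raises CalledProcessError if Saxon fails."""
    subprocess.run(
        build_saxon_command(saxon_jar, stylesheet, source, output),
        check=True,
    )
```

A call might look like `transform("saxon-he.jar", "hocr__alto4.xsl", "page.hocr", "page.alto.xml")`, where every argument is an assumed example name. Recent Saxon releases may also need their bundled library jars next to the main jar.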

See ocr-fileformat for an interface that uses these stylesheets.

The hOCR specification is available at https://github.com/kba/hocr-spec

File naming scheme: sourceFormatVersion__targetFormatVersion.xsl
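The naming scheme can be illustrated with a small Python sketch that builds and splits stylesheet names. The format strings used in the usage note (`hocr`, `alto4`, `alto2.1`) are assumptions chosen to match the scheme and the ALTO versions listed above; they are not a verified listing of the files in the repository.

```python
from typing import Tuple

SEPARATOR = "__"    # double underscore between source and target
EXTENSION = ".xsl"


def stylesheet_name(source: str, target: str) -> str:
    """Compose a file name following sourceFormatVersion__targetFormatVersion.xsl."""
    return f"{source}{SEPARATOR}{target}{EXTENSION}"


def split_stylesheet_name(filename: str) -> Tuple[str, str]:
    """Split a stylesheet file name into (source, target) format strings."""
    if not filename.endswith(EXTENSION):
        raise ValueError(f"not an .xsl file: {filename!r}")
    stem = filename[: -len(EXTENSION)]
    # partition splits on the first separator only, so dots in version
    # numbers such as "2.1" are left untouched
    source, sep, target = stem.partition(SEPARATOR)
    if not sep or not source or not target:
        raise ValueError(f"name does not follow the scheme: {filename!r}")
    return source, target
```

For example, `stylesheet_name("hocr", "alto4")` yields `hocr__alto4.xsl`, and `split_stylesheet_name("hocr__alto2.1.xsl")` yields `("hocr", "alto2.1")`.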

License: MIT License


Languages

Language: XSLT 100.0%