filak / alto-tools

Python script for performing various operations on ALTO XML files

alto-tools

Python 3 script for performing various operations on ALTO XML files.

Usage

  • extract the UTF-8 text content from an ALTO file

    python3 alto-tools.py alto.xml -t

  • extract the page OCR confidence score from an ALTO file

    python3 alto-tools.py alto.xml -c

  • extract the bounding boxes of illustrations from an ALTO file

    python3 alto-tools.py alto.xml -l
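
For readers who want to see what these three operations involve, here is a minimal sketch using only the standard library. It is not the code of alto-tools.py. It assumes the usual ALTO layout: words stored in the `CONTENT` attribute of `String` elements inside `TextLine`, a per-word confidence in the optional `WC` attribute, and `Illustration` elements carrying `HPOS`, `VPOS`, `WIDTH` and `HEIGHT`. The function names, the inline sample document, and the choice of averaging `WC` as a page score are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written ALTO document, used only for illustration.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page ID="p1"><PrintSpace>
    <TextBlock ID="b1">
      <TextLine>
        <String CONTENT="Hello" WC="0.9"/><SP/><String CONTENT="world" WC="0.7"/>
      </TextLine>
      <TextLine><String CONTENT="again" WC="0.8"/></TextLine>
    </TextBlock>
    <Illustration ID="i1" HPOS="10" VPOS="20" WIDTH="300" HEIGHT="400"/>
  </PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so any ALTO schema version matches."""
    return tag.rsplit("}", 1)[-1]


def iter_elements(root, name):
    """Yield every element under root (root included) with the given local name."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and local_name(elem.tag) == name:
            yield elem


def extract_text(root):
    """Join the CONTENT of each String per TextLine, one output line per TextLine."""
    lines = []
    for line in iter_elements(root, "TextLine"):
        words = [s.get("CONTENT", "") for s in iter_elements(line, "String")]
        lines.append(" ".join(words))
    return "\n".join(lines)


def mean_word_confidence(root):
    """Average the WC attribute over all String elements; None if none carry WC."""
    scores = [
        float(s.get("WC"))
        for s in iter_elements(root, "String")
        if s.get("WC") is not None
    ]
    return sum(scores) / len(scores) if scores else None


def illustration_boxes(root):
    """Return (HPOS, VPOS, WIDTH, HEIGHT) as raw attribute strings per Illustration."""
    keys = ("HPOS", "VPOS", "WIDTH", "HEIGHT")
    return [
        tuple(ill.get(k) for k in keys)
        for ill in iter_elements(root, "Illustration")
    ]


root = ET.fromstring(SAMPLE)
print(extract_text(root))
print(mean_word_confidence(root))
print(illustration_boxes(root))
```

To run the sketch on a real file, replace `ET.fromstring(SAMPLE)` with `ET.parse("alto.xml").getroot()`. Matching on the local tag name avoids hard-coding one ALTO namespace, since the namespace URI differs between schema versions.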

Planned

  • write output to file(s); currently all output is sent to stdout

    python3 alto-tools.py alto.xml [OPTION] -o
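
Until the planned option exists, stdout can be captured by the caller. The sketch below is one way to do that from Python with `subprocess.run`. The helper name `run_and_save` and the output file name are hypothetical and not part of alto-tools.

```python
import subprocess
from pathlib import Path


def run_and_save(command, out_path):
    """Run command and write everything it prints on stdout to out_path."""
    out_path = Path(out_path)
    # Open in binary mode so the child's bytes are stored unchanged.
    with open(out_path, "wb") as fh:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, stdout=fh, check=True)
    return out_path
```

For example, `run_and_save(["python3", "alto-tools.py", "alto.xml", "-t"], "alto.txt")` would store the extracted text in `alto.txt`, assuming the script and input file are in the current directory.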

About

License: Apache License 2.0


Languages

Language: Python 100.0%