cneud / alto-tools

Python tools for performing various operations on ALTO XML files

ALTO Tools

Python tools for performing various operations on ALTO XML files


Installation

You can install from PyPI by running

pip install alto-tools

or clone the repository, enter it and run

pip install .

Usage

alto-tools <INPUT> [OPTION] 

INPUT should be the path to an ALTO XML file or to a directory containing ALTO XML files.

The following OPTIONS are currently supported:

OPTION                Description
-t --text             Extract UTF-8 encoded text content
-c --confidence       Extract mean OCR word confidence score
-i --illustrations    Extract bounding box coordinates of <Illustration> elements
-g --graphics         Extract bounding box coordinates of <GraphicalElement> elements
-s --statistics       Extract statistical info (number of text lines, words, glyphs, etc.)

All output is sent to stdout.
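
To make the options above more concrete, here is a minimal sketch of how text, mean word confidence and bounding boxes can be read from an ALTO document with Python's standard library. It is not the alto-tools implementation. The sample document and the helper names (local_name, elements, extract_text, mean_confidence, bounding_boxes) are hypothetical and chosen for illustration. The sketch assumes that words are stored in the CONTENT attribute of <String> elements, that word confidence is stored in their WC attribute, and that coordinates are stored in HPOS, VPOS, WIDTH and HEIGHT.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-line ALTO page, used only for illustration.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Hello" WC="1.0"/>
            <SP/>
            <String CONTENT="world" WC="0.5"/>
          </TextLine>
        </TextBlock>
        <Illustration ID="i1" HPOS="10" VPOS="20" WIDTH="300" HEIGHT="200"/>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the same code handles any ALTO version."""
    return tag.rsplit("}", 1)[-1]


def elements(root, name):
    """Return all descendants of root whose local tag name equals name."""
    return [el for el in root.iter() if local_name(el.tag) == name]


def extract_text(root):
    """Join the words of each <TextLine> with spaces, one output line per text line."""
    lines = []
    for line in elements(root, "TextLine"):
        words = [s.get("CONTENT", "") for s in elements(line, "String")]
        lines.append(" ".join(words))
    return "\n".join(lines)


def mean_confidence(root):
    """Average the WC attribute over all <String> elements that carry one."""
    scores = [float(s.get("WC")) for s in elements(root, "String")
              if s.get("WC") is not None]
    return sum(scores) / len(scores) if scores else None


def bounding_boxes(root, name):
    """Return (HPOS, VPOS, WIDTH, HEIGHT) attribute strings for each matching element."""
    keys = ("HPOS", "VPOS", "WIDTH", "HEIGHT")
    return [tuple(el.get(k) for k in keys) for el in elements(root, name)]


root = ET.fromstring(SAMPLE)
print(extract_text(root))
print(mean_confidence(root))
print(bounding_boxes(root, "Illustration"))
```

Matching on the local tag name instead of a fixed namespace is a deliberate simplification, since the namespace URI differs between ALTO schema versions.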

About

License: Apache License 2.0


Languages

Language:Python 100.0%