altomator / ALTO-IIIF

Extracting illustrations from ALTO documents with IIIF

Extracting illustrations from ALTO files with IIIF

Synopsis

Extracts the illustrations described in OCRed documents (ALTO format) using the IIIF API.
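To illustrate the idea, here is a minimal sketch of how one illustration region could be turned into an IIIF Image API request. It is not taken from the project's scripts. The function name, the server URL and the image identifier are hypothetical, and the region is assumed to be already expressed in pixels (ALTO gives each illustration block HPOS, VPOS, WIDTH and HEIGHT attributes, which may need rescaling depending on the measurement unit of the file). The size keyword `full` and the quality keyword `default` follow version 2 of the Image API; servers on other versions may expect different keywords (for example `max` for the size).

```shell
# Build an IIIF Image API URL for one rectangular region of a page image.
# Assumption: all names and values below are placeholders for illustration only.
iiif_region_url() {
    base_url=$1        # hypothetical IIIF server prefix, e.g. https://example.org/iiif
    identifier=$2      # hypothetical image identifier on that server
    x=$3; y=$4         # top-left corner of the region, in pixels
    w=$5; h=$6         # width and height of the region, in pixels
    # URL pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    printf '%s/%s/%s,%s,%s,%s/full/0/default.jpg\n' \
        "$base_url" "$identifier" "$x" "$y" "$w" "$h"
}

# Example call with made-up coordinates
iiif_region_url "https://example.org/iiif" "page-0001" 120 340 800 600
```

The resulting URL can then be fetched with any HTTP client to download the cropped illustration.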

Full presentation in French

Installation

You will need four scripts:

  1. filterIMG.sh (shell)
  2. processURLs.pl (Perl)
  3. extractIMG.pl (Perl)
  4. extractMD.pl (Perl)

A batch.sh script chains the commands.

The documents must be stored in a "DOCS" folder. The images are generated in an "IMG" folder and the metadata in an "MD" folder.
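The folder layout can be checked before a run with a small helper such as the one below. This is only a sketch: the function name `prepare_workdir` is hypothetical, and it is an assumption that creating "IMG" and "MD" in advance is harmless (the README does not say whether the scripts create them).

```shell
# Verify that the input folder exists and make sure the output folders are present.
# Assumption: prepare_workdir is an illustrative helper, not part of the project.
prepare_workdir() {
    workdir=${1:-.}    # directory that holds DOCS, IMG and MD (defaults to the current one)
    if [ ! -d "$workdir/DOCS" ]; then
        echo "missing input folder: $workdir/DOCS" >&2
        return 1
    fi
    # mkdir -p does nothing when the folders already exist
    mkdir -p "$workdir/IMG" "$workdir/MD"
}
```

Calling `prepare_workdir` with no argument checks the current directory and returns a non-zero status if "DOCS" is missing.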

Tests

  1. Open a command line terminal.
  2. filterIMG.sh
  3. perl processURLs.pl illustrations.txt
  4. perl extractIMG.pl illustrations.txt_URL 200 (200 is the minimum size, in KB, of the extracted images)
  5. perl extractMD.pl illustrations.txt_URL
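The steps above can be chained so that a failure stops the run. The sketch below is not the content of the repository's batch.sh; the function names and the `DRY_RUN` variable are hypothetical, and it is assumed that filterIMG.sh is executable and sits in the current directory. With `DRY_RUN=1` the commands are printed instead of executed, which makes it possible to check the sequence without the scripts or the documents at hand.

```shell
# Run one command, or only print it when DRY_RUN=1.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Chain the four steps of the Tests section; && stops at the first failing step.
# Assumption: ./filterIMG.sh is executable and all scripts are in the current directory.
run_pipeline() {
    min_size_kb=${1:-200}    # minimum size of the extracted images, in KB
    run_step ./filterIMG.sh &&
    run_step perl processURLs.pl illustrations.txt &&
    run_step perl extractIMG.pl illustrations.txt_URL "$min_size_kb" &&
    run_step perl extractMD.pl illustrations.txt_URL
}

# Print the command sequence without running anything
DRY_RUN=1 run_pipeline 200
```

Dropping `DRY_RUN=1` from the last line runs the real commands.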

License

CC0
