UB-Mannheim / digi-gt

Ground truth for the digitized historic collections of UB Mannheim

digi-gt

Ground truth for the digitized historic collections of Universitätsbibliothek Mannheim.

The transcriptions were done with eScriptorium, a transcription platform developed as part of the Scripta and RESILIENCE projects (https://gitlab.com/scripta/escriptorium/).

After exporting the transcriptions as PAGE XML files, files without any transcription were deleted, and empty lines were removed from the remaining files:

# Remove PAGE XML files without any transcription.
rm -v $(grep -L "<Unicode>..*</Unicode>" *.xml)
# Remove empty lines in PAGE XML files.
perl -i -ne 'tr|\r||d; next if /^\s*$/; print' *.xml

List of transcriptions

Links

About

License: Creative Commons Zero v1.0 Universal


Languages

Language: Shell 100.0%