Ground truth for the digitized historic collections of Universitätsbibliothek Mannheim.
The transcriptions were done with eScriptorium, a transcription platform developed as part of the Scripta and RESILIENCE projects (https://gitlab.com/scripta/escriptorium/).
After the transcriptions were exported as PAGE XML files, files without any transcription were deleted, and empty lines were removed from the remaining files:
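```shell
# Remove PAGE XML files without any transcription
# (no <Unicode> element containing at least one character).
grep -L -Z '<Unicode>..*</Unicode>' -- *.xml | xargs -0 -r rm -v
# Strip carriage returns and remove empty (whitespace-only) lines in place.
perl -i -ne 'tr|\r||d; next if /^\s*$/; print' *.xml
```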
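To confirm that the two cleanup steps left nothing behind, the same conditions can be re-checked over the export folder. The following is a minimal POSIX shell sketch, not part of the original workflow: the function name `check_page_dir` is hypothetical, and it assumes all PAGE XML files sit directly in one directory, as the `*.xml` globs above imply.

```shell
#!/bin/sh
# check_page_dir DIR (hypothetical helper, not part of eScriptorium):
# report PAGE XML files in DIR that the cleanup above should have fixed.
# Returns 0 if every file passes, 1 otherwise.
check_page_dir() {
  dir=$1
  status=0
  for f in "$dir"/*.xml; do
    [ -e "$f" ] || continue  # the glob matched no files
    # Same test as the removal step: at least one non-empty <Unicode> element.
    if ! grep -q '<Unicode>..*</Unicode>' "$f"; then
      echo "no transcription: $f"
      status=1
    fi
    # Empty or whitespace-only lines should have been dropped by perl.
    if grep -q '^[[:space:]]*$' "$f"; then
      echo "blank line: $f"
      status=1
    fi
    # Carriage returns should have been stripped by tr|\r||d.
    if grep -q "$(printf '\r')" "$f"; then
      echo "carriage return: $f"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_page_dir . && echo clean`, run in the export directory, prints `clean` only if every file passes all three checks; otherwise it lists each offending file with the reason.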
The ground truth covers the following works:
- PPN477366015 – Historia de vita et actis ... Martini Lutheri ...
- PPN477380670 – Der Prophet Habacuc
- PPN477396054 – Deudsche Messe vnd ord=nung Gottes diensts
- PPN477396569 – Auff des koenigs zu En=gelland ...
- PPN506281272 – Bambergische halszgerichts ordenung
- PPN1807526488 – Ioannis Lodovici Vivis Von vnderweÿsung ayner Christlichen Frauwen/ Drey Bücher
- PPN1807527700 – Ioannis Lodovici Vivis Von Gebirliche[m] Thun vnd Lassen aines Ehemanns
- PPN1837705283 – Francisci Gvyeti Andegavi Monobiblos siue generosae poesews specimen (1602)
- PPN1885309457 – Lopodunum - Ladenburg 98 bis 1898 (1900)
- PPN1890801038 – La Morale De Confucius Philosophe De La Chine (1688)