HTR-United / htr-united

Ground Truth Resources for the HTR of patrimonial documents

Home Page: https://htr-united.github.io

Add "file" counts for a few datasets

alix-tz opened this issue

@PonteIneptique, do you have any objection to adding the following information to the catalog?

| Dataset name | (XML) file count |
| --- | --- |
| Handwritten Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X | 201 |
| Charters and Records of Königsfelden Abbey and Bailiwick (1308-1662) | 283 |
| The POPP datasets | 235 |
| Eutyches | 129 |
| FoNDUE-GasparoSardiToponomasia-Dataset | 49 |
| FoNDUE Spanish chapbooks 19th c. Dataset | 198 |
| Éditer la correspondance de Constance de Salm (1767-1845) | 45 |
| Jeu de données OCR - Incunables sévillans 1494-1500 | 62 |
| Données vérité de terrain HTR+ Annuaire des propriétaires et des propriétés de Paris et du département de la Seine (1898-1923) | 169 |

I went through each of these repositories and counted the XML files that correspond to ground truth. Note that for "Handwritten Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X", I only counted the PAGE files: every ALTO file has a PAGE equivalent, but not every PAGE file has an ALTO equivalent. The same applies to "Données vérité de terrain HTR+ Annuaire des propriétaires et des propriétés de Paris et du département de la Seine (1898-1923)".
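For anyone who wants to reproduce or double-check counts like these, here is a minimal sketch of one way to do it. It is not necessarily how the numbers above were obtained. It assumes the repository has been cloned locally and that ground truth files can be told apart by their XML root element (`PcGts` for PAGE, `alto` for ALTO). The function names and the directory path are hypothetical.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def detect_format(path):
    """Return "page", "alto", or "other" based on the XML root element."""
    try:
        with open(path, "rb") as handle:
            # Read only as far as the first start tag, which is the root element.
            for _event, element in ET.iterparse(handle, events=("start",)):
                root_tag = element.tag
                break
            else:
                return "other"
    except ET.ParseError:
        # Empty or malformed files are not counted as ground truth.
        return "other"

    # Drop the "{namespace}" prefix that ElementTree puts before the tag name.
    local_name = root_tag.rsplit("}", 1)[-1]
    if local_name == "PcGts":
        return "page"
    if local_name == "alto":
        return "alto"
    return "other"


def count_ground_truth_files(repo_dir):
    """Count the *.xml files under repo_dir, grouped by detected format."""
    return Counter(detect_format(p) for p in Path(repo_dir).rglob("*.xml"))


if __name__ == "__main__":
    # Hypothetical path to a local clone of one of the dataset repositories.
    counts = count_ground_truth_files("path/to/cloned-dataset")
    print(counts)
```

When a repository ships both formats for the same pages, reporting only `counts["page"]` avoids counting a page twice, which matches the choice described above.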

If we add these metrics, the "file" metric will be available for every dataset currently listed in the catalog.