Adding dataset EpiSearch (Astori’s letters)
federico-boschetti opened this issue
Hello! We are glad to send you the metadata for the dataset described in https://doi.org/10.5281/zenodo.7719291.
Here is our dataset YAML file:
schema: https://htr-united.github.io/schema/2022-04-15/schema.json
title: EpiSearch HTR
url: https://github.com/vedph/episearch-htr
authors:
  - name: Lorenzo
    surname: Calvelli
    orcid: 0000-0002-0920-9156
    roles:
      - project-manager
  - name: Tatiana
    surname: Tommasi
    orcid: 0009-0000-2815-0113
    roles:
      - transcriber
  - name: Federico
    surname: Boschetti
    orcid: 0000-0002-7810-7735
    roles:
      - support
institutions: []
description: Ground Truth for Astori’s letters (see the README.md file for details)
project-name: EpiSearch
project-website: https://github.com/vedph/episearch-htr
language:
  - ita
production-software: eScriptorium + Kraken
script:
  - iso: Latn
script-type: only-manuscript
time:
  notBefore: '1705'
  notAfter: '1709'
hands:
  count: '1'
  precision: exact
license:
  - name: CC-BY-SA 4.0
    url: https://creativecommons.org/licenses/by-sa/4.0/
format: Alto-XML
volume:
  - metric: files
    count: 34
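Before submitting a description like the one above, a quick sanity check can catch obvious slips. The following is only a minimal sketch: `check_entry` and `EXPECTED_KEYS` are hypothetical names, the list of keys is an assumption drawn from the fields used in this file, and the authoritative rules remain those of the JSON schema referenced in the `schema:` field. The entry is written as a plain dictionary so that the example needs no YAML library.

```python
from typing import Any

# Assumed list of keys, taken from the file above; the JSON schema referenced
# by `schema:` is the authoritative source, not this sketch.
EXPECTED_KEYS = ("schema", "title", "url", "authors", "description",
                 "language", "script", "time", "license", "format", "volume")


def check_entry(entry: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems found in a catalog entry."""
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in entry]

    time = entry.get("time", {})
    if "notBefore" in time and "notAfter" in time:
        # Years are quoted strings in the YAML, so convert before comparing.
        if int(time["notBefore"]) > int(time["notAfter"]):
            problems.append("time.notBefore is later than time.notAfter")

    for author in entry.get("authors", []):
        if not author.get("roles"):
            problems.append(f"author without roles: {author.get('surname', '?')}")

    return problems


# The same values as in the YAML file above, as a Python dictionary.
entry = {
    "schema": "https://htr-united.github.io/schema/2022-04-15/schema.json",
    "title": "EpiSearch HTR",
    "url": "https://github.com/vedph/episearch-htr",
    "authors": [
        {"name": "Lorenzo", "surname": "Calvelli", "roles": ["project-manager"]},
        {"name": "Tatiana", "surname": "Tommasi", "roles": ["transcriber"]},
        {"name": "Federico", "surname": "Boschetti", "roles": ["support"]},
    ],
    "description": "Ground Truth for Astori's letters",
    "language": ["ita"],
    "script": [{"iso": "Latn"}],
    "time": {"notBefore": "1705", "notAfter": "1709"},
    "license": [{"name": "CC-BY-SA 4.0",
                 "url": "https://creativecommons.org/licenses/by-sa/4.0/"}],
    "format": "Alto-XML",
    "volume": [{"metric": "files", "count": 34}],
}

print(check_entry(entry))  # prints []
```

An empty list means that none of the checks in this sketch failed; it does not replace validation against the schema itself.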
Hello @federico-boschetti!
Thank you for your contribution! I made #122 to add the dataset description to the catalog.
I have two questions regarding the dataset:
- I saw that some lines are not segmented or transcribed. It's not a problem, but I just wanted to make sure it is intentional.
- Regarding the organization of the repository, I think it would be easier for users if you put all the JPEG and XML files in a data/ folder instead of having them all at the root level (as we suggested in the template). Do you think you could do this?
Otherwise, as far as the description is concerned, it's all good for merging.
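For the first question, lines that were segmented but left without a transcription can be counted directly in an ALTO file. This is a minimal sketch using only the Python standard library. The helper names `local_name` and `count_lines` are hypothetical, and the sample document is invented for illustration and is not taken from the dataset. The sketch matches elements by local tag name so that it does not depend on a particular ALTO namespace version. Regions that were never segmented do not appear in the XML at all, so they cannot be detected this way.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts before tag names."""
    return tag.rsplit("}", 1)[-1]


def count_lines(alto_xml: str) -> tuple[int, int]:
    """Return (number of TextLine elements, number of those with no text)."""
    root = ET.fromstring(alto_xml)
    total = empty = 0
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        total += 1
        # Concatenate the CONTENT attribute of every String inside the line.
        text = "".join(
            child.get("CONTENT", "")
            for child in element.iter()
            if local_name(child.tag) == "String"
        )
        if not text.strip():
            empty += 1
    return total, empty


# Invented sample: one transcribed line and one empty line.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="example text"/></TextLine>
    <TextLine><String CONTENT=""/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(count_lines(SAMPLE))  # prints (2, 1)
```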
Hello @alix-tz!
Thank you for your feedback.
- The omissions are intentional (introductory formulae and signatures were over-represented and lowered the training performance);
- I created the "data" directory and moved the images and XML files into it, as you suggested.
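A reorganization of this kind can be scripted. The sketch below is one possible way to do it, not a record of how it was done in this repository. `move_into_data` is a hypothetical helper, and the list of file suffixes is an assumption based on the JPEG and XML files mentioned in this thread.

```python
from pathlib import Path
import shutil


def move_into_data(repo_root: Path,
                   suffixes: tuple[str, ...] = (".jpg", ".jpeg", ".xml")) -> list[str]:
    """Move matching files from the repository root into repo_root/data.

    Returns the names of the files that were moved, in sorted order.
    """
    data_dir = repo_root / "data"
    data_dir.mkdir(exist_ok=True)
    moved = []
    for path in sorted(repo_root.iterdir()):
        # Only top-level files are considered; subdirectories are left alone.
        if path.is_file() and path.suffix.lower() in suffixes:
            shutil.move(str(path), str(data_dir / path.name))
            moved.append(path.name)
    return moved


if __name__ == "__main__":
    # Run from the root of a local clone.
    for name in move_into_data(Path(".")):
        print("moved", name)
```

In a Git working tree, `git mv` achieves the same result while staging the renames in one step.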
Awesome! I just confirmed the addition of the description of the dataset to the catalog.
Thank you again!