altoxml / schema

ALTO XML schema - latest and all former versions

Allow variants for hyphen elements (HYP)

splet opened this issue · comments

Separated from #26 (comment) (introduction of glyphs)

I am no linguistics expert, so I cannot say whether any language uses different characters as hyphens in a way that would require recording possible alternatives. Personally, I think this is again a very rare case. But I see no problem or conflict with other alternatives or similar features, so for me it would be fine to extend HYP as well.

In older prints (mostly 16th to 18th century), variations of the hyphen, such as the "double oblique hyphen" (Unicode U+2E17), were often used as the hyphen. Since Unicode also distinguishes at least a dozen different hyphens, I would say it makes sense to allow variants for hyphens too.

[image: clipboard01]
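For anyone who wants to check how many hyphen-like characters Unicode actually defines, here is a minimal sketch using Python's standard `unicodedata` module. The function name `hyphen_like_characters` is hypothetical, and matching on the word "HYPHEN" in the character name is only a rough heuristic assumed for illustration, not an official Unicode property. The exact list depends on the Unicode version bundled with the Python interpreter.

```python
import sys
import unicodedata


def hyphen_like_characters():
    """Return (code point, name) pairs whose Unicode name contains 'HYPHEN'.

    Assumption: a name-based match is a good enough approximation of
    "characters that may be used as a hyphen" for a quick survey.
    """
    found = []
    for cp in range(sys.maxunicode + 1):
        # Unassigned code points and surrogates have no name; use "" for them.
        name = unicodedata.name(chr(cp), "")
        if "HYPHEN" in name:
            found.append((cp, name))
    return found


if __name__ == "__main__":
    for cp, name in hyphen_like_characters():
        print(f"U+{cp:04X}  {name}")
```

The output should include U+2E17 DOUBLE OBLIQUE HYPHEN alongside the ordinary U+002D HYPHEN-MINUS and U+2010 HYPHEN, which gives a feel for the range of values a variant list on HYP might need to carry.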

Good point as a use case, even though I would expect the OCR result for such material to be corrected consistently to the equals sign, which would avoid the need for alternatives. ;-D