UB-Mannheim / BeTrial

Bernoulli trial generator to validate OCR results


other input formats than XML

imlabormitlea-code opened this issue

Hi!
I was wondering whether it would be possible to use input formats other than XML (plain text in particular).
Greetings

Hello @imlabormitlea-code ,
you could certainly generate a BeTrial HTML page from image-text line pairs. But if you only have a plain full-text page (containing several lines), the user would always have to look at the whole page image, which I don't see as much of an advantage over checking it manually. Is there a concrete use case?

If you want to determine the character error rate (CER) on plain full-text pages, I would recommend correcting a few selected pages manually and then comparing them with the OCR output using a program like ocreval or dinglehopper. That way you can measure the CER again whenever you optimise your workflow.
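
To illustrate what such a comparison measures, here is a minimal Python sketch of a CER calculation, assuming the usual definition of CER as the Levenshtein edit distance divided by the length of the corrected text. The function names `levenshtein` and `cer` are made up for this example and are not part of BeTrial, ocreval or dinglehopper. Those tools do considerably more (for example Unicode handling and alignment reports), so this is only meant to show the idea.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]


def cer(ground_truth: str, ocr_text: str) -> float:
    """Character error rate of ocr_text relative to the corrected text.

    Assumption: CER = edit distance / number of ground-truth characters.
    """
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_text) / len(ground_truth)


if __name__ == "__main__":
    # One wrong character out of eight.
    print(cer("Mannheim", "Mannhelm"))  # 0.125
```

With a few manually corrected pages stored as plain text, the same function could be called once per page after each workflow change to see whether the error rate has gone down.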