Request: Dataset and pretrained model for language detection

Question


turian opened this issue 4 months ago

MOTIVATION

Language detection from document images is relatively difficult. Adobe's and ABBYY's OCR tools require that you already know the document's language before you start OCR.

REQUEST

Please use your document generator to generate documents in different languages.
Ideally, some documents would also mix several languages on the same page.
Release a pretrained model that estimates the percentage of each language in a given document image.
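To make the requested output concrete, one possible label for a generated mixed-language page is the share of characters written in each language. This is only a hypothetical sketch: the function name `language_percentages` and the `(language_code, text)` input format are assumptions for illustration, not part of any existing generator. It weights languages by non-whitespace character count; word count or text-region area would be equally reasonable choices.

```python
from collections import Counter
from typing import Dict, List, Tuple


def language_percentages(regions: List[Tuple[str, str]]) -> Dict[str, float]:
    """Hypothetical label format: percent of characters per language code.

    `regions` is assumed to be the list of (language_code, text) blocks
    that a synthetic document generator placed on one page.
    """
    counts: Counter = Counter()
    for lang, text in regions:
        # Count only non-whitespace characters so spacing does not skew the shares.
        counts[lang] += sum(1 for ch in text if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        # A page with no text has no language distribution.
        return {}
    return {lang: 100.0 * n / total for lang, n in counts.items()}


page = [("en", "Hello world"), ("de", "Hallo"), ("en", "again")]
print(language_percentages(page))  # {'en': 75.0, 'de': 25.0}
```

A model trained on such labels would predict this distribution directly from the page image, so the same dictionary could serve as both the dataset annotation and the model's target output.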