Request: Dataset and pretrained model for language detection
turian opened this issue
Joseph Turian commented
MOTIVATION
Language detection from images is relatively difficult. Adobe and ABBYY OCR both require that you already know the language of the document before you start OCR.
REQUEST
- Please use your document generator to generate documents in different languages.
- Ideally, you would even mix different languages within a single document.
- Release a pretrained model that estimates the percentage of each language in a particular document image.
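
To make the requested output concrete, here is a minimal sketch of how a per-document label could be computed when generating mixed-language documents. It assumes the generator knows which language each text segment came from; the `segments` structure and the `language_percentages` helper are hypothetical and not part of any existing tool. It also assumes shares are measured by non-whitespace character count, which is only one possible definition (page area or word count would be others).

```python
from collections import Counter


def language_percentages(segments):
    """Return {language: percent of characters} for one generated document.

    `segments` is a hypothetical list of (language_code, text) pairs that
    the document generator is assumed to record while composing a page.
    """
    counts = Counter()
    for lang, text in segments:
        # Count non-whitespace characters only, so spacing does not skew shares.
        counts[lang] += sum(1 for ch in text if not ch.isspace())

    total = sum(counts.values())
    if total == 0:
        return {}  # empty page: no language share to report
    return {lang: 100.0 * n / total for lang, n in counts.items()}


# Hypothetical page mixing English and French segments.
example = [("en", "Hello world"), ("fr", "Bonjour"), ("en", "Bye")]
print(language_percentages(example))  # {'en': 65.0, 'fr': 35.0}
```

A model trained on such labels would take the rendered page image as input and predict this distribution directly.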