Datasets

The Nancho dataset, which contains the documents and their corresponding labels, can be downloaded from the link below. The characters cropped from the documents are provided in the subfolders.

https://drive.google.com/file/d/1iFB-9zEtmB3bjbO77gTKIkTLhdSEeBFD/view?usp=sharing

The Academy of Korean Studies provided the small Nancho dataset for research on translating ancient cursive Korean archives into modern Korean. Many samples share visually similar features, which makes it difficult for a model to distinguish samples with high commonality. The documents, some of which date back several hundred years, show image degradation from aging and from ink quality issues: over time the ink has dispersed around the edges, which makes the documents difficult to read. These highly degraded samples lower recognition performance. The samples include all sorts of disturbances, including ink dispersion, extensive cursive styles, low resolution, and complex backgrounds. The samples are segmented directly from documents coming from a variety of source scripts.
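Since the cropped characters are distributed in subfolders, a quick way to inspect the download is to count the image files in each one. The following is only a minimal sketch: the function name `count_images_per_folder`, the list of image extensions, and the extraction path are assumptions for illustration, and the actual folder layout of the archive may differ.

```python
from collections import Counter
from pathlib import Path

# Assumed image extensions; adjust to whatever the extracted archive contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def count_images_per_folder(root):
    """Map each folder under `root` (as a relative POSIX path) to its image count."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        # Only count files whose extension looks like an image.
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            counts[path.parent.relative_to(root).as_posix()] += 1
    return counts
```

For example, `count_images_per_folder("path/to/extracted/dataset")` (a placeholder path) returns a `Counter`, and calling `.most_common()` on it lists the folders from largest to smallest, which helps spot folders with very few samples in a small dataset.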
