abdoelsayed2016 / HKR_Dataset

Handwritten Kazakh and Russian (HKR) database for text recognition

Home Page: https://doi.org/10.1007/s11042-021-11399-6

Repository from GitHub: https://github.com/abdoelsayed2016/HKR_Dataset

Data Splitting

shonenkov opened this issue

commented

Hello! Thank you all for your hard work, especially for creating this dataset!

Could you provide the data split used in your article? https://arxiv.org/pdf/2008.05373.pdf
(image ids for Validation_images, Test_1_images and Test_2_images)

It would be very useful for comparing results, publishing further articles, and citing your original article and dataset :)

Thank you for your consideration.
We have already split the dataset; you can find the folders with the split data in the dataset folder.
Also, that link points to an old version of my paper. The published version is at https://www.mdpi.com/2313-433X/6/12/141

Should you require further assistance or have other queries, please do not hesitate to contact me.

commented

In the dataset folder I found this annotation for the example image "0_10_23.jpg":

{"size":{"width":517,"height":63},"moderation":{"isModerated":1,"moderatedBy":"Norlist","predicted":""},"description":"Слепым волчатам","name":"0_10_23"}

But I didn't find any information about the split there. Could you help me find it?

thank you in advance!
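For readers following along, the annotation quoted above can be read with Python's standard `json` module. This is only a minimal sketch: the helper name `parse_annotation` is hypothetical, and it assumes every annotation has the same fields as this one example and that the image file is the `name` field plus `.jpg`.

```python
import json

# Annotation text copied from the example quoted in this thread.
raw = (
    '{"size":{"width":517,"height":63},'
    '"moderation":{"isModerated":1,"moderatedBy":"Norlist","predicted":""},'
    '"description":"Слепым волчатам","name":"0_10_23"}'
)


def parse_annotation(text):
    """Return (image file name, transcription, (width, height)) for one annotation.

    Hypothetical helper, not part of the dataset's own tooling.
    """
    record = json.loads(text)
    # Assumption: the image file is the "name" field plus ".jpg",
    # as in the "0_10_23.jpg" example.
    image_file = record["name"] + ".jpg"
    size = (record["size"]["width"], record["size"]["height"])
    return image_file, record["description"], size


image_file, label, size = parse_annotation(raw)
print(image_file, label, size)
```

The record holds the image size, moderation details, the transcription and the name, but no field naming a split, which matches what is reported here.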

Sure, I will send Python code to split the dataset. Sorry, I thought the data in the uploaded folder was already split, but when I asked my supervisors they told me it is not, because researchers may want to split it as they like. I will send you the link to the splitting code.

You can use this link to split the dataset:
https://github.com/bosskairat/Dataset

commented

Thank you, I ran your code and got this split:

HKR_splitting.csv.zip

Could you add this CSV (after unzipping) to the Cloud for everyone? https://cloud.mail.ru/public/25xw/2YPdtaFAF

usage:

import pandas as pd

df_splitting = pd.read_csv('HKR_splitting.csv', index_col='id')
df_splitting['stage'].value_counts()
>>>
train    45559
val       9375
test2     5043
test1     4966
Name: stage, dtype: int64

thank you!
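Once the CSV is loaded, a natural next step is to collect the image ids that belong to each stage. The sketch below uses only the standard library. It takes the `id` and `stage` column names from the usage snippet above; the helper `ids_by_stage` and the sample rows are hypothetical stand-ins, not contents of the real `HKR_splitting.csv`.

```python
import csv
import io
from collections import defaultdict


def ids_by_stage(csv_file):
    """Group image ids by their 'stage' value (train / val / test1 / test2)."""
    splits = defaultdict(list)
    for row in csv.DictReader(csv_file):
        splits[row["stage"]].append(row["id"])
    return dict(splits)


# Made-up rows for illustration only; with the real file, pass
# open("HKR_splitting.csv", newline="", encoding="utf-8") instead.
sample_csv = io.StringIO(
    "id,stage\n"
    "0_10_23,train\n"
    "0_10_24,val\n"
    "0_10_25,test1\n"
    "0_10_26,test2\n"
    "0_10_27,train\n"
)

splits = ids_by_stage(sample_csv)
for stage, ids in splits.items():
    print(stage, len(ids))
```

Each list can then be matched against the annotation `name` fields to pick the images for training, validation, or either test set.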

Check the repository with the Python code; we have already updated it.