Data Splitting

Question

shonenkov opened this issue 5 years ago

Hello! Thank you all for your hard work, especially for creating this dataset!

Could you provide the data split used in your article? https://arxiv.org/pdf/2008.05373.pdf
(image ids for Validation_images, Test_1_images and Test_2_images)

It would be very useful for comparing results, for publishing follow-up articles, and for citing your original article and dataset :)

Abdelrahman Abdallah · Answer 1 · Fri Jan 22 2021 21:51:36 GMT+0800 (China Standard Time)

Thank you for your interest.
We have already split our dataset; you can find folders with the split data in the dataset folder.
Also, that link points to an old version of my paper. You can find the published version here: https://www.mdpi.com/2313-433X/6/12/141

Should you require further assistance or have other queries, please do not hesitate to contact me.

Alex · Answer 2 · Fri Jan 22 2021 22:01:58 GMT+0800 (China Standard Time)

In the dataset folder I found this annotation for the example image "0_10_23.jpg":

{"size":{"width":517,"height":63},"moderation":{"isModerated":1,"moderatedBy":"Norlist","predicted":""},"description":"Слепым волчатам","name":"0_10_23"}

However, I could not find any information about the split in it. Could you help me find it?
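
To make the point concrete, here is a minimal sketch that parses the annotation above and lists its top-level keys. The names in `candidate_split_keys` are only guesses at what a split label might be called; they are hypothetical and not part of the dataset format.

```python
import json

# Annotation text copied from the post above (image "0_10_23.jpg").
annotation_text = (
    '{"size":{"width":517,"height":63},'
    '"moderation":{"isModerated":1,"moderatedBy":"Norlist","predicted":""},'
    '"description":"Слепым волчатам","name":"0_10_23"}'
)

annotation = json.loads(annotation_text)

# Top-level keys present in the annotation, in alphabetical order.
keys = sorted(annotation)

# Hypothetical field names one might expect for a split label.
# These are assumptions for illustration only.
candidate_split_keys = {"split", "stage", "subset", "fold"}
has_split_info = bool(candidate_split_keys & set(annotation))

print(keys)            # ['description', 'moderation', 'name', 'size']
print(has_split_info)  # False
```

None of the keys in this annotation looks like a train/validation/test label.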

thank you in advance!

Abdelrahman Abdallah · Answer 3 · Fri Jan 22 2021 23:11:41 GMT+0800 (China Standard Time)

Sure, I will send Python code to split the dataset. Sorry, I thought the uploaded folder was already split, but when I asked my supervisors they told me it is not, because researchers may want to split it as they like. I will send you a link to the code for splitting the dataset.
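
For readers who do want to split the data their own way, a generic reproducible random split might look like the sketch below. It is not the split from the paper: the function name `random_split`, the weights, the seed and the image ids are all assumptions made up for illustration.

```python
import random


def random_split(image_ids, weights, seed=0):
    """Shuffle ids and cut them into named stages in proportion to integer weights."""
    ids = sorted(image_ids)           # sort first so the result does not depend on input order
    random.Random(seed).shuffle(ids)  # seeded shuffle for reproducibility
    total = sum(weights.values())
    stages = list(weights)
    split, start = {}, 0
    for i, stage in enumerate(stages):
        if i == len(stages) - 1:
            end = len(ids)            # the last stage takes whatever remains
        else:
            end = start + len(ids) * weights[stage] // total
        split[stage] = ids[start:end]
        start = end
    return split


# Hypothetical ids and weights, for illustration only.
example_ids = [f"img_{i}" for i in range(100)]
example_split = random_split(example_ids, {"train": 70, "val": 15, "test": 15})
print({stage: len(ids) for stage, ids in example_split.items()})
```

Integer weights are used instead of fractions so the cut points are computed with exact integer arithmetic.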

Abdelrahman Abdallah · Answer 4 · Fri Jan 22 2021 23:16:03 GMT+0800 (China Standard Time)

You can use the code at this link to split the dataset:
https://github.com/bosskairat/Dataset

Alex · Answer 5 · Sat Jan 23 2021 00:05:58 GMT+0800 (China Standard Time)

Thank you, I ran your code and got this split:

HKR_splitting.csv.zip

Could you add this CSV (after unzipping) to the Cloud folder for everyone? https://cloud.mail.ru/public/25xw/2YPdtaFAF

usage:

import pandas as pd

df_splitting = pd.read_csv('HKR_splitting.csv', index_col='id')
df_splitting['stage'].value_counts()
>>>
train    45559
val       9375
test2     5043
test1     4966
Name: stage, dtype: int64
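
If pandas is not available, the same file can be read with the standard library. This is a minimal sketch that relies only on the `id` and `stage` columns used above; the helper name `ids_by_stage` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def ids_by_stage(csv_file):
    """Group image ids by the value of the 'stage' column."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["stage"]].append(row["id"])
    return dict(groups)


# Tiny in-memory stand-in for HKR_splitting.csv; these rows are invented.
sample = io.StringIO("id,stage\na_1,train\na_2,val\na_3,train\na_4,test1\n")
groups = ids_by_stage(sample)
print({stage: len(ids) for stage, ids in groups.items()})
```

For the real file, open it with `open('HKR_splitting.csv', newline='', encoding='utf-8')` and pass the file object to `ids_by_stage`.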

thank you!

Abdelrahman Abdallah · Answer 6 · Sat Jan 23 2021 01:15:50 GMT+0800 (China Standard Time)

Check the repository with the Python code; we have already updated it.