cs-chan / Total-Text-Dataset

Total Text Dataset. It consists of 1555 images with more than 3 different text orientations: Horizontal, Multi-Oriented, and Curved, one of a kind.


How to parse the annotation file?

vinayakarannil opened this issue:

Do you have a script for parsing the polygon ground truth files?

No, we don't have a specific script, but the format (both .txt and .mat) is clearly described under the groundtruth folder. It should be fairly easy to parse into your desired format. Do let us know if you need further assistance.

Thank you for replying. I am struggling to parse the files because they are not in a standard format like JSON or XML. I think I will have to use some regex to parse them.
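For anyone taking the regex route, here is a minimal sketch of what that could look like in Python. It assumes each line of a .txt ground truth file looks like the sample in the comment below; that layout is not confirmed anywhere in this thread, so check it against the description in the groundtruth folder first. The sample line, `parse_line`, and `parse_annotation` are made up for illustration and are not part of the repository. Transcriptions wrapped in double quotes are not handled.

```python
import re

# ASSUMED line layout (verify against the groundtruth folder):
# x: [[115 503 494 115]], y: [[322 346 426 404]], ornt: [u'm'], transcriptions: [u'nauGHTY']
LINE_RE = re.compile(
    r"x:\s*\[\[(?P<x>[^\]]*)\]\],\s*"
    r"y:\s*\[\[(?P<y>[^\]]*)\]\],\s*"
    r"ornt:\s*\[u?'(?P<ornt>[^']*)'\],\s*"
    r"transcriptions:\s*\[u?'(?P<text>.*)'\]\s*$"
)


def parse_line(line):
    """Parse one annotation line into a dict (hypothetical helper)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("line does not match the assumed layout: %r" % line)
    xs = [int(v) for v in match.group("x").split()]
    ys = [int(v) for v in match.group("y").split()]
    if len(xs) != len(ys):
        raise ValueError("x and y have different lengths: %r" % line)
    return {
        "polygon": list(zip(xs, ys)),  # [(x1, y1), (x2, y2), ...]
        "orientation": match.group("ornt"),
        "transcription": match.group("text"),
    }


def parse_annotation(text):
    """Parse the full contents of one .txt file, skipping blank lines."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = (
        "x: [[115 503 494 115]], y: [[322 346 426 404]], "
        "ornt: [u'm'], transcriptions: [u'nauGHTY']\n"
    )
    for region in parse_annotation(sample):
        print(region)
```

From there, `json.dumps(parse_annotation(text))` would give a JSON version of the same data, if a standard format is easier to work with downstream.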

If you happen to use Python or Matlab for your application, you can refer to the scripts below to see how our annotations are parsed. Hope it helps!

Python - https://github.com/cs-chan/Total-Text-Dataset/tree/master/Evaluation_Protocol/Python_scripts
Matlab - https://github.com/cs-chan/Total-Text-Dataset/tree/master/Evaluation_Protocol


Thanks for sharing.
I would like to ask what the numbers in the .mat format file mean,
for example in:
https://github.com/cs-chan/Total-Text-Dataset/blob/master/Evaluation_Protocol/Examples/Groundtruth/poly_gt_img1.mat