tonghe90 / textspotter

Training data format?

gitUserGoodLeaner opened this issue

Hi, I'm still not clear on the annotation format of the training dataset. Could you share it?

Hi @gitUserGoodLeaner, the details can be found in our paper. In brief, the VGG800K synthetic dataset is used with box annotations, character annotations, and transcriptions. ICDAR13 and ICDAR15 can be used for training text spotting with box annotations and transcriptions.
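For readers unfamiliar with "box annotation and transcription", here is a minimal sketch of reading one ground-truth line in the publicly distributed ICDAR 2015 layout (`x1,y1,x2,y2,x3,y3,x4,y4,transcription`, with `###` marking unreadable text). The function name `parse_icdar15_line` and the returned dictionary keys are hypothetical, and this is not necessarily the format that this repository's data layer expects.

```python
def parse_icdar15_line(line):
    """Parse one 'x1,y1,x2,y2,x3,y3,x4,y4,transcription' line."""
    # Ground-truth files may start with a UTF-8 BOM, so strip it if present.
    # Split at most 8 times so commas inside the transcription are kept.
    parts = line.lstrip("\ufeff").rstrip("\r\n").split(",", 8)
    if len(parts) < 9:
        raise ValueError("expected 8 coordinates and a transcription: %r" % line)
    coords = [int(v) for v in parts[:8]]
    # Group the flat coordinate list into four (x, y) corner points.
    box = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    text = parts[8]
    # '###' is the ICDAR 2015 marker for regions that should be ignored.
    return {"box": box, "text": text, "ignore": text == "###"}


if __name__ == "__main__":
    print(parse_icdar15_line("377,117,463,117,465,130,378,130,Genaxis Theatre"))
```

Keeping the transcription as the final field, split off with a bounded `split`, avoids breaking labels that themselves contain commas.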

@tonghe90 Thanks for your reply. I had read that part, but when I started training the model I found the description insufficient. For example, training a recognition model usually requires a dictionary file. Does this project need one too? If so, at what point should it be provided to the model?

@gitUserGoodLeaner The dictionary is only used for reference during testing.
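One common way a dictionary is used at test time in text spotting is to replace each predicted word with the closest dictionary entry by edit distance. The sketch below illustrates that general idea only; it is an assumption, not a description of how this repository uses its dictionary, and `edit_distance` and `snap_to_lexicon` are hypothetical helper names.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def snap_to_lexicon(prediction, lexicon):
    """Return the lexicon word closest to the prediction, ignoring case."""
    if not lexicon:
        # Without a dictionary the raw prediction is returned unchanged.
        return prediction
    return min(lexicon,
               key=lambda w: edit_distance(prediction.lower(), w.lower()))


if __name__ == "__main__":
    print(snap_to_lexicon("theatr", ["Theatre", "Street"]))
```

Because the lookup happens only after recognition, training needs no dictionary file under this scheme.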

@tonghe90 Thanks. Another question: what does sample_gt_cont mean in the det_nms_layer of test_iou.pt? https://github.com/tonghe90/textspotter/blob/master/models/test_iou.pt#L7013

How can string data, for example the ground-truth text label, be stored in a blob? @tonghe90
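The thread leaves this question unanswered. Assuming the blob is a fixed-shape numeric array, a usual workaround is to map each character to an integer index and pad every label to the same length. The sketch below shows that encoding in plain Python; `CHARSET`, `PAD`, `MAX_LEN`, `encode_label`, and `decode_label` are all hypothetical names and values, not taken from this repository.

```python
CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"  # hypothetical alphabet
PAD = -1       # hypothetical value marking unused positions
MAX_LEN = 25   # hypothetical maximum label length

CHAR_TO_ID = {c: i for i, c in enumerate(CHARSET)}


def encode_label(text, max_len=MAX_LEN):
    """Turn a string into a fixed-length list of floats."""
    # Characters outside CHARSET (spaces, punctuation) are dropped.
    ids = [CHAR_TO_ID[c] for c in text.lower() if c in CHAR_TO_ID]
    ids = ids[:max_len]  # truncate labels that are too long
    return [float(i) for i in ids] + [float(PAD)] * (max_len - len(ids))


def decode_label(row):
    """Invert encode_label, skipping padding positions."""
    return "".join(CHARSET[int(v)] for v in row if v != PAD)


if __name__ == "__main__":
    row = encode_label("Hello 42")
    print(row[:8], decode_label(row))
```

Each encoded row has the same length, so a batch of labels forms a rectangular array that can be copied into a numeric container of shape (batch size, `MAX_LEN`).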