PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

Table recognition model: PubTabNet_2.0.0_train.jsonl is missing

s957995299 opened this issue · comments

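Where can I download the PubTabNet_2.0.0_train.jsonl file provided by Paddle?
Without this file, I cannot train the table recognition model by following the tutorial from the Ten Lectures on OCR (OCR十讲) course.
The following page will not open, so the file cannot be downloaded: https://aistudio.baidu.com/bd-cpu-01/user/995689/3481601//files/train_data%2Ftable%2Fpubtabnet%2FPubTabNet_2.0.0_train.jsonl?download=1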

It is available here: https://github.com/ibm-aur-nlp/PubTabNet

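Hello, do the annotations in the data downloaded from the official PubTabNet site need any special processing? After splitting the data myself, acc stays at 0 no matter how I train.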

No special processing is needed; just split the data into training and validation sets. Acc usually reaches a little over 10% after 1 epoch.
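For anyone unsure what "just split" looks like in practice, here is a minimal sketch, not an official PaddleOCR script. It assumes the annotation file from the PubTabNet repository is a JSON Lines file in which every record carries a `split` field with values such as `train` or `val`. The function name `split_jsonl`, the input filename `PubTabNet_2.0.0.jsonl`, and the `PubTabNet_2.0.0_val.jsonl` output name are assumptions made for illustration; only `PubTabNet_2.0.0_train.jsonl` appears in this thread.

```python
import json
from collections import Counter
from pathlib import Path


def split_jsonl(src, out_dir, splits=("train", "val")):
    """Route each JSONL record to a per-split file based on its 'split' field.

    Assumption: every record is a JSON object with a 'split' key.
    Records whose split is not listed in `splits` are skipped.
    Returns a dict mapping split name to the number of records written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = Counter()
    # Output names follow the pattern of the file asked about in this thread.
    handles = {
        name: open(out_dir / f"PubTabNet_2.0.0_{name}.jsonl", "w", encoding="utf-8")
        for name in splits
    }
    try:
        with open(src, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines
                record = json.loads(line)
                name = record.get("split")
                if name in handles:
                    # Write the original line back unchanged (no re-serialization).
                    handles[name].write(line + "\n")
                    counts[name] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever the dataset was extracted.
    print(split_jsonl("PubTabNet_2.0.0.jsonl", "train_data/table/pubtabnet"))
```

The file is streamed line by line because the full annotation file is large, and each record is written back verbatim so the annotations themselves are not altered, in line with the reply above that no special processing is needed.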