Shape Robust Text Detection with Progressive Scale Expansion Network
A reimplementation of PSENet with MXNet Gluon, trained only on ICPR.
- Supports TensorboardX
- Supports hybridize for deployment
- Fast: 45 ms per image when max_side is resized to 784
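The speed figure refers to images whose longer side is resized to 784. A minimal sketch of that aspect-preserving size computation, assuming plain rounding to integers; the helper name `target_size` is hypothetical, and the repository's own preprocessing may additionally align sizes to a network stride:

```python
def target_size(height, width, max_side=784):
    """Return (new_h, new_w) so that the longer side equals max_side.

    Hypothetical helper: keeps the aspect ratio and rounds to the nearest
    integer. It does not align the result to any network stride.
    """
    longest = max(height, width)
    # Multiply before dividing to limit floating-point error.
    new_h = max(1, round(height * max_side / longest))
    new_w = max(1, round(width * max_side / longest))
    return new_h, new_w


print(target_size(1000, 500))  # (784, 392)
print(target_size(300, 600))   # (392, 784)
```

Small images are scaled up as well as large ones scaled down, so every input ends up with the same longer side.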
Thanks to the original author (@whai362) for the great work!
OpenCV 4+ (required by the C++ version of pse)
While reimplementing PSENet with Gluon, I ran into the following problems.
The dice loss on the kernels does not converge.
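For reference, the PSENet paper scores each map with the dice coefficient D(S, G) = 2·Σ S·G / (Σ S² + Σ G²). The text loss is L_c = 1 − D(S_n, G_n) (with OHEM in the paper), the kernel loss is L_s = 1 − mean_i D(S_i·W, G_i·W) over the smaller kernels, where W keeps only pixels with S_n ≥ 0.5, and the total is λ·L_c + (1 − λ)·L_s with λ = 0.7. The NumPy sketch below follows those formulas for a single image; OHEM is omitted, the `eps` term and the function names are illustrative, and this is not this repository's Gluon code:

```python
import numpy as np


def dice_coefficient(pred, gt, eps=1e-6):
    """D(S, G) = 2*sum(S*G) / (sum(S^2) + sum(G^2)); eps avoids 0/0."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    inter = np.sum(pred * gt)
    return 2.0 * inter / (np.sum(pred * pred) + np.sum(gt * gt) + eps)


def psenet_loss(text_pred, text_gt, kernel_preds, kernel_gts, lam=0.7):
    """Return (text_loss, kernel_loss, all_loss) for a single image.

    text_pred / text_gt: complete text map S_n (probabilities) and label G_n.
    kernel_preds / kernel_gts: lists with the smaller kernel maps and labels.
    OHEM on the text map is omitted in this sketch.
    """
    text_pred = np.asarray(text_pred, dtype=np.float64)
    text_loss = 1.0 - dice_coefficient(text_pred, text_gt)
    # W: only pixels the model already predicts as text supervise the kernels.
    w = (text_pred >= 0.5).astype(np.float64)
    dices = [dice_coefficient(np.asarray(s) * w, np.asarray(g) * w)
             for s, g in zip(kernel_preds, kernel_gts)]
    kernel_loss = 1.0 - float(np.mean(dices))
    all_loss = lam * text_loss + (1.0 - lam) * kernel_loss
    return text_loss, kernel_loss, all_loss


# Toy 2x2 example: a perfect prediction gives losses close to 0.
gt = np.array([[1.0, 1.0], [0.0, 0.0]])
kernel = np.array([[1.0, 0.0], [0.0, 0.0]])
print(psenet_loss(gt, gt, [kernel], [kernel]))
```

Two properties of this formulation matter when a kernel loss refuses to drop: if a kernel label is empty, its dice coefficient is 0 whatever the prediction, and if W selects no pixels, the masked kernel term is stuck at 1 with no gradient reaching the kernel outputs.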
- First, I suspected that the kernel labels were incorrect. However, I verified them again and they are correct.
- Second, I suspected that `mx.nd.split` could not be backpropagated through. However, the dice loss on the score map, which is also obtained with `split`, behaves well, so `split` cannot be the cause.
- The network is based on ResNet-50 and the FPN output is input_size/4, so the smallest kernel map may contain no text instance at all. Therefore I set the number of kernels to 3.
Upsampling the output to input_size may be a better choice; I will try it in my spare time.
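To see why the smallest kernel can vanish at 1/4 resolution, recall how the PSENet paper builds kernels: the i-th kernel (1 ≤ i ≤ n) shrinks the text polygon p_n with scale r_i = 1 − (1 − m)(n − i)/(n − 1), where m is the minimal scale, using an inward offset d_i = Area(p_n)(1 − r_i²)/Perimeter(p_n). The sketch below evaluates these formulas for an axis-aligned rectangle; the minimal scale m = 0.4 and the 40×10 px box are illustrative assumptions, not values taken from this repository:

```python
def shrink_ratios(n, m):
    """Scale r_i for kernels i = 1..n (r_1 = m is the smallest, r_n = 1); needs n > 1."""
    return [1.0 - (1.0 - m) * (n - i) / (n - 1) for i in range(1, n + 1)]


def polygon_area_perimeter(points):
    """Shoelace area and perimeter of a simple polygon given as [(x, y), ...]."""
    area2 = 0.0
    perimeter = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        area2 += x0 * y1 - x1 * y0
        perimeter += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return abs(area2) / 2.0, perimeter


def shrink_offsets(points, n, m):
    """Inward offset d_i = Area * (1 - r_i^2) / Perimeter for each kernel."""
    area, perimeter = polygon_area_perimeter(points)
    return [area * (1.0 - r * r) / perimeter for r in shrink_ratios(n, m)]


# A 40 x 10 px text line, n = 3 kernels, assumed minimal scale m = 0.4.
box = [(0, 0), (40, 0), (40, 10), (0, 10)]
d_min = shrink_offsets(box, n=3, m=0.4)[0]
min_kernel_height = 10 - 2 * d_min  # both long edges move inward by d_min
print(round(d_min, 2), round(min_kernel_height, 2), round(min_kernel_height / 4, 2))
# 3.36 3.28 0.82
```

For this box the smallest kernel is about 3.3 px tall, which becomes less than one pixel on a map at input_size/4; fewer kernels (a larger m) or a higher-resolution output both keep it visible.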
- Backbone: resnet50_v1b from gluoncv_model_zoo; you can replace it with other models. The default path of the pretrained model is in
You can also download maskrcnn_coco from gluoncv_model_zoo to get a warm start.
cd pse
make
Here I add `-Wl,-undefined,dynamic_lookup` to avoid some compile errors; this differs from the original PSENet.
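The pse module implements progressive scale expansion: connected components of the smallest kernel seed the text instances, which are then grown breadth-first into each larger kernel, with contested pixels going to whichever instance reaches them first. A pure-Python sketch of that idea, assuming kernels are binary 2-D lists of equal size ordered from smallest to largest and 4-connectivity; it is not the repository's C++ code and skips details such as minimum-area filtering:

```python
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity


def label_components(mask):
    """4-connected component labelling; returns (labels, count), 0 = background."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def progressive_scale_expansion(kernels):
    """kernels: binary maps ordered smallest -> largest, all the same size."""
    labels, _ = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Seed the BFS with every pixel labelled so far; first come, first served.
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and kernel[ny][nx] and not labels[ny][nx]:
                    labels[ny][nx] = labels[y][x]
                    queue.append((ny, nx))
    return labels


# Two instances that touch in the large kernel but are separate in the small one.
small = [[1, 0, 0, 0, 1]]
large = [[1, 1, 1, 1, 1]]
print(progressive_scale_expansion([small, large]))  # [[1, 1, 1, 2, 2]]
```

Because expansion starts from the separated small kernels, adjacent text lines that merge in the full text map still come out as distinct instances.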
python scripts/train.py $data_path $ckpt
data_path: path of the dataset; each image and its annotation must share the same prefix, for example a.jpg and a.txt
ckpt: the filename of the pretrained model
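A minimal sketch of how such same-prefix pairs might be collected and parsed, assuming .jpg images and ICPR-style (MTWI 2018) annotation lines of the form x1,y1,x2,y2,x3,y3,x4,y4,transcription; the helper names are hypothetical and the repository's own data loader may differ:

```python
from pathlib import Path


def collect_pairs(data_path, image_ext=".jpg", label_ext=".txt"):
    """Return sorted (image, annotation) path pairs that share a file stem."""
    root = Path(data_path)
    pairs = []
    for image in sorted(root.glob("*" + image_ext)):
        label = image.with_suffix(label_ext)
        if label.exists():
            pairs.append((image, label))
    return pairs


def parse_annotation(label_path):
    """Parse lines 'x1,y1,...,x4,y4,text' into (quad, text) tuples."""
    boxes = []
    with open(label_path, encoding="utf-8-sig") as f:
        for line in f:
            parts = line.strip().split(",")
            if len(parts) < 9:
                continue  # skip blank or malformed lines
            coords = [float(v) for v in parts[:8]]
            quad = list(zip(coords[0::2], coords[1::2]))
            text = ",".join(parts[8:])  # the transcription itself may contain commas
            boxes.append((quad, text))
    return boxes
```

Images without a matching annotation file are skipped rather than raising, which keeps a partially labelled folder usable.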
Training curves: Text loss | Kernel loss | All_loss | Pixel_accuracy
python eval.py $data_path $ckpt $output_dir $gpu_or_cpu
- Upsampling to input_size
- Train on ICDAR and evaluate