PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

fcenet error

justcodew opened this issue

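I want to use FCENet to test detection of curved text. Training works with the ICDAR2015 dataset, but after switching to the ICDAR2019-ArT dataset,
training fails with the following error: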
error happened with msg: Traceback (most recent call last):
  File "/home/code/PaddleOCR/ppocr/data/simple_dataset.py", line 136, in __getitem__
    outs = transform(data, self.ops)
  File "/home/code/PaddleOCR/ppocr/data/imaug/__init__.py", line 50, in transform
    data = op(data)
  File "/home/code/PaddleOCR/ppocr/data/imaug/fce_targets.py", line 662, in __call__
    results = self.generate_targets(results)
  File "/home/code/PaddleOCR/ppocr/data/imaug/fce_targets.py", line 649, in generate_targets
    polygon_masks_ignore)
  File "/home/code/PaddleOCR/ppocr/data/imaug/fce_targets.py", line 608, in generate_level_targets
    level_img_size, lv_text_polys[ind])[None]
  File "/home/code/PaddleOCR/ppocr/data/imaug/fce_targets.py", line 333, in generate_center_region_mask
    center_line = (resampled_top_line + resampled_bot_line) / 2
ValueError: operands could not be broadcast together with shapes (3,2) (2,2)
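
The last frame of the traceback can be reproduced in isolation. The sketch below is not the PaddleOCR code. It only shows that NumPy refuses to add a (3, 2) array to a (2, 2) array, and how a hypothetical helper (`center_line` here is an illustrative name) could report the mismatch more clearly.

```python
import numpy as np


def center_line(top_line, bot_line):
    """Average two sampled side lines, refusing mismatched point counts.

    Hypothetical helper for illustration; not part of PaddleOCR.
    """
    top_line = np.asarray(top_line, dtype=np.float32)
    bot_line = np.asarray(bot_line, dtype=np.float32)
    if top_line.shape != bot_line.shape:
        raise ValueError(
            f"side lines differ in length: {top_line.shape} vs {bot_line.shape}")
    return (top_line + bot_line) / 2


# Same shapes as in the traceback: 3 points on one side, 2 on the other.
top = np.zeros((3, 2))
bot = np.zeros((2, 2))
try:
    top + bot
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# With equal point counts the average is well defined.
mid = center_line([[0, 0], [2, 2]], [[2, 2], [4, 4]])
```

The error therefore means the two side lines were resampled to different numbers of points, which points at the resampling step rather than at the addition itself.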

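I have visualized the ICDAR2019-ArT annotations and they look fine. I suspect the problem is in this part of the code, which appears to have been ported from mmocr.

Could you take a look?

Could you provide a specific case that triggers the out-of-bounds error?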

/train/img/gt_1807.jpg [{"transcription": "**少林武术培训", "points": [[26, 445], [95, 330], [231, 248], [417, 248], [515, 311], [614, 467], [542, 497], [491, 403], [397, 338], [274, 336], [186, 398], [104, 491]], "language": "Chinese", "illegibility": false}, {"transcription": "报名热线: 15221362588", "points": [[165, 455], [481, 462], [482, 489], [165, 485]], "language": "Chinese", "illegibility": false}, {"transcription": "021-20983256", "points": [[307, 490], [481, 490], [484, 513], [309, 512]], "language": "Latin", "illegibility": false}, {"transcription": "少林武术", "points": [[335, 658], [364, 655], [373, 779], [342, 783]], "language": "Chinese", "illegibility": false}]
[image: gt_1807]
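
A quick way to sanity-check an annotation like the one above is to count the vertices of each polygon and compute its area. This is a minimal sketch that assumes the JSON part of the label line has already been split from the image path. `polygon_area` and `summarize` are illustrative names, not PaddleOCR functions.

```python
import json

import numpy as np


def polygon_area(points):
    """Unsigned polygon area via the shoelace formula."""
    pts = np.asarray(points, dtype=np.float64)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def summarize(label_json):
    """Return (vertex count, area) for every text instance in one label."""
    return [(len(item["points"]), polygon_area(item["points"]))
            for item in json.loads(label_json)]


# One instance copied from the gt_1807 annotation.
sample = ('[{"transcription": "021-20983256", '
          '"points": [[307, 490], [481, 490], [484, 513], [309, 512]]}]')
summary = summarize(sample)
```

A check like this only validates the raw labels. The polygons that reach FCENetTargets have already passed through the augmentation transforms, so they can differ from what is stored in the label file.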

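Got it.

Hello, this case works fine when I test it on my side.
[image]

This is exactly the image I used. The test sometimes runs fine and sometimes raises the error. The test code is as follows: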

import numpy as np
from paddle.io import  DataLoader, BatchSampler
from ppocr.data.simple_dataset import SimpleDataSet
from ppocr.utils.loggers import VDLLogger
import yaml
import random
import paddle
from ppocr.utils.logging import get_logger

def set_seed(seed=1024):
    random.seed(seed)
    np.random.seed(seed)
    paddle.seed(seed)

def build_dataloader(mode, batch_size, drop_last, shuffle, num_workers):
    # dataset = SimpleDataSet(config,mode, logger, seed=0.5)
    dataset = SimpleDataSet(config,mode, logger,seed=None)
    batch_sampler = BatchSampler(dataset=dataset, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
    data_loader = DataLoader(dataset=dataset, batch_sampler=batch_sampler, num_workers=num_workers, return_list=True, use_shared_memory=False)
    return data_loader

if __name__ == '__main__':

    save_model_dir = './test_data/'
    loggers = VDLLogger(save_model_dir)
    log_file = './test_data/test_data_0607.log' 
    logger = get_logger(log_file=log_file)

    config_file = r'/home/justcodew/code/PaddleOCR/configs/det/det_r50_vd_dcn_fce_art_error.yml'
    config =  yaml.load(open(config_file, 'rb'), Loader=yaml.Loader)
    seed = config['Global']['seed'] if 'seed' in config['Global'] else 1024
    set_seed(seed)

    train_dataloader = build_dataloader('Train', batch_size=1, drop_last=False, shuffle=False, num_workers=1)

    for indx, data in enumerate(train_dataloader):
        print('indx ',indx)

Config file:

Global:
 use_gpu: true
 epoch_num: 1500
 log_smooth_window: 20
 print_batch_step: 200
 save_model_dir: ./output/det_fce/det_r50_dcn_fce_art/
 save_epoch_step: 1000
 eval_batch_step: [0, 200]
 cal_metric_during_train: False
 pretrained_model: ./pretrain_models/resnet50_pretrain_models/ResNet50_vd_ssld_pretrained
 checkpoints:
 save_inference_dir: ./output/det_fce/det_r50_dcn_fce_0620_art/
 use_visualdl: True
 infer_img: doc/imgs_en/img_10.jpg
 save_res_path: ./output/det_fce/predicts_fce.txt


Architecture:
 model_type: det
 algorithm: FCE
 Transform:
 Backbone:
   name: ResNet
   layers: 50
   #dcn_stage: [False, True, True, True]
   out_indices: [1,2,3]
 Neck:
   name: FCEFPN
   out_channels: 256
   has_extra_convs: False
   extra_stage: 0
 Head:
   name: FCEHead
   fourier_degree: 5
Loss:
 name: FCELoss
 fourier_degree: 5
 num_sample: 50
 
Optimizer:
 name: Adam
 beta1: 0.9
 beta2: 0.999
 lr:
   learning_rate: 0.0001
 regularizer:
   name: 'L2'
   factor: 0

PostProcess:
 name: FCEPostProcess
 scales: [8, 16, 32]
 alpha: 1.0
 beta: 1.0
 fourier_degree: 5
 box_type: 'poly'

Metric:
 name: DetMetric
 main_indicator: hmean

Train:
 dataset:
   name: SimpleDataSet
   data_dir: /data/justcodew/ocr_data/icdar_convert/icdar_2019_art/ 
   label_file_list: 
      - /data/justcodew/ocr_data/icdar_convert/icdar_2019_art/ppocr_train_label_error.txt 
   transforms:
     - DecodeImage: # load image
         img_mode: BGR
         channel_first: False
         ignore_orientation: True
     - DetLabelEncode: # Class handling label
     - ColorJitter: 
         brightness: 0.142
         saturation: 0.5
         contrast: 0.5
     - RandomScaling: 
     - RandomCropFlip:
         crop_ratio: 0.5
     - RandomCropPolyInstances:
         crop_ratio: 0.8
         min_side_ratio: 0.3
     - RandomRotatePolyInstances:
         rotate_ratio: 0.5
         max_angle: 30
         pad_with_fixed_color: False
     - SquareResizePad:
         target_size: 800
         pad_ratio: 0.6
     - IaaAugment:
         augmenter_args:
           - { 'type': Fliplr, 'args': { 'p': 0.5 } }
     - FCENetTargets:
         fourier_degree: 5
     - NormalizeImage:
         scale: 1./255.
         mean: [0.485, 0.456, 0.406]
         std: [0.229, 0.224, 0.225]
         order: 'hwc'
     - ToCHWImage:
     - KeepKeys:
         keep_keys: ['image', 'p3_maps', 'p4_maps', 'p5_maps'] # dataloader will return list in this order
 loader:
   shuffle: False
   drop_last: False
   batch_size_per_card: 1
   num_workers: 1

Eval:
 dataset:
   name: SimpleDataSet
   data_dir: /data/justcodew/ocr_data/icdar_convert/icdar_2019_art/ 
   label_file_list:
       - /data/justcodew/ocr_data/icdar_convert/icdar_2019_art/ppocr_test_label.txt
   transforms:
     - DecodeImage: # load image
         img_mode: BGR
         channel_first: False
         ignore_orientation: True
     - DetLabelEncode: # Class handling label
     - DetResizeForTest:
         limit_type: 'min'
         limit_side_len: 736
     - NormalizeImage:
         scale: 1./255.
         mean: [0.485, 0.456, 0.406]
         std: [0.229, 0.224, 0.225]
         order: 'hwc'
     - Pad: 
     - ToCHWImage:
     - KeepKeys:
         keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
 loader:
   shuffle: False
   drop_last: False
   batch_size_per_card: 1 # must be 1
   num_workers: 2


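Reproduced. It is a boundary issue, fixed in #6693.

In addition, when DBNet is used on this image, the shrink_map and threshold_map are as shown below. The shrink_map looks somewhat abnormal: the results for the left and right parts of the first curved text differ considerably.
[image: shrink_map]
[image: threshold_map]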

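Re: "Reproduced. It is a boundary issue, fixed in #6693."

Which code exactly was changed?
Also, what causes the randomness when running the code? I have already set shuffle: False, drop_last: False, batch_size_per_card: 1 and seed=None,
but it still sometimes runs fine and sometimes raises the error.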
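
One possible source of the run-to-run variation, offered as an assumption rather than a confirmed answer: the Train transforms in the config take probability-style arguments (crop_ratio: 0.5, rotate_ratio: 0.5, Fliplr with p: 0.5), so the polygons that reach FCENetTargets can differ between runs even when the data order is fixed. The sketch below uses a hypothetical stand-in transform, `maybe_crop`, to show that such draws repeat only when the random generator they use is seeded.

```python
import numpy as np


def maybe_crop(rng, ratio=0.5):
    """Hypothetical stand-in for a transform that fires with probability `ratio`."""
    return bool(rng.random() < ratio)


def run(seed, steps=8):
    """Record which steps would apply the transform for a given seed."""
    rng = np.random.default_rng(seed)
    return [maybe_crop(rng) for _ in range(steps)]


same_a = run(0)
same_b = run(0)      # same seed, same decisions
unseeded = run(None)  # fresh entropy, decisions can change between runs
```

Whether the seed set in the test script also reaches the random state used inside the DataLoader worker process is not established in this thread.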

I found it:
add boader judge

There are some problems in the FCENet data augmentation code:
https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/ppocr/data/imaug/fce_aug.py#L346

[1]
The method in RandomCropPolyInstances can leave results['polys'] empty.
A check needs to be added:

            if len(valid_masks_list) > 0 :
                results['polys'] = np.array(valid_masks_list)
                results['ignore_tags'] = valid_tags_list
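
To see why the check above matters, here is a self-contained sketch with a hypothetical wrapper (`apply_crop_result` is an illustrative name). It shows that an empty list converts to an array without any (N, 2) structure, and that the guard leaves the previous polygons untouched when the crop keeps nothing.

```python
import numpy as np


def apply_crop_result(results, valid_masks_list, valid_tags_list):
    """Overwrite the polygons only when the crop kept at least one of them."""
    if len(valid_masks_list) > 0:
        results['polys'] = np.array(valid_masks_list)
        results['ignore_tags'] = valid_tags_list
    return results


# An empty list becomes a 1-D array of length 0, with no polygon dimensions.
empty_shape = np.array([]).shape

before = {'polys': np.zeros((1, 4, 2)), 'ignore_tags': [False]}
after = apply_crop_result(dict(before), [], [])
```

What should happen to the cropped image in that case is not covered by the snippet in this thread, so the sketch only deals with the label fields.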

[2] Several check_argument calls in the code are not commented out, but that function does not exist here (it comes from mmocr).

https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/ppocr/data/imaug/fce_targets.py#L500

When every vertex of a poly in generate_fourier_maps is the same point, cal_fourier_signature fails.

For example, let

            poly = np.array([[59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094]])

            mask = np.zeros((h, w), dtype=np.uint8)
            polygon = np.array(poly).reshape((1, -1, 2))
            cv2.fillPoly(mask, polygon.astype(np.int32), 1)
            fourier_coeff = self.cal_fourier_signature(polygon[0], k)
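
A polygon like the one above could be filtered out before the Fourier signature is computed. The following is a minimal sketch of such a filter, assuming that a near-zero area is an acceptable definition of "degenerate". `is_degenerate` and its `eps` threshold are illustrative and not part of PaddleOCR or mmocr.

```python
import numpy as np


def is_degenerate(poly, eps=1e-6):
    """Return True if the polygon has fewer than 3 vertices or (near) zero area."""
    pts = np.asarray(poly, dtype=np.float64).reshape(-1, 2)
    if len(pts) < 3:
        return True
    x, y = pts[:, 0], pts[:, 1]
    # Shoelace formula; identical or collinear vertices give zero area.
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    return area < eps


# The collapsed polygon from the example: ten copies of one point.
collapsed = np.tile([59.50518, 24.676094], (10, 1))
square = np.array([[0, 0], [10, 0], [10, 10], [0, 10]])

collapsed_flag = is_degenerate(collapsed)
square_flag = is_degenerate(square)
```

A caller could skip such polygons or mark them as ignored. Which of the two fits the training targets better is a design decision that this thread does not settle.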

Also, your implementation here differs somewhat from mmocr:
https://github.com/open-mmlab/mmocr/blob/b8f7ead74cb0200ad5c422e82724ca6b2eb1c543/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py#L243
Why does your implementation drop text_instance?

            text_instance = [[poly[0][i], poly[0][i + 1]]
                             for i in range(0, len(poly[0]), 2)]
            mask = np.zeros((h, w), dtype=np.uint8)
            polygon = np.array(text_instance).reshape((1, -1, 2))
            cv2.fillPoly(mask, polygon.astype(np.int32), 1)
            fourier_coeff = self.cal_fourier_signature(polygon[0], k)
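
One possible explanation, stated as an assumption because the thread does not answer the question: the text_instance comprehension only turns a flat coordinate list [x0, y0, x1, y1, ...] into (x, y) pairs. If the polygons are already stored as arrays of pairs, reshaping them directly gives the same result. The sketch below checks that equivalence on a few coordinates taken from the gt_1807 annotation.

```python
import numpy as np

# Flat layout, as the mmocr snippet expects in poly[0].
flat = [26, 445, 95, 330, 231, 248, 417, 248]

# mmocr-style: build pairs first, then reshape.
text_instance = [[flat[i], flat[i + 1]] for i in range(0, len(flat), 2)]
polygon_from_pairs = np.array(text_instance).reshape((1, -1, 2))

# Reshaping the flat list directly gives the same array.
polygon_from_flat = np.array(flat).reshape((1, -1, 2))

# Starting from an array that already has shape (N, 2) also gives the same array.
already_paired = np.array(text_instance)
polygon_from_paired = already_paired.reshape((1, -1, 2))
```

Under that assumption the intermediate list adds nothing, and dropping it would not change the mask or the Fourier coefficients.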