PaddlePaddle / PaddleX

PaddlePaddle End-to-End Development Toolkit (PaddlePaddle low-code development tool)

Home Page:https://paddlex.readthedocs.io

Training terminated abnormally, please restart training (训练异常终止,请重新开始训练)

Fyee opened this issue · comments

commented

Checklist:

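  1. Searched past related issues for an answer
  2. Read through the FAQ summary and Q&A
  3. Confirmed whether the bug is still unfixed in the latest version
  4. If the bug is caused by PaddleX API 2.0 and is already fixed on the develop branch, see FAQ Q4 for how to replace the built-in PaddleX API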

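Problem description

Dataset validation succeeds, but an error is raised when model training starts.

Reproduction

  1. Please provide the error message you received and the related logs (see FAQ Q2 for where to find the logs)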
    Signal handlers are set for stagelog cleanup.
    D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:328: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
    self.gallery1 = gr.Gallery(
    D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:337: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
    self.gallery2 = gr.Gallery(
    D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:346: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
    self.gallery3 = gr.Gallery(
    文件夹D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output中发现模型文件:[]
    文件夹D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output中发现模型文件:[]
    文件夹D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output中发现模型文件:[]
    Running on local URL: http://127.0.0.1:63666
    To create a public link, set share=True in launch().
    Running on local URL: http://127.0.0.1:55236
    To create a public link, set share=True in launch().
    Signal handlers are set for stagelog cleanup.
    D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:328: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
    self.gallery1 = gr.Gallery(
    D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:337: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
    self.gallery2 = gr.Gallery(
    D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:346: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
    self.gallery3 = gr.Gallery(
    文件夹D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output中发现模型文件:[]
    文件夹D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output中发现模型文件:[]
    文件夹D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output中发现模型文件:[]
    Running on local URL: http://127.0.0.1:55237
    To create a public link, set share=True in launch().
    click dataset_varify_btn, start checking dataset, config: model name: picodet_layout_1x, dataset type: COCODetDataset,dataset path: data/example_data/det_layout_examples, max_show_cv: 10
    Signal handlers are set for stagelog cleanup.
    数据集校验成功
    执行: "D:\5.Software\PaddleX DeskTop\resources\codelab\python.exe" "D:\5.Software\PaddleX DeskTop\workdir\2293451\1\run_paddlex.py" --exec_train
    Signal handlers are set for stagelog cleanup.
    ['D:\5.Software\PaddleX DeskTop\resources\codelab\python.exe', 'tools/train.py', '--eval', '--config', 'C:\Users\11924\.paddle_uapi\tmpgovj8gy7\detmodel_picodet_layout_1x.yml', '--use_vdl', 'True', '--vdl_log_dir', 'D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output']
    Log path: D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output\train.log
    Warning: import ppdet from source directory without installing, run 'python setup.py install' to install ppdet firstly
    loading annotations into memory...
    Done (t=0.03s)
    creating index...
    index created!
    [09/14 16:00:57] ppdet.data.source.coco INFO: Load [90 samples valid, 0 samples invalid] in file D:\5.Software\PaddleX DeskTop\workdir\2293451\1\data\example_data\det_layout_examples\annotations/instance_train.json.
    W0914 16:00:57.090493 13520 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.0, Runtime API Version: 11.2
    W0914 16:00:57.104496 13520 gpu_resources.cc:149] device: 0, cuDNN Version: 8.4.
    [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37] in pretrained weight head.head_cls0.bias is unmatched with the shape [43] in model head.head_cls0.bias. And the weight head.head_cls0.bias will not be loaded
    [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37, 128, 1, 1] in pretrained weight head.head_cls0.weight is unmatched with the shape [43, 128, 1, 1] in model head.head_cls0.weight. And the weight head.head_cls0.weight will not be loaded
    [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37] in pretrained weight head.head_cls1.bias is unmatched with the shape [43] in model head.head_cls1.bias. And the weight head.head_cls1.bias will not be loaded
    [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37, 128, 1, 1] in pretrained weight head.head_cls1.weight is unmatched with the shape [43, 128, 1, 1] in model head.head_cls1.weight. And the weight head.head_cls1.weight will not be loaded
    [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37] in pretrained weight head.head_cls2.bias is unmatched with the shape [43] in model head.head_cls2.bias. And the weight head.head_cls2.bias will not be loaded
    [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37, 128, 1, 1] in pretrained weight head.head_cls2.weight is unmatched with the shape [43, 128, 1, 1] in model head.head_cls2.weight. And the weight head.head_cls2.weight will not be loaded
    [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37] in pretrained weight head.head_cls3.bias is unmatched with the shape [43] in model head.head_cls3.bias. And the weight head.head_cls3.bias will not be loaded
    [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37, 128, 1, 1] in pretrained weight head.head_cls3.weight is unmatched with the shape [43, 128, 1, 1] in model head.head_cls3.weight. And the weight head.head_cls3.weight will not be loaded
    [09/14 16:00:59] ppdet.utils.checkpoint INFO: Finish loading model weights: C:\Users\11924/.cache/paddle/weights\picodet_lcnet_x1_0_fgd_layout.pdparams
    Traceback (most recent call last):
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\tools\train.py", line 209, in <module>
    main()
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\tools\train.py", line 205, in main
    run(FLAGS, cfg)
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\tools\train.py", line 158, in run
    trainer.train(FLAGS.eval)
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\engine\trainer.py", line 580, in train
    outputs = model(data)
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\architectures\meta_arch.py", line 60, in forward
    out = self.get_loss()
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\architectures\picodet.py", line 82, in get_loss
    head_outs, _ = self._forward()
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\architectures\picodet.py", line 66, in _forward
    fpn_feats = self.neck(body_feats)
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\necks\csp_pan.py", line 331, in forward
    inner_out = self.top_down_blocks[len(self.in_channels) - 1 - idx](
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\necks\csp_pan.py", line 213, in forward
    x_main = self.main_conv(x)
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\necks\csp_pan.py", line 53, in forward
    x = self.bn(self.conv(x))
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\norm.py", line 781, in forward
    return batch_norm(
    File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\functional\norm.py", line 199, in batch_norm
    batch_norm_out, _, _, _, _, _ = _C_ops.batch_norm(
    MemoryError:

C++ Traceback (most recent call last):

Not support stack backtrace yet.

Error Message Summary:

ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 11.132812MB memory on GPU 0, 7.999512GB memory has been allocated and available memory is only 0.000000B.
Please check whether there is any other process using GPU 0.

  1. If yes, please stop them, or start PaddlePaddle on another GPU.

  2. If no, please decrease the batch size of your model.
    (at ..\paddle\fluid\memory\allocation\cuda_allocator.cc:86)
    Traceback (most recent call last):
    File "D:\5.Software\PaddleX DeskTop\workdir\2293451\1\run_paddlex.py", line 55, in <module>
    runner.run()
    File "D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\base_run_paddlex.py", line 402, in run
    self.run_train()
    File "D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\base_run_paddlex.py", line 233, in run_train
    self.uapi_model.train(
    File "uapi\cv_uapi\paddledet_uapi\det\model.py", line 80, in uapi.cv_uapi.paddledet_uapi.det.model.DetModel.train
    File "uapi\cv_uapi\paddledet_uapi\det\model.py", line 82, in uapi.cv_uapi.paddledet_uapi.det.model.DetModel.train
    File "uapi\cv_uapi\paddledet_uapi\det\model.py", line 93, in uapi.cv_uapi.paddledet_uapi.det.model.DetModel.train
    File "uapi\cv_uapi\paddledet_uapi\det\runner.py", line 29, in uapi.cv_uapi.paddledet_uapi.det.runner.DetRunner.train
    File "uapi\base\runner.py", line 343, in uapi.base.runner.BaseRunner.run_cmd
    uapi.base.utils.errors.CalledProcessError: Command ['D:\5.Software\PaddleX DeskTop\resources\codelab\python.exe', 'tools/train.py', '--eval', '--config', 'C:\Users\11924\.paddle_uapi\tmpgovj8gy7\detmodel_picodet_layout_1x.yml', '--use_vdl', 'True', '--vdl_log_dir', 'D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output'] returned non-zero exit status 1.
    训练异常终止,请重新开始训练

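  3. Please provide the GUI version you are using
    PaddlePaddle AI Suite (飞桨AI套件)
    Current version: 2.1.0

  4. Please provide your operating system, e.g. Linux/Windows/MacOS
    Windows 11

  5. Which CUDA/cuDNN versions are you using?
    CUDA 11.7
    cuDNN 8.4.1
    GPU: GTX 3060TI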
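
The error summary in the log first asks whether another process is using GPU 0. Below is a minimal sketch of that check, assuming `nvidia-smi` is on the PATH. The helper names `parse_compute_apps` and `list_gpu_processes` are hypothetical and not part of PaddleX or PaddlePaddle. Per-process memory may be reported as a non-numeric value such as `[N/A]` on some Windows driver setups, so the parser tolerates that.

```python
import csv
import io
import subprocess

# nvidia-smi query for compute processes on the GPUs.
# Available fields are listed by `nvidia-smi --help-query-compute-apps`.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Parse the CSV output of the query above into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        pid, name, used = (f.strip() for f in fields)
        # used_memory is in MiB with `nounits`; keep None when it is not a number.
        used_mib = int(used) if used.isdigit() else None
        rows.append({"pid": int(pid), "name": name, "used_mib": used_mib})
    return rows


def list_gpu_processes():
    """Run nvidia-smi and return the compute processes it reports."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Calling `list_gpu_processes()` before starting training shows whether something else, for example an earlier training run that did not exit, is still holding GPU memory.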

commented

Resolved.
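
The thread does not record what resolved the problem. If no other process was holding GPU memory, the error summary's remaining suggestion is to decrease the batch size. The sketch below assumes the generated PaddleDetection config has been loaded into a plain dict and uses the usual `TrainReader.batch_size` and `LearningRate.base_lr` keys. The function name, the halving factor, the proportional learning-rate scaling, and the numbers in the usage lines are illustrative assumptions, not values taken from this issue.

```python
import copy


def with_smaller_batch(cfg, factor=2, scale_lr=True):
    """Return a copy of cfg with the training batch size divided by `factor`.

    Assumes a PaddleDetection-style layout: cfg["TrainReader"]["batch_size"]
    and, optionally, cfg["LearningRate"]["base_lr"].
    """
    new_cfg = copy.deepcopy(cfg)  # leave the caller's config untouched
    reader = new_cfg["TrainReader"]
    old_bs = reader["batch_size"]
    new_bs = max(1, old_bs // factor)  # never go below one sample per batch
    reader["batch_size"] = new_bs
    if scale_lr and "LearningRate" in new_cfg:
        # Common heuristic (an assumption here): scale the learning rate
        # in proportion to the batch size.
        new_cfg["LearningRate"]["base_lr"] *= new_bs / old_bs
    return new_cfg


# Illustrative values only; the real config is generated by the GUI.
example = {"TrainReader": {"batch_size": 24}, "LearningRate": {"base_lr": 0.4}}
smaller = with_smaller_batch(example)
print(smaller["TrainReader"]["batch_size"], smaller["LearningRate"]["base_lr"])
```

In the desktop GUI the batch size is normally set in the training parameters rather than by editing the temporary YAML file, so this only shows which values change.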