Training terminated abnormally; please restart training

Question

Fyee opened this issue 9 months ago · comments

Atom commented 9 months ago

Checklist:

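Searched previous related issues for an answer
Read through the FAQ summary and Q&A
Confirmed whether the bug is still unfixed in the latest version
If the bug is caused by PaddleX API 2.0 and is already fixed in the develop branch, replace the built-in PaddleX API as described in FAQ Q4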

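Problem description

Dataset validation succeeds, but model training fails with an error.

Reproduction

Please provide the error message and the relevant logs (see FAQ Q2 for how to find the logs)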
Signal handlers are set for stagelog cleanup.
D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:328: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
self.gallery1 = gr.Gallery(
D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:337: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
self.gallery2 = gr.Gallery(
D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:346: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
self.gallery3 = gr.Gallery(
Model files found in folder D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output: []
Model files found in folder D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output: []
Model files found in folder D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output: []
Running on local URL: http://127.0.0.1:63666
To create a public link, set share=True in launch().
Running on local URL: http://127.0.0.1:55236
To create a public link, set share=True in launch().
Signal handlers are set for stagelog cleanup.
D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:328: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
self.gallery1 = gr.Gallery(
D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:337: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
self.gallery2 = gr.Gallery(
D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:346: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
self.gallery3 = gr.Gallery(
Model files found in folder D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output: []
Model files found in folder D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output: []
Model files found in folder D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output: []
Running on local URL: http://127.0.0.1:55237
To create a public link, set share=True in launch().
click dataset_varify_btn, start checking dataset, config: model name: picodet_layout_1x, dataset type: COCODetDataset,dataset path: data/example_data/det_layout_examples, max_show_cv: 10
Signal handlers are set for stagelog cleanup.
Dataset validation succeeded
Executing: "D:\5.Software\PaddleX DeskTop\resources\codelab\python.exe" "D:\5.Software\PaddleX DeskTop\workdir\2293451\1\run_paddlex.py" --exec_train
Signal handlers are set for stagelog cleanup.
['D:\5.Software\PaddleX DeskTop\resources\codelab\python.exe', 'tools/train.py', '--eval', '--config', 'C:\Users\11924\.paddle_uapi\tmpgovj8gy7\detmodel_picodet_layout_1x.yml', '--use_vdl', 'True', '--vdl_log_dir', 'D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output']
Log path: D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output\train.log
Warning: import ppdet from source directory without installing, run 'python setup.py install' to install ppdet firstly
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
[09/14 16:00:57] ppdet.data.source.coco INFO: Load [90 samples valid, 0 samples invalid] in file D:\5.Software\PaddleX DeskTop\workdir\2293451\1\data\example_data\det_layout_examples\annotations/instance_train.json.
W0914 16:00:57.090493 13520 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.0, Runtime API Version: 11.2
W0914 16:00:57.104496 13520 gpu_resources.cc:149] device: 0, cuDNN Version: 8.4.
[09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37] in pretrained weight head.head_cls0.bias is unmatched with the shape [43] in model head.head_cls0.bias. And the weight head.head_cls0.bias will not be loaded
[09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37, 128, 1, 1] in pretrained weight head.head_cls0.weight is unmatched with the shape [43, 128, 1, 1] in model head.head_cls0.weight. And the weight head.head_cls0.weight will not be loaded
[09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37] in pretrained weight head.head_cls1.bias is unmatched with the shape [43] in model head.head_cls1.bias. And the weight head.head_cls1.bias will not be loaded
[09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37, 128, 1, 1] in pretrained weight head.head_cls1.weight is unmatched with the shape [43, 128, 1, 1] in model head.head_cls1.weight. And the weight head.head_cls1.weight will not be loaded
[09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37] in pretrained weight head.head_cls2.bias is unmatched with the shape [43] in model head.head_cls2.bias. And the weight head.head_cls2.bias will not be loaded
[09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37, 128, 1, 1] in pretrained weight head.head_cls2.weight is unmatched with the shape [43, 128, 1, 1] in model head.head_cls2.weight. And the weight head.head_cls2.weight will not be loaded
[09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37] in pretrained weight head.head_cls3.bias is unmatched with the shape [43] in model head.head_cls3.bias. And the weight head.head_cls3.bias will not be loaded
[09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37, 128, 1, 1] in pretrained weight head.head_cls3.weight is unmatched with the shape [43, 128, 1, 1] in model head.head_cls3.weight. And the weight head.head_cls3.weight will not be loaded
[09/14 16:00:59] ppdet.utils.checkpoint INFO: Finish loading model weights: C:\Users\11924/.cache/paddle/weights\picodet_lcnet_x1_0_fgd_layout.pdparams
Traceback (most recent call last):
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\tools\train.py", line 209, in <module>
main()
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\tools\train.py", line 205, in main
run(FLAGS, cfg)
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\tools\train.py", line 158, in run
trainer.train(FLAGS.eval)
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\engine\trainer.py", line 580, in train
outputs = model(data)
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
return self.forward(*inputs, **kwargs)
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\architectures\meta_arch.py", line 60, in forward
out = self.get_loss()
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\architectures\picodet.py", line 82, in get_loss
head_outs, _ = self._forward()
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\architectures\picodet.py", line 66, in _forward
fpn_feats = self.neck(body_feats)
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
return self.forward(*inputs, **kwargs)
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\necks\csp_pan.py", line 331, in forward
inner_out = self.top_down_blocks[len(self.in_channels) - 1 - idx](
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
return self.forward(*inputs, **kwargs)
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\necks\csp_pan.py", line 213, in forward
x_main = self.main_conv(x)
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
return self.forward(*inputs, **kwargs)
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\necks\csp_pan.py", line 53, in forward
x = self.bn(self.conv(x))
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
return self.forward(*inputs, **kwargs)
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\norm.py", line 781, in forward
return batch_norm(
File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\functional\norm.py", line 199, in batch_norm
batch_norm_out, _, _, _, _, _ = _C_ops.batch_norm(
MemoryError:

C++ Traceback (most recent call last):

Not support stack backtrace yet.

Error Message Summary:

ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 11.132812MB memory on GPU 0, 7.999512GB memory has been allocated and available memory is only 0.000000B.
Please check whether there is any other process using GPU 0.

If yes, please stop them, or start PaddlePaddle on another GPU.
If no, please decrease the batch size of your model.
(at ..\paddle\fluid\memory\allocation\cuda_allocator.cc:86)
Traceback (most recent call last):
File "D:\5.Software\PaddleX DeskTop\workdir\2293451\1\run_paddlex.py", line 55, in <module>
runner.run()
File "D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\base_run_paddlex.py", line 402, in run
self.run_train()
File "D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\base_run_paddlex.py", line 233, in run_train
self.uapi_model.train(
File "uapi\cv_uapi\paddledet_uapi\det\model.py", line 80, in uapi.cv_uapi.paddledet_uapi.det.model.DetModel.train
File "uapi\cv_uapi\paddledet_uapi\det\model.py", line 82, in uapi.cv_uapi.paddledet_uapi.det.model.DetModel.train
File "uapi\cv_uapi\paddledet_uapi\det\model.py", line 93, in uapi.cv_uapi.paddledet_uapi.det.model.DetModel.train
File "uapi\cv_uapi\paddledet_uapi\det\runner.py", line 29, in uapi.cv_uapi.paddledet_uapi.det.runner.DetRunner.train
File "uapi\base\runner.py", line 343, in uapi.base.runner.BaseRunner.run_cmd
uapi.base.utils.errors.CalledProcessError: Command ['D:\5.Software\PaddleX DeskTop\resources\codelab\python.exe', 'tools/train.py', '--eval', '--config', 'C:\Users\11924\.paddle_uapi\tmpgovj8gy7\detmodel_picodet_layout_1x.yml', '--use_vdl', 'True', '--vdl_log_dir', 'D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output'] returned non-zero exit status 1.
Training terminated abnormally; please restart training
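
The error summary in the log above states how much memory the allocator requested, how much was already allocated, and how much was left on GPU 0. As a rough aid for reading such messages, the sketch below pulls those numbers out with a regular expression. The function name `parse_paddle_oom` and the pattern are illustrative assumptions based only on the wording of the message in this log; they are not a PaddlePaddle API, and other versions may word the message differently.

```python
import re

# Unit multipliers for the sizes printed in the message (binary units assumed).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

# Pattern written against the exact wording seen in this log (an assumption).
_OOM_RE = re.compile(
    r"Cannot allocate (?P<req>[\d.]+)(?P<req_u>[KMG]?B) memory on GPU (?P<gpu>\d+), "
    r"(?P<used>[\d.]+)(?P<used_u>[KMG]?B) memory has been allocated and "
    r"available memory is only (?P<free>[\d.]+)(?P<free_u>[KMG]?B)"
)


def parse_paddle_oom(message):
    """Return the sizes from an out-of-memory message in bytes, or None if no match."""
    match = _OOM_RE.search(message)
    if match is None:
        return None

    def to_bytes(value, unit):
        return float(value) * _UNITS[unit]

    return {
        "gpu": int(match["gpu"]),
        "requested_bytes": to_bytes(match["req"], match["req_u"]),
        "allocated_bytes": to_bytes(match["used"], match["used_u"]),
        "available_bytes": to_bytes(match["free"], match["free_u"]),
    }


# The message text is copied from the log above.
message = (
    "Out of memory error on GPU 0. Cannot allocate 11.132812MB memory on GPU 0, "
    "7.999512GB memory has been allocated and available memory is only 0.000000B."
)
info = parse_paddle_oom(message)
```

For the message in this log, `info["available_bytes"]` is zero: the roughly 8 GB already allocated left no room for a request of about 11 MB.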
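Please provide the version number of the GUI you are using
PaddlePaddle AI Suite (飞桨AI套件)
Current version: 2.1.0
Please provide your operating system, e.g. Linux/Windows/macOS
Windows 11
Which CUDA/cuDNN versions are you using?
CUDA 11.7
cuDNN 8.4.1
GPU: RTX 3060 Ti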

Atom commented 9 months ago

Resolved.
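
The thread does not say what the fix was. The error summary in the log offers two suggestions: stop any other process using GPU 0, or decrease the batch size. As a sketch of the second suggestion, the helper below halves a batch size held in a config dictionary. The `TrainReader` / `batch_size` keys follow the layout commonly used in PaddleDetection YAML configs, but whether the PaddleX desktop GUI exposes the setting this way is an assumption. `halve_batch_size` is a hypothetical helper, not part of any library, and the example values are made up.

```python
import copy


def halve_batch_size(config, reader="TrainReader", minimum=1):
    """Return a copy of config with the reader's batch_size halved, never below minimum."""
    new_config = copy.deepcopy(config)  # leave the caller's config untouched
    current = new_config[reader]["batch_size"]
    new_config[reader]["batch_size"] = max(minimum, current // 2)
    return new_config


# Example values are made up for illustration; they are not taken from the issue.
example = {"TrainReader": {"batch_size": 24}}
smaller = halve_batch_size(example)
```

Halving repeatedly until training fits in memory is a common trial-and-error approach. A smaller batch size may also call for a lower learning rate.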