Help with a training problem


Hyper-Devil opened this issue 2 years ago · comments

❔Question

Training on a custom dataset fails.

Additional context

python train.py --data data/cone.yaml --cfg models/yolov5n.yaml --weights '' --hyp data/hyps/hyp.scratch-low.yaml
wandb: Currently logged in as: whd. Use wandb login --relogin to force relogin
train: weights=, cfg=models/yolov5n.yaml, data=data/cone.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=300, batch_size=32, imgsz=416, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=0,1, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest, swin_float=False, aux_ota_loss=False
github: skipping check (Docker image), for updates see https://github.com/positive666/yolov5
/bin/sh: 1: git: not found
YOLOv5_research_plus 🚀 2022-8-23 Python-3.9.12 torch-1.8.1+cu111 CUDA:0 (GeForce RTX 3080, 10015MiB)
CUDA:1 (GeForce RTX 3080, 10018MiB)

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
wandb: Tracking run with wandb version 0.13.2
wandb: Run data is saved locally in /yolov5_research/wandb/run-20220823_222739-3jw78jcj
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run chocolate-field-6
wandb: ⭐ View project at https://wandb.ai/whd/YOLOv5
wandb: 🚀 View run at https://wandb.ai/whd/YOLOv5/runs/3jw78jcj
YOLOv5 temporarily requires wandb version 0.12.10 or below. Some features may not work as expected.
Overriding model.yaml nc=80 with nc=3

                 from  n    params  module                                  arguments
  0                -1  1      1760  models.common.Conv                      [3, 16, 6, 2, 2]
  1                -1  1      4672  models.common.Conv                      [16, 32, 3, 2]
  2                -1  1      4800  models.common.C3                        [32, 32, 1]
  3                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  4                -1  2     29184  models.common.C3                        [64, 64, 2]
  5                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  6                -1  3    156928  models.common.C3                        [128, 128, 3]
  7                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  8                -1  1    296448  models.common.C3                        [256, 256, 1]
  9                -1  1    164608  models.common.SPPF                      [256, 256, 5]
 10                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  1     90880  models.common.C3                        [256, 128, 1, False]
 14                -1  1      8320  models.common.Conv                      [128, 64, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  1     22912  models.common.C3                        [128, 64, 1, False]
 18                -1  1     36992  models.common.Conv                      [64, 64, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  1     74496  models.common.C3                        [128, 128, 1, False]
 21                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1    296448  models.common.C3                        [256, 256, 1, False]
 24      [17, 20, 23]  1     10824  models.yolo.Detect                      [3, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [64, 128, 256]]
initialize_biases done
YOLOv5n summary: 270 layers, 1767976 parameters, 1767976 gradients, 4.2 GFLOPs

AMP: checks passed ✅
optimizer: SGD(lr=0.01) with parameter groups 57 weight (decay=0.0), 60 weight(decay=0.0005), 60 bias
WARNING: DP not recommended, use torch.distributed.run for best DDP Multi-GPU results.
See Multi-GPU Tutorial at ultralytics/yolov5#475 to get started.
train: Scanning '/cone_dataset/labels/train.cache' images and labels... 9512 fou
val: Scanning '/cone_dataset/labels/val.cache' images and labels... 1166 found,
Plotting labels to runs/train/exp6/labels.jpg...

AutoAnchor: 2.48 anchors/target, 0.929 Best Possible Recall (BPR). Anchors are a poor fit to dataset ⚠️, attempting to improve...
AutoAnchor: WARNING: Extremely small objects found: 8033 of 84054 labels are < 3 pixels in size
AutoAnchor: Running kmeans for 9 anchors on 83607 points...
AutoAnchor: Evolving anchors with Genetic Algorithm: fitness = 0.8556: 100%|████
AutoAnchor: thr=0.25: 0.9999 best possible recall, 6.72 anchors past thr
AutoAnchor: n=9, img_size=416, metric_all=0.454/0.854-mean/best, past_thr=0.551-mean: 3,4, 4,5, 6,7, 8,9, 11,13, 14,17, 19,23, 25,30, 37,36
AutoAnchor: Done ✅ (optional: update model *.yaml to use these anchors in the future)
Image sizes 416 train, 416 val
Using 8 dataloader workers
Logging results to runs/train/exp6
Starting training for 300 epochs...

 Epoch   gpu_mem       box       obj       cls    labels  img_size

0%| | 0/298 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/yolov5_research/train.py", line 652, in <module>
    main(opt)
  File "/yolov5_research/train.py", line 551, in main
    train(opt.hyp, opt, device, callbacks)
  File "/yolov5_research/train.py", line 294, in train
    for i, (imgs, targets, paths, _) in pbar:  # batch -------------------------------------------------------------
  File "/usr/local/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/yolov5_research/utils/dataloaders.py", line 158, in __iter__
    yield next(self.iterator)
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.9/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/yolov5_research/utils/dataloaders.py", line 623, in __getitem__
    if random.random() < hyp['paste_in']:
KeyError: 'paste_in'

wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb:
wandb: Synced chocolate-field-6: https://wandb.ai/whd/YOLOv5/runs/3jw78jcj
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20220823_222739-3jw78jcj/logs
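
The traceback above fails on `hyp['paste_in']`, while the `hyperparameters:` line printed at startup lists no such key. As a rough sketch of how that mismatch could be checked from the log alone, the snippet below parses a `hyperparameters:` line into a dict and reports which expected keys are absent. The helper names `parse_hyp_log_line` and `missing_keys` are hypothetical, the log line is abbreviated from the one above, and the `required` list is an assumption for illustration (only `paste_in` is confirmed by the traceback).

```python
def parse_hyp_log_line(line):
    """Parse 'hyperparameters: a=1, b=2' into {'a': 1.0, 'b': 2.0}."""
    _, _, body = line.partition("hyperparameters:")
    hyp = {}
    for item in body.split(","):
        name, sep, value = item.strip().partition("=")
        if sep:  # skip fragments that are not name=value pairs
            hyp[name] = float(value)
    return hyp


def missing_keys(hyp, required):
    """Return the required keys that the hyperparameter dict does not define."""
    return [key for key in required if key not in hyp]


# Abbreviated copy of the startup log line; values are taken from the log above.
log_line = "hyperparameters: lr0=0.01, lrf=0.01, mosaic=1.0, mixup=0.0, copy_paste=0.0"

# Assumed list of keys the dataloader reads; only 'paste_in' is confirmed by the traceback.
required = ["mosaic", "mixup", "copy_paste", "paste_in"]

print(missing_keys(parse_hyp_log_line(log_line), required))  # prints ['paste_in']
```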

Cheng Rui · Answer 1 · Tue Aug 30 2022 09:04:07 GMT+0800 (China Standard Time)

KeyError: 'paste_in': just add this key to your hyperparameter YAML file.
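
A minimal sketch of that fix, assuming the hyperparameter file is a flat list of `name: value` lines (as `data/hyps/hyp.scratch-low.yaml` appears to be from the log). The helper name `ensure_hyp_key` and the default of `0.0` are assumptions, not something the answer specifies. Since the failing line compares `random.random()` against the value, `0.0` should leave that branch disabled.

```python
from pathlib import Path


def ensure_hyp_key(yaml_path, key="paste_in", default=0.0):
    """Append 'key: default' to a flat hyperparameter YAML if the key is absent.

    Returns True if the file was modified, False if the key was already there.
    """
    path = Path(yaml_path)
    text = path.read_text(encoding="utf-8")
    # Ignore trailing comments, then look for a line that defines the key.
    present = any(
        line.split("#")[0].strip().startswith(f"{key}:")
        for line in text.splitlines()
    )
    if present:
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # keep the new entry on its own line
    path.write_text(text + f"{key}: {default}\n", encoding="utf-8")
    return True
```

Calling `ensure_hyp_key("data/hyps/hyp.scratch-low.yaml")` once before training would add `paste_in: 0.0` if it is missing. Editing the file by hand works just as well.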

Cheng Rui · Answer 2 · Thu Sep 01 2022 16:50:02 GMT+0800 (China Standard Time)

You can pull the latest code again tomorrow.
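
Until the updated code is available, one local workaround would be a tolerant lookup in place of the direct `hyp['paste_in']` access. This is only a sketch under that assumption and not necessarily what the upstream change does. The function name `paste_in_enabled` and the injectable `rng` parameter are hypothetical.

```python
import random


def paste_in_enabled(hyp, rng=random.random):
    """Return True when the paste-in branch should run for this sample."""
    # dict.get falls back to 0.0 when the key is absent. rng() returns a value
    # in [0.0, 1.0), so a 0.0 threshold means the branch is never taken.
    return rng() < hyp.get("paste_in", 0.0)
```

With this guard, a hyperparameter file that lacks `paste_in` simply skips the augmentation instead of raising `KeyError` inside the DataLoader worker.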

Hyper-Devil · Answer 3 · Thu Sep 01 2022 16:51:32 GMT+0800 (China Standard Time)

Got it, thank you for your work.