labels require 56 columns each

Question

akatendra opened this issue a year ago · comments

tendra commented a year ago

❔Question

Hi!

I am trying to train a model to detect keypoints for one class with 9 keypoints.

I get errors like:

train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/102_jpg.rf.0f0bf4b6ec94f8a7be6527458b7922f3.jpg: labels require 56 columns each

It looks like the model still expects data for the 17 human-pose keypoints (17*3 + 5 = 56 columns each), while I only have 9 keypoints (9*3 + 5 = 32 columns).
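The column arithmetic above can be checked with a short sketch. It assumes each label row is laid out as a class id, 4 box values, and then (x, y, visibility) per keypoint, which is what the 56-column figure for 17 keypoints suggests. The function names are hypothetical and are not part of the repository.

```python
def expected_columns(num_keypoints: int) -> int:
    """Columns per label row: class id + 4 box values + (x, y, visibility) per keypoint."""
    return 5 + 3 * num_keypoints


def row_matches(line: str, num_keypoints: int) -> bool:
    """Return True if a whitespace-separated label row has the expected column count."""
    return len(line.split()) == expected_columns(num_keypoints)


print(expected_columns(17))  # 56
print(expected_columns(9))   # 32
```

A 32-column row therefore passes a check for 9 keypoints and fails one for 17, which matches the warning in the log.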

Please help me solve this problem!

Additional context

I am trying to use: https://github.com/TexasInstruments/edgeai-yolov5/tree/yolo-pose

I created a Colab notebook (with GPU) with this code:

from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive

# Download TexasInstruments | edgeai-yolov5 | YOLO-Pose multi-person pose estimation model code
!git clone https://github.com/TexasInstruments/edgeai-yolov5.git -b yolo-pose
%cd edgeai-yolov5
%pip install -r requirements.txt # install

import sys
import torch
print(f"Python version: {sys.version}, {sys.version_info} ")
print(f"Pytorch version: {torch.__version__} ")

import os
key_value = 'OMP_NUM_THREADS'
try:
  if os.environ[key_value]:
    print(f'The value of {key_value} is {os.environ[key_value]}')

except KeyError:
  print(f'{key_value} environment variable is not set.')

os.environ.setdefault(key_value, '8')

Start training:

# Remove train.cache from previous training
# !rm -rf <folder_name>
!rm -rf /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/train.cache

!rm -rf /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train.cache

data_location = '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/data.yaml'
cfg_location = '/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/models/hub/yolov5s6_kpts.yaml'
weights = '/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/weights/person_detector_yolov5s6_960_71p6_93p1/last.pt'
my_project = '/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg'
hyper_parameters = '/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/data/hyp.scratch.yaml'


!python train.py --data {data_location} --cfg {cfg_location} --weights {weights} --epochs 100 --batch-size 64 --img 640 --kpt-label --project {my_project} --name edgeai-yolov5 --hyp {hyper_parameters}

And I get an error:

github: ⚠️ WARNING: code is out of date by 465 commits. Use 'git pull' to update or 'git clone https://github.com/TexasInstruments/edgeai-yolov5' to download latest.
YOLOv5 🚀 v4.0-76-gae4e0e8 torch 1.13.1+cu116 CUDA:0 (Tesla T4, 15109.875MB)

Namespace(adam=False, artifact_alias='latest', batch_size=64, bbox_interval=-1, bucket='', cache_images=False, cfg='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/models/hub/yolov5s6_kpts.yaml', data='/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/data.yaml', device='', entity=None, epochs=100, evolve=False, exist_ok=False, global_rank=-1, hyp='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/data/hyp.scratch.yaml', image_weights=False, img_size=[640, 640], kpt_label=True, label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='edgeai-yolov5', noautoanchor=False, nosave=False, notest=False, project='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg', quad=False, rect=False, resume=False, save_dir='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg/edgeai-yolov56', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=64, upload_dataset=False, weights='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/weights/person_detector_yolov5s6_960_71p6_93p1/last.pt', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, kpt=0.1, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0
wandb: Install Weights & Biases for YOLOv5 logging with 'pip install wandb' (recommended)

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    156928  models.common.C3                        [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1    885504  models.common.Conv                      [256, 384, 3, 2]              
  8                -1  1    665856  models.common.C3                        [384, 384, 1]                 
  9                -1  1   1770496  models.common.Conv                      [384, 512, 3, 2]              
[3, 5, 7]
 10                -1  1    656896  models.common.SPP                       [512, 512, [3, 5, 7]]         
 11                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 12                -1  1    197376  models.common.Conv                      [512, 384, 1, 1]              
 13                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 14           [-1, 8]  1         0  models.common.Concat                    [1]                           
 15                -1  1    813312  models.common.C3                        [768, 384, 1, False]          
 16                -1  1     98816  models.common.Conv                      [384, 256, 1, 1]              
 17                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 18           [-1, 6]  1         0  models.common.Concat                    [1]                           
 19                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 20                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 21                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 22           [-1, 4]  1         0  models.common.Concat                    [1]                           
 23                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 24                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 25          [-1, 20]  1         0  models.common.Concat                    [1]                           
 26                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 27                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 28          [-1, 16]  1         0  models.common.Concat                    [1]                           
 29                -1  1    715008  models.common.C3                        [512, 384, 1, False]          
 30                -1  1   1327872  models.common.Conv                      [384, 384, 3, 2]              
 31          [-1, 12]  1         0  models.common.Concat                    [1]                           
 32                -1  1   1313792  models.common.C3                        [768, 512, 1, False]          
 33  [23, 26, 29, 32]  1   2681996  models.yolo.Detect                      [1, [[19, 27, 44, 40, 38, 94], [96, 68, 86, 152, 180, 137], [140, 301, 303, 264, 238, 542], [436, 615, 739, 380, 925, 792]], 9, [128, 256, 384, 512]]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Model Summary: 557 layers, 15022412 parameters, 15022412 gradients

Transferred 470/744 items from /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/weights/person_detector_yolov5s6_960_71p6_93p1/last.pt
Scaled weight_decay = 0.0005
Optimizer groups: 129 .bias, 129 conv.weight, 121 other
Scanning images:   0% 0/102 [00:00<?, ?it/s]
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/100_jpg.rf.4f0ac837f2ad41c10f5c40bd2aceb2d1.jpg: labels require 56 columns each
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/101_jpg.rf.342c555c0c142ee704a47a7eef5b3e24.jpg: labels require 56 columns each
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/102_jpg.rf.0f0bf4b6ec94f8a7be6527458b7922f3.jpg: labels require 56 columns each

<...>

train: Scanning '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train' images and labels... 29 found, 0 missing, 0 empty, 29 corrupted:  28% 29/102 [00:00<00:00, 287.07it/s]
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/128_jpg.rf.ee990fa083f2e1fd001a05e52d24a651.jpg: labels require 56 columns each
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/12_jpg.rf.90b0d5548d1c6dc5ead449a15eb19b8f.jpg: labels require 56 columns each
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/130_jpg.rf.fb5e61ff1c164b031e8993ce832e94f7.jpg: labels require 56 columns each

<...>

train: Scanning '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train' images and labels... 77 found, 0 missing, 0 empty, 77 corrupted:  75% 77/102 [00:00<00:00, 399.34it/s]
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/34_jpg.rf.0b643f03f0ebe6be6fc8bafa7bade034.jpg: labels require 56 columns each
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/35_jpg.rf.e0b7a971afca6a03921a5c694b9babae.jpg: labels require 56 columns each
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/36_jpg.rf.f3b0c8f3932a26483534c53f5c1bc5af.jpg: labels require 56 columns each

<...>

train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/55_jpg.rf.e3a9328b563b4f7408dabc21c6b31e9d.jpg: labels require 56 columns each
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/56_jpg.rf.daeff470200b3da62d2b36c5d4b2bbc3.jpg: labels require 56 columns each
train: Scanning '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train' images and labels... 102 found, 0 missing, 0 empty, 102 corrupted: 100% 102/102 [00:00<00:00, 393.97it/s]
train: New cache created: /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train.cache
Traceback (most recent call last):
  File "train.py", line 550, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 189, in train
    dataloader, dataset = create_dataloader(train_path, imgsz, batch_size, gs, opt,
  File "/content/edgeai-yolov5/utils/datasets.py", line 63, in create_dataloader
    dataset = LoadImagesAndLabels(path, imgsz, batch_size,
  File "/content/edgeai-yolov5/utils/datasets.py", line 414, in __init__
    labels, shapes, self.segments = zip(*cache.values())
ValueError: not enough values to unpack (expected 3, got 0)

tendra · Answer 1 · Tue Feb 07 2023 03:39:29 GMT+0800 (China Standard Time)

I think I solved the "labels require 56 columns each" error by modifying utils/datasets.py.

I changed everything there that was related to the 17 keypoints.

But now I get a different error: torch.cuda.OutOfMemoryError: CUDA out of memory. This is strange: if there was enough memory for 17 keypoints, why is there not enough for 9?

github: ⚠️ WARNING: code is out of date by 465 commits. Use 'git pull' to update or 'git clone https://github.com/TexasInstruments/edgeai-yolov5' to download latest.
YOLOv5 🚀 v4.0-76-gae4e0e8 torch 1.13.1+cu116 CUDA:0 (Tesla T4, 15109.875MB)

Namespace(adam=False, artifact_alias='latest', batch_size=64, bbox_interval=-1, bucket='', cache_images=False, cfg='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/models/hub/yolov5s6_kpts.yaml', data='/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/data.yaml', device='', entity=None, epochs=3, evolve=False, exist_ok=False, global_rank=-1, hyp='data/hyp.scratch.yaml', image_weights=False, img_size=[640, 640], kpt_label=True, label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='edgeai-yolov5', noautoanchor=False, nosave=False, notest=False, project='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg', quad=False, rect=False, resume=False, save_dir='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg/edgeai-yolov523', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=64, upload_dataset=False, weights='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/weights/person_detector_yolov5s6_960_71p6_93p1/last.pt', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, kpt=0.1, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0
wandb: Install Weights & Biases for YOLOv5 logging with 'pip install wandb' (recommended)

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    156928  models.common.C3                        [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1    885504  models.common.Conv                      [256, 384, 3, 2]              
  8                -1  1    665856  models.common.C3                        [384, 384, 1]                 
  9                -1  1   1770496  models.common.Conv                      [384, 512, 3, 2]              
[3, 5, 7]
 10                -1  1    656896  models.common.SPP                       [512, 512, [3, 5, 7]]         
 11                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 12                -1  1    197376  models.common.Conv                      [512, 384, 1, 1]              
 13                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 14           [-1, 8]  1         0  models.common.Concat                    [1]                           
 15                -1  1    813312  models.common.C3                        [768, 384, 1, False]          
 16                -1  1     98816  models.common.Conv                      [384, 256, 1, 1]              
 17                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 18           [-1, 6]  1         0  models.common.Concat                    [1]                           
 19                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 20                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 21                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 22           [-1, 4]  1         0  models.common.Concat                    [1]                           
 23                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 24                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 25          [-1, 20]  1         0  models.common.Concat                    [1]                           
 26                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 27                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 28          [-1, 16]  1         0  models.common.Concat                    [1]                           
 29                -1  1    715008  models.common.C3                        [512, 384, 1, False]          
 30                -1  1   1327872  models.common.Conv                      [384, 384, 3, 2]              
 31          [-1, 12]  1         0  models.common.Concat                    [1]                           
 32                -1  1   1313792  models.common.C3                        [768, 512, 1, False]          
 33  [23, 26, 29, 32]  1   2681996  models.yolo.Detect                      [1, [[19, 27, 44, 40, 38, 94], [96, 68, 86, 152, 180, 137], [140, 301, 303, 264, 238, 542], [436, 615, 739, 380, 925, 792]], 9, [128, 256, 384, 512]]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Model Summary: 557 layers, 15022412 parameters, 15022412 gradients

Transferred 470/744 items from /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/weights/person_detector_yolov5s6_960_71p6_93p1/last.pt
Scaled weight_decay = 0.0005
Optimizer groups: 129 .bias, 129 conv.weight, 121 other
train: Scanning '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train' images and labels... 102 found, 0 missing, 0 empty, 0 corrupted: 100% 102/102 [00:00<00:00, 292.12it/s]
train: New cache created: /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train.cache
val: Scanning '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/valid.cache' images and labels... 29 found, 0 missing, 0 empty, 0 corrupted: 100% 1/1 [00:00<?, ?it/s]
Plotting labels... 

autoanchor: Analyzing anchors... anchors/target = 6.41, Best Possible Recall (BPR) = 1.0000
Image sizes 640 train, 640 test
Using 2 dataloader workers
Logging results to /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg/edgeai-yolov523
Starting training for 3 epochs...

     Epoch   gpu_mem       box       obj       cls       kpt      kptv     total    labels  img_size
  0% 0/2 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 550, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 305, in train
    pred = model(imgs)  # forward
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/edgeai-yolov5/models/yolo.py", line 157, in forward
    return self.forward_once(x, profile)  # single-scale inference, train
  File "/content/edgeai-yolov5/models/yolo.py", line 188, in forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/edgeai-yolov5/models/yolo.py", line 67, in forward
    x[i] = torch.cat((self.m[i](x[i]), self.m_kpt[i](x[i])), axis=1)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/edgeai-yolov5/models/common.py", line 45, in forward
    return self.act(self.bn(self.conv(x)))
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/batchnorm.py", line 171, in forward
    return F.batch_norm(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 2450, in batch_norm
    return torch.batch_norm(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 14.76 GiB total capacity; 13.41 GiB already allocated; 3.88 MiB free; 13.49 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
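The message itself allows one quick check: the max_split_size_mb hint only applies when reserved memory is much larger than allocated memory. Below is a minimal sketch, with a hypothetical helper name, that pulls the two figures out of the message text and reports the gap.

```python
import re


def reserved_minus_allocated_gib(message: str) -> float:
    """Gap between reserved and allocated memory, in GiB, parsed from an OOM message."""
    allocated = float(re.search(r"([\d.]+) GiB already allocated", message).group(1))
    reserved = float(re.search(r"([\d.]+) GiB reserved", message).group(1))
    return reserved - allocated


msg = ("CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 14.76 GiB total capacity; "
       "13.41 GiB already allocated; 3.88 MiB free; 13.49 GiB reserved in total by PyTorch)")
gap = reserved_minus_allocated_gib(msg)
print(f"{gap:.2f} GiB")  # 0.08 GiB
```

The gap here is small, so the GPU looks full rather than fragmented. A smaller --batch-size or --img is therefore more likely to help than tuning the allocator.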

tendra · Answer 2 · Tue Feb 07 2023 16:26:21 GMT+0800 (China Standard Time)

If I reduce --batch-size from 64 to 32:

!python train.py --data {data_location} --cfg {cfg_location} --weights {weights} --batch-size 32 --img 640 --kpt-label --project {my_project} --name edgeai-yolov5 --epochs 3 --hyp {hyper_parameters}

I get another error (the same for --batch-size 32, --batch-size 15 and --batch-size 1): RuntimeError: The size of tensor a (25) must match the size of tensor b (41) at non-singleton dimension 2:

github: ⚠️ WARNING: code is out of date by 465 commits. Use 'git pull' to update or 'git clone https://github.com/TexasInstruments/edgeai-yolov5' to download latest.
YOLOv5 🚀 v4.0-76-gae4e0e8 torch 1.9.0+cu102 CUDA:0 (Tesla T4, 15109.875MB)

Namespace(adam=False, artifact_alias='latest', batch_size=32, bbox_interval=-1, bucket='', cache_images=False, cfg='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/models/hub/yolov5s6_kpts.yaml', data='/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/data.yaml', device='', entity=None, epochs=3, evolve=False, exist_ok=False, global_rank=-1, hyp='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/data/hyp.scratch.yaml', image_weights=False, img_size=[640, 640], kpt_label=True, label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='edgeai-yolov5', noautoanchor=False, nosave=False, notest=False, project='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg', quad=False, rect=False, resume=False, save_dir='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg/edgeai-yolov528', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=32, upload_dataset=False, weights='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/weights/person_detector_yolov5s6_960_71p6_93p1/last.pt', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, kpt=0.1, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0
wandb: Install Weights & Biases for YOLOv5 logging with 'pip install wandb' (recommended)

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    156928  models.common.C3                        [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1    885504  models.common.Conv                      [256, 384, 3, 2]              
  8                -1  1    665856  models.common.C3                        [384, 384, 1]                 
  9                -1  1   1770496  models.common.Conv                      [384, 512, 3, 2]              
[3, 5, 7]
 10                -1  1    656896  models.common.SPP                       [512, 512, [3, 5, 7]]         
 11                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 12                -1  1    197376  models.common.Conv                      [512, 384, 1, 1]              
 13                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 14           [-1, 8]  1         0  models.common.Concat                    [1]                           
 15                -1  1    813312  models.common.C3                        [768, 384, 1, False]          
 16                -1  1     98816  models.common.Conv                      [384, 256, 1, 1]              
 17                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 18           [-1, 6]  1         0  models.common.Concat                    [1]                           
 19                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 20                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 21                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 22           [-1, 4]  1         0  models.common.Concat                    [1]                           
 23                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 24                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 25          [-1, 20]  1         0  models.common.Concat                    [1]                           
 26                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 27                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 28          [-1, 16]  1         0  models.common.Concat                    [1]                           
 29                -1  1    715008  models.common.C3                        [512, 384, 1, False]          
 30                -1  1   1327872  models.common.Conv                      [384, 384, 3, 2]              
 31          [-1, 12]  1         0  models.common.Concat                    [1]                           
 32                -1  1   1313792  models.common.C3                        [768, 512, 1, False]          
 33  [23, 26, 29, 32]  1   2681996  models.yolo.Detect                      [1, [[19, 27, 44, 40, 38, 94], [96, 68, 86, 152, 180, 137], [140, 301, 303, 264, 238, 542], [436, 615, 739, 380, 925, 792]], 9, [128, 256, 384, 512]]
/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Model Summary: 557 layers, 15022412 parameters, 15022412 gradients

Transferred 470/744 items from /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/weights/person_detector_yolov5s6_960_71p6_93p1/last.pt
Scaled weight_decay = 0.0005
Optimizer groups: 129 .bias, 129 conv.weight, 121 other
train: Scanning '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train' images and labels... 102 found, 0 missing, 0 empty, 0 corrupted: 100% 102/102 [00:00<00:00, 342.49it/s]
train: New cache created: /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train.cache
val: Scanning '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/valid.cache' images and labels... 29 found, 0 missing, 0 empty, 0 corrupted: 100% 1/1 [00:00<?, ?it/s]
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Plotting labels... 

autoanchor: Analyzing anchors... anchors/target = 6.41, Best Possible Recall (BPR) = 1.0000
Image sizes 640 train, 640 test
Using 2 dataloader workers
Logging results to /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg/edgeai-yolov528
Starting training for 3 epochs...

     Epoch   gpu_mem       box       obj       cls       kpt      kptv     total    labels  img_size
  0% 0/4 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 550, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 306, in train
    loss, loss_items = compute_loss(pred, targets.to(device))  # loss scaled by batch_size
  File "/content/edgeai-yolov5/utils/loss.py", line 120, in __call__
    tcls, tbox, tkpt, indices, anchors = self.build_targets(p, targets)  # targets
  File "/content/edgeai-yolov5/utils/loss.py", line 207, in build_targets
    t = targets * gain
RuntimeError: The size of tensor a (25) must match the size of tensor b (41) at non-singleton dimension 2
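The two sizes in this message fit the pattern 7 + 2 * keypoints: 25 for 9 keypoints and 41 for 17. This suggests, as an assumption not confirmed in the thread, that the gain tensor in build_targets is still sized for 17 keypoints while the targets now carry 9. The sketch below only reproduces the arithmetic. The split of the 7 fixed columns (image index, class, 4 box values, anchor index) follows the usual YOLOv5 target layout and is assumed here, and the function name is hypothetical.

```python
def target_width(num_keypoints: int) -> int:
    """Assumed columns per target row: image index, class, 4 box values,
    (x, y) per keypoint, plus an anchor index."""
    fixed = 1 + 1 + 4 + 1
    return fixed + 2 * num_keypoints


print(target_width(9))   # 25, "tensor a" in the error
print(target_width(17))  # 41, "tensor b" in the error
```

If that reading is right, any hard-coded 41 (or 17) in utils/loss.py would need the same change that was made in utils/datasets.py.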

tendra · Answer 3 · Wed Feb 08 2023 16:45:38 GMT+0800 (China Standard Time)

If I do not set OMP_NUM_THREADS:

I get the same error: torch.cuda.OutOfMemoryError: CUDA out of memory.

If I try to reduce the memory load by lowering --batch-size from 64 to 32, 15 or 1:

I get this error:

RuntimeError: The size of tensor a (25) must match the size of tensor b (41) at non-singleton dimension 2

bharal · Answer 4 · Thu Mar 02 2023 21:17:43 GMT+0800 (China Standard Time)

Hi! I would like to know which function in utils/datasets.py you changed to solve the error (labels require 56 columns each). Thanks!

tendra · Answer 5 · Fri Mar 03 2023 01:53:46 GMT+0800 (China Standard Time)

Everything about the 17 key points has been changed in utils/datasets.py.

lidayu-01 · Answer 6 · Mon Dec 25 2023 15:58:29 GMT+0800 (China Standard Time)

Hello, I would like to know how you resolved this bug:
labels, shapes, self.segments = zip(*cache.values())
ValueError: not enough values to unpack (expected 3, got 0)
Why are all the training set images corrupted? Thank you very much.
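This ValueError can be reproduced without the repository. If every image is rejected during the label scan (102 corrupted out of 102 in the first log of this thread), the cache plausibly ends up with no per-image entries, and unpacking zip(*...) over an empty collection fails with exactly this message. The sketch assumes the cache is a dict mapping image paths to (labels, shape, segments) triples. That structure is inferred from the traceback line and is not confirmed.

```python
def unpack_cache(cache: dict):
    """Mimic `labels, shapes, segments = zip(*cache.values())` from the traceback."""
    labels, shapes, segments = zip(*cache.values())
    return labels, shapes, segments


# One valid entry unpacks into three tuples.
ok = unpack_cache({"img0.jpg": (["label"], (640, 640), [])})
print(ok)

# No valid entries: zip() yields nothing, so the 3-way unpack fails.
try:
    unpack_cache({})
except ValueError as err:
    print(err)  # not enough values to unpack (expected 3, got 0)
```

Seen this way, the traceback is a consequence of the column-count warnings rather than a separate bug. In the later logs, where the scan reports 0 corrupted, training gets past this line.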