ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

Home Page: https://docs.ultralytics.com

Pruning/Sparsity Tutorial

glenn-jocher opened this issue · comments

📚 This guide explains how to apply pruning to YOLOv5 🚀 models. UPDATED 25 September 2022.

Before You Start

Clone repo and install requirements.txt in a Python>=3.7.0 environment, including PyTorch>=1.7. Models and datasets download automatically from the latest YOLOv5 release.

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Test Normally

Before pruning we want to establish baseline performance to compare against. This command tests YOLOv5x on COCO val2017 at image size 640 pixels. yolov5x.pt is the largest and most accurate model available. Other options are yolov5s.pt, yolov5m.pt and yolov5l.pt, or your own checkpoint from training a custom dataset, i.e. ./weights/best.pt. For details on all available models please see our README table.

$ python val.py --weights yolov5x.pt --data coco.yaml --img 640 --half

Output:

val: data=/content/yolov5/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
YOLOv5 🚀 v6.0-224-g4c40933 torch 1.10.0+cu111 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Fusing layers... 
Model Summary: 444 layers, 86705005 parameters, 0 gradients
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [01:12<00:00,  2.16it/s]
                 all       5000      36335      0.732      0.628      0.683      0.496
Speed: 0.1ms pre-process, 5.2ms inference, 1.7ms NMS per image at shape (32, 3, 640, 640)  # <--- base speed

Evaluating pycocotools mAP... saving runs/val/exp2/yolov5x_predictions.json...
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.507  # <--- base mAP
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.689
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.552
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.345
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.559
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.652
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.381
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.630
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.682
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.526
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.731
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.829
Results saved to runs/val/exp

Test YOLOv5x on COCO (0.30 sparsity)

We repeat the above test with a pruned model by using the torch_utils.prune() command. We update val.py to prune YOLOv5x to 0.3 sparsity:

[Screenshot: val.py updated to apply torch_utils.prune() at 0.3 sparsity after model loading]
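The screenshot is not reproduced here, but the change it shows amounts to roughly the following lines added to val.py right after the model is created (a sketch assuming the prune() helper in utils/torch_utils.py; the exact insertion point and variable names may differ between YOLOv5 versions):

# val.py (sketch) -- immediately after the model is loaded:
# model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
from utils.torch_utils import prune

prune(model, 0.3)  # zero out the smallest 30% of nn.Conv2d weights (unstructured L1 pruning)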

30% pruned output:

val: data=/content/yolov5/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
YOLOv5 🚀 v6.0-224-g4c40933 torch 1.10.0+cu111 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Fusing layers... 
Model Summary: 444 layers, 86705005 parameters, 0 gradients
Pruning model...  0.3 global sparsity
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [01:11<00:00,  2.19it/s]
                 all       5000      36335      0.724      0.614      0.671      0.478
Speed: 0.1ms pre-process, 5.2ms inference, 1.7ms NMS per image at shape (32, 3, 640, 640)  # <--- prune speed

Evaluating pycocotools mAP... saving runs/val/exp3/yolov5x_predictions.json...
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.489  # <--- prune mAP
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.677
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.537
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.334
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.542
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.635
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.370
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.612
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.664
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.496
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.722
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.803
Results saved to runs/val/exp3

In the results we can observe that we have achieved a sparsity of 30% in our model after pruning, which means that 30% of the model's weight parameters in nn.Conv2d layers are equal to 0. Inference time is essentially unchanged, while the model's AP and AR scores are slightly reduced.
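The sparsity figure printed above can also be verified manually; here is a minimal sketch (not the exact code used by YOLOv5's sparsity() helper) that measures the fraction of exactly-zero weights across all nn.Conv2d layers of a loaded model:

import torch
import torch.nn as nn

def conv_sparsity(model):
    # Fraction of exactly-zero weights across all nn.Conv2d layers
    zeros, total = 0, 0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            zeros += int(torch.sum(m.weight == 0))
            total += m.weight.numel()
    return zeros / total

# After pruning to 0.3 sparsity, conv_sparsity(model) should return ~0.30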

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher why doesn't the speed change at all after pruning? Does it only remove the conv weights without actually changing the structure? How can we save the pruned model and its architecture for retraining?

Is there a guideline on how much we should prune by? What are the benefits to doing this?

@jinfagang yes, structure is not changed at all, and parameter count is the same, it's just that some of the weights are 0 instead of near zero as they were before.

I suppose this would allow for effective kmeans quantization to lower bits (for smaller filesizes), but I'm not sure about any possible speed improvement. I think as long as the parameter count remains the same, the speed will remain the same.

@NanoCode012 no guidelines really, it's just an experiment to see how many of the weights you can remove and what effect that has on performance. Honestly I don't really see any great applications at the moment based on my results above, but it's there in case anyone would like to explore it further.

@glenn-jocher Looks like prune has a remove() method which can remove weights:

prune.remove(module, 'weight')

and all weights and params are saved in module.state_dict, which can be used for the new pruned model.

@jinfagang yes, this .remove() method deletes the original weights, as a pruned copy also exists in the model. So before applying remove the model/module will have 2X the normal parameters; after using it, the module is back to its normal parameter count.

You have to consider the shapes of the operations in the forward pass. For a convolution from say shape(1,128,20,20) to shape(1,256,20,20) you must have a weight matrix of shape 128x256. It's not possible to remove elements from a normal matrix or tensor, as it will always need 128*256 weights inside it.

There are special cases of sparse matrices in some packages/languages, it may be possible pytorch is converting the original tensor to a sparse tensor with the same shape, though I'm not sure if this is the case. Even if it were, any exported models (i.e. onnx, coreml, tensorrt) using these sparse matrices would need special support for them, or they would be handled as normal matrices.
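As a minimal standalone illustration (not YOLOv5-specific) of what l1_unstructured() and remove() actually do to a module:

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(128, 256, kernel_size=1)
n_before = sum(p.numel() for p in conv.parameters())

prune.l1_unstructured(conv, name='weight', amount=0.3)
# While the reparametrization is active, the module stores 'weight_orig' (parameter) and
# 'weight_mask' (buffer) and computes 'weight' on the fly, so its footprint is larger.
print([n for n, _ in conv.named_parameters()])  # ['bias', 'weight_orig']
print([n for n, _ in conv.named_buffers()])     # ['weight_mask']

prune.remove(conv, 'weight')  # make pruning permanent; keep only the masked weights
n_after = sum(p.numel() for p in conv.parameters())
print(n_before == n_after)  # True: same parameter count, ~30% of weights are now exactly 0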

The current pruning method already incorporates the line of code you mention:

def prune(model, amount=0.3):
    # Prune model to requested global sparsity
    import torch.nn.utils.prune as prune
    print('Pruning model... ', end='')
    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d):
            prune.l1_unstructured(m, name='weight', amount=amount)  # prune
            prune.remove(m, 'weight')  # make permanent
    print(' %.3g global sparsity' % sparsity(model))

@glenn-jocher Nice. Did you figure out how to obtain the pruned model architecture?

@jinfagang well that's what I was saying, the architecture does not change. In my example above, the 128x256 convolution weights are still 128x256 weights; it's just that some of their values that were previously near-zero have been set equal to zero during pruning. The 128x256 matrix may or may not then be stored as a sparse matrix, which is a special type of matrix intended for data that contains mostly zeros, and saves memory (and may or may not also save processing time).

TLDR the architecture is exactly the same when pruning, no layers are removed as far as I know, and the input and output shapes (and shapes of all intermediate layers) remain the same.

@glenn-jocher so the simplified model cannot get its new channel count and shape automatically; is there any way to make that happen?

@glenn-jocher First of all, thanks for your work! Let me ask: which paper or project is your pruning implementation based on?

@Lornatang I based this pruning implementation off of the original pytorch pruning tutorial at the link below, but the idea to apply pruning here originally came from @jinfagang. I don't actually have any experience pruning models.
https://pytorch.org/tutorials/intermediate/pruning_tutorial.html

@jinfagang I modified detect.py to prune and save, and print updated model info:

    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    torch_utils.model_info(model)
    torch.save({'model': model}, 'model_normal.pt')

    torch_utils.prune(model, 0.3)
    torch_utils.model_info(model)
    torch.save({'model': model}, 'model_pruned.pt')

Output:

Model Summary: 140 layers, 7.45958e+06 parameters, 7.45958e+06 gradients, 17.5 GFLOPS
Pruning model...  0.299 global sparsity
Model Summary: 140 layers, 7.45958e+06 parameters, 7.45958e+06 gradients, 17.5 GFLOPS

Model sizes are here (for both yolov5s in FP32):
[Screenshot: file sizes of model_normal.pt and model_pruned.pt]

So maybe layer pruning or channel-level sparsity would work better, since they change the architecture of the network?
I have seen a project like this:
https://github.com/tanluren/yolov3-channel-and-layer-pruning

@HenryWang628 I see, thanks for the link. The tensorboard histograms are very nice. So it seems a more useful method would be: channel pruning (mAP drops), then fine-tune for x epochs to recover some of the lost mAP.

This all raises the question though, if you are going to go through all of this effort on a large model like YOLOv5x, why not just train a smaller model like YOLOv5s? The training time will be much faster, and you don't need the extra pruning and finetuning steps.

For anyone interested, there is a detailed discussion on this here pytorch/tutorials#1054 (comment)

The author there says this:

I'm not familiar with your architecture, so you'll have to decide which parameters it makes sense to pool together and compare via global magnitude-based pruning; but let's assume, just for the sake of this simple example, that you only want to consider the convolutional layers identified by the logic of my if-statement below [if those aren't the weights you care about, please feel free to modify that logic as you wish].

Now, those layers happen to come with two parameters: "weight" and "bias". Let's say you are interested in the weights [if you care about the biases too, feel free to add them in as well in the parameters_to_prune]. Alright, how do we tell global_unstructured to prune those weights in a global manner? We do so by constructing parameters_to_prune as requested by that function [again, see docs and tutorial linked above].

parameter_to_prune = [
    (v, "weight") 
    for k, v in dict(model.named_modules()).items()
    if ((len(list(v.children())) == 0) and (k.endswith('conv')))
]

# now you can use global_unstructured pruning
prune.global_unstructured(parameter_to_prune, pruning_method=prune.L1Unstructured, amount=0.3)

To check that that succeeded, you can now look at the global sparsity across those layers, which should be 30%, as well as the individual per-layer sparsity:

# global sparsity
nparams = 0
pruned = 0
for k, v in dict(model.named_modules()).items():
    if ((len(list(v.children())) == 0) and (k.endswith('conv'))):
        nparams += v.weight.nelement()
        pruned += torch.sum(v.weight == 0)
print('Global sparsity across the pruned layers: {:.2f}%'.format( 100. * pruned / float(nparams)))
# ^^ should be 30%

# local sparsity
for k, v in dict(model.named_modules()).items():
    if ((len(list(v.children())) == 0) and (k.endswith('conv'))):
        print(
            "Sparsity in {}: {:.2f}%".format(
                k,
                100. * float(torch.sum(v.weight == 0))
                / float(v.weight.nelement())
            )
        )
# ^^ will be different for each layer

Originally posted by @mickypaganini in pytorch/tutorials#1054 (comment)

More info from pytorch/tutorials#605 (comment)

Hi @cranmer,
Hopefully this tutorial will be included soon (cc: @soumith).

As is, this module is not intended (by itself) to help you with memory savings. All that pruning does is to replace some entries with zeroes. This itself doesn't buy you anything, unless you represent the sparse tensor in a smarter way (which this module itself doesn't handle for you). You can, however, rely on torch.sparse and other functionalities there to help you with that. To give you a concrete example:

import torch
import torch.nn.utils.prune as prune

t = torch.randn(100, 100)
torch.save(t, 'full.pth')

p = prune.L1Unstructured(amount=0.9)
pruned = p.prune(t)
torch.save(pruned, 'pruned.pth')

sparsified = pruned.to_sparse()
torch.save(sparsified, 'sparsified.pth')

When I ls, these are the sizes on disk:

21K sparsified.pth
40K pruned.pth
40K full.pth

By the way, before calling prune.remove, you can expect your memory footprint to be a lot higher than what you started out with, because for each pruned parameter you now have: the original parameter, the mask, and the pruned version of the tensor. Calling prune.remove brings you back to only having a single (now pruned) tensor per pruned parameter. Still, if you don't represent these pruned parameters smartly, the memory footprint at this point won't be any lower than what you started out with.

Originally posted by @mickypaganini in pytorch/tutorials#605 (comment)

@glenn-jocher I think you can refer to https://github.com/vainf/torch-pruning, where this functionality is implemented in detail.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Hi, thank you everyone for the informative comments. Thanks Glenn for this super-cool library. Not sure if there is a way to implement a line like "sparsified = pruned.to_sparse()" (pytorch/tutorials#605 (comment)) for nn.Conv2d?

I am trying to reduce the overall model weights. Eventually, I want to port this to a Jetson Nano. My understanding is that a smaller model yields --> faster speeds. Please correct me if my understanding is wrong. Thanks.

@shoebNTU any speed benefits would depend on the capability of your hardware and drivers to exploit sparse matrices, so there is no single answer to your question.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

I tried to add torch_utils.prune(model, 0.3) to test.py and ran the command.

It gives me this error:

NameError: name 'torch_utils' is not defined

@joel5638 torch_utils.py is a file in the utils directory. You can import it by running the code below. I've also updated the tutorial above to show the import.

from utils import torch_utils

I just proposed this change, which allows for structured (kernel) pruning and thus changes the network's architecture.
link

I also want to know why the time does not change. Can you explain the pruning in more depth?

@lzh1998-lzh this particular pruning does not remove any layers, it only sets some values to zero.

Hi, I'm using YOLOv3-tiny to train a custom model, but I need to reduce inference time. Can I use the model pruning function for a v3 model? Since it is in the v5 repo, I am not sure whether it is useful for my case.

@GulerEnes 👋 Hello! Thanks for asking about inference speed issues. Pruning is not recommended for speed improvements using this tutorial. YOLOv5 🚀 can be run on CPU (i.e. --device cpu, slow) or GPU if available (i.e. --device 0, faster). You can determine your inference device by viewing the YOLOv5 console output:

detect.py inference

python detect.py --weights yolov5s.pt --img 640 --conf 0.25 --source data/images/

YOLOv5 PyTorch Hub inference

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()
# Speed: 631.5ms pre-process, 19.2ms inference, 1.6ms NMS per image at shape (2, 3, 640, 640)

Increase Speeds

If you would like to increase your inference speed some options are:

  • Use batched inference with YOLOv5 PyTorch Hub
  • Reduce --img-size, i.e. 1280 -> 640 -> 320
  • Reduce model size, i.e. YOLOv5x -> YOLOv5l -> YOLOv5m -> YOLOv5s -> YOLOv5n
  • Use half precision FP16 inference with python detect.py --half and python val.py --half
  • Use a faster GPU, i.e. P100 -> V100 -> A100
  • Export to ONNX or OpenVINO for up to 3x CPU speedup (CPU Benchmarks); see the example export command below
  • Export to TensorRT for up to 5x GPU speedup
  • Use free GPU backends with up to 16GB of CUDA memory: Open In Colab Open In Kaggle
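For example, a typical export and inference command pair (flags may vary slightly between YOLOv5 versions):

python export.py --weights yolov5s.pt --include onnx  # export to ONNX
python detect.py --weights yolov5s.onnx --img 640  # run inference with the exported ONNX model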

Good luck 🍀 and let us know if you have any other questions!

There is little real application value unless further steps can be done on top of pruning that also reduce the model weights and speed up inference, which seems to be the job of sparsify() and hardware specifically designed for sparse acceleration. @shoebNTU @jinfagang @glenn-jocher I liked the comments about why not just train a smaller model directly. In some cases, though, we might want to simplify the workflow: download a slightly larger and better model and apply a simple method like prune() or sparsify(), or just pass an argument such as --prune 0.3 or --sparsify 0.3, so that we can run the model on edge devices directly. @glenn-jocher do you see some value in adding this kind of argument instead of modifying the val.py script ourselves? I understand that the architecture and weights won't change in memory, which is unstructured pruning. (Update Y2022M06D22Wed)

@bryanbo-cao yes good comments. There may be smarter ways to implement pruning in trained models as this tutorial is a bit out of date by now. If you have any better methods using prune() or sparsify() please let us know as we have not been focusing efforts on pruning/sparsity recently.

@glenn-jocher maybe pruning is a popular enough task to be worth providing a top-level script/interface to perform pruning and save the resulting model. Based on this question #8450

@AyushExel yes pruning needs some more attention. The above tutorial is based on pruning individual weights, but this results in a sparse model that's no faster than the original as small weights are simply zeroed out. What we need is an updated strategy that prunes entire channels to actually reduce channel count and speed up inference.

I think also torch has updated pruning methods since I created this tutorial 2 years ago. We should add pruning to our project TODOs, but create methods that are task agnostic, i.e. can be applied to classification and segmentation also.
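For reference, PyTorch already ships a channel-wise variant; below is a hedged sketch using prune.ln_structured(). Note that this only zeroes whole output channels, it does not shrink the tensors, so physically removing channels and fixing up downstream layers still needs a dependency-aware tool such as Torch-Pruning.

import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_channels(model, amount=0.2):
    # Zero the output channels (dim=0) with the smallest L2 norm in every Conv2d
    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d):
            prune.ln_structured(m, name='weight', amount=amount, n=2, dim=0)
            prune.remove(m, 'weight')  # make permanent
    return model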

@glenn-jocher Thank you for your valuable comments.
I just want to ask: if we do "structured" instead of "unstructured" pruning, can we speed up inference and reduce the model parameters by, for example, removing some channels from YOLOv5s?
Thank you.

@H-deep yes exactly! We need to implement structured pruning methods as the current implementation is unstructured pruning which does not allow us to improve inference speeds.

How do I save the pruned model? Thanks.

@jobsjiang As mentioned in the pruning tutorial, in the naive way of pruning (unstructured pruning) the pruned model's architecture won't change; the pruned weights are not actually removed from the model. So you save the whole model and those "pruned weights" still take up space in the file; the difference is that parameters which originally contributed little are set to zero in the saved file.

What you probably want is structured pruning, which also removes those pruned weights from the network.

Do you have the code for how to save the pruned model? Thanks.

There are dependencies between filters across layers, so these dependencies need to be checked before pruning, and they are specific to a neural network's architecture. Pruning a relatively complex model like YOLOv5 is non-trivial. There is one reference: https://github.com/VainF/Torch-Pruning.

@glenn-jocher
Hi,
Could I ask you a question: why is the pruning taught here not combined with sparse training?

I refer to the following projects:
https://github.com/midasklr/yolov5prune/tree/v5.0
https://github.com/midasklr/yolov5prune/tree/v6.0
https://github.com/tanluren/yolov3-channel-and-layer-pruning

As sparse training progresses, more and more BN gamma values approach 0, which can be seen in the TensorBoard BN histograms.

[Image: TensorBoard histogram of BN weights during sparse training]

After training, pruning can be performed. A basic principle is that the threshold cannot be greater than the maximum BN gamma of any channel; then prune according to the chosen percentage.
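For anyone following those repos, the "sparse training" step they describe is essentially an L1 penalty on the BN gammas applied as a subgradient after backprop; here is a hedged sketch (the penalty strength s, e.g. 1e-4, is a hyperparameter from those projects, not from this repo):

import torch
import torch.nn as nn

def update_bn_l1(model, s=1e-4):
    # Push BatchNorm gammas toward zero: d/dw |w| = sign(w), added to the existing gradient
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.data.add_(s * torch.sign(m.weight.data))

# usage inside a training loop (sketch):
#   loss.backward()
#   update_bn_l1(model, s=1e-4)
#   optimizer.step()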

@jayer95 our tutorial is in need of updating! I wrote it myself a while ago. If you'd like to propose updates/fixes that would be awesome to help everyone :)

@glenn-jocher Sure, I got it :)

Hello, is it possible to retrain a pruned model? We have trained YOLOv5 on our custom data, then pruned the model, and would like to retrain it on the same custom data. A naive attempt to perform normal training on the pruned model was not successful and the following error was caught:

model = Model(cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device)  # create
TypeError: 'DetectMultiBackend' object is not subscriptable

Hi,

Thanks a lot for the tutorial and the very insightful conversation. I have successfully managed to prune and save yolov5s. However, when I come to run val.py on the saved model I get the following error:

File "models/yolov5/val.py", line 420, in <module>
    main(opt)
  File "models/yolov5/val.py", line 391, in main
    run(**vars(opt))
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "models/yolov5/val.py", line 142, in run
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
  File "/home/NetZIP/models/yolov5/models/common.py", line 345, in __init__
    model = attempt_load(weights if isinstance(weights, list) else w, device=device, inplace=True, fuse=fuse)
  File "/home/NetZIP/models/yolov5/models/experimental.py", line 88, in attempt_load
    model.append(ckpt.fuse().eval() if fuse and hasattr(ckpt, 'fuse') else ckpt.eval())  # model in eval mode
TypeError: 'bool' object is not callable

Note: val.py works fine when I run it with the yolov5s.pt model, but throws the error above when running the pruned saved model. I used the code provided earlier in this conversation to save the model (https://docs.ultralytics.com/yolov5/tutorials/model_pruning_and_sparsity#issuecomment-655284445).

I think the issue might be in how the model gets saved rather than the pruning, because I also tried just simply saving the yolov5s.pt model without the pruning using the save code provided here https://docs.ultralytics.com/yolov5/tutorials/model_pruning_and_sparsity#issuecomment-655284445 and it resulted in the same error when running val.py on it.

I have been looking at this for a while and cannot seem to find what is causing this error or what the issue with the saving method is. The only thing I was able to spot is that the files inside yolov5s.pt/data/ and yolov5s_fp_32_pruned.pt/data/ have different numerals. See the attached screenshots below. Could this be the issue? If yes, any idea what is causing it and how to correct it?

Thanks

[Screenshots: contents of yolov5s.pt/data/ and yolov5s_fp_32_pruned.pt/data/]

I have the same problem. In YOLOv5, the .pt file is a checkpoint, not just the model part. My ugly solution is to create a new checkpoint, copy all entries except the model from the original checkpoint to the new one, and set the pruned model in the new checkpoint.

@relaxtheo hi,

The error may be caused by how the model is saved in the detect.py file. In YOLOv5, the .pt file is a checkpoint that contains the whole model, not just the model part. Therefore, when you save a pruned model, you're saving a checkpoint file that still contains the original unpruned parameters, which can cause issues with loading the pruned model.

One solution could be to create a new checkpoint file and manually copy all options except the model from the original checkpoint to the new checkpoint. Then, you can set the pruned model to the new checkpoint. This could help ensure that the pruned model is loaded correctly in val.py.
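A minimal sketch of that workaround (filenames are illustrative; YOLOv5 checkpoints store the module under the 'model' key, and the script should be run from the yolov5 repo root so the model classes can be unpickled):

import torch

ckpt = torch.load('yolov5s.pt', map_location='cpu')  # YOLOv5 .pt files are checkpoint dicts
model = ckpt['model']                                # the nn.Module lives under 'model'

# ... prune `model` here, e.g. with utils.torch_utils.prune(model, 0.3) ...

new_ckpt = {k: v for k, v in ckpt.items() if k != 'model'}  # copy epoch/optimizer/etc. unchanged
new_ckpt['model'] = model                                   # swap in the pruned model
torch.save(new_ckpt, 'yolov5s_pruned.pt')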

Alternatively, you could try using the latest version of YOLOv5, which may have some updates related to model pruning and loading. You can also check the saved model and make sure that it only contains the pruned weights and not the original unpruned weights.

I hope this helps! Let me know if you have any further questions.

Thank you very much for the reply.

I am currently using v6.2; compared to the latest code, the prune method has not changed, and there seems to be a small change in the attempt_load function.

But what confuses me is what I can gain from this pruning. The model file size does not change, the parameter count stays the same, and the inference speed does not change, so it seems I only get a worse model with lower performance and no gain.

@relaxtheo thank you for your response.

Model pruning can help reduce the computation required for inference by removing redundant and unnecessary parameters from the model. Although the file size and number of parameters may not change significantly, the inference speed can be improved if the pruning is performed correctly.

However, the effectiveness of pruning may depend on the specific model architecture and the amount of pruning applied. It's possible that in your case, the pruning method didn't achieve significant improvements in speed or performance.

If you're looking to improve the performance of your model, you may want to try other optimization techniques such as quantization or knowledge distillation. These methods can help reduce the size and computation required for inference, resulting in faster and more efficient models.

I hope this helps! If you have any further questions or concerns, please let me know.

Thank you very much!

@relaxtheo hi there,

Thanks for sharing your experience with model pruning in YOLOv5. While model pruning aims to reduce the computation required for inference by removing redundant and unnecessary parameters, the effectiveness of pruning may depend on various factors, including the specific model architecture and the amount of pruning applied. Therefore, it's possible that in your case, the pruning method you used didn't achieve significant improvements in speed or performance.

If you're looking to further optimize your model, you may want to consider other approaches such as quantization or knowledge distillation. These optimization techniques can help reduce the size and computation required for inference, resulting in faster and more efficient models.

Please let us know if you have any further questions or concerns. We're here to help!

@relaxtheo I think the current pruning method is specifically "unstructured pruning" (correct me if I am wrong), where filters with small weight magnitudes are set to 0s but are still stored in the model weight file (i.e. <model>.pth); those zero values are not actually removed, so they still take up space on disk. That's why the model file size does not change. During inference, unless the code has an explicit way to accelerate, such as skipping those zeros, it will still do the same amount of computation on the parameters with zero values. But the advantage is that I treat it as an efficient way to estimate how well model performance can be preserved and what the potential for acceleration is, so that I know whether to actually prune the model in the next step.

The thing you are looking for might be "structured pruning" (https://github.com/VainF/Torch-Pruning), which actually removes those zeros after pruning to save both space and time, but it is not easy to implement due to the dependencies among layers in various network architectures.

@bryanbocao hi there,

Thank you for reaching out. You are correct that the current pruning method in YOLOv5 uses unstructured pruning, where filters with small weight magnitude are set to 0s, while they are still stored in the weight file. As a result, the model file size may not change significantly, and inference speed may not be improved unless the code has an explicit way to accelerate like skipping those zeros.

Structured pruning, on the other hand, removes those zeros after pruning to save both space and time. However, implementing structured pruning may not be easy due to the dependencies among layers in various network architectures.

We appreciate your feedback on this issue, and we'll keep it in mind as we continue to improve YOLOv5. If you have any further questions or concerns, don't hesitate to let us know.

@bryanbocao @glenn-jocher Thank you all very much, I will try your recommendations

@relaxtheo Thank you for reaching out, and we're glad to hear that our recommendations were helpful. Don't hesitate to let us know if you have any further questions or concerns. We're here to help!

[Image: training error screenshot (link appears broken)]

It may be a bit unrelated, but I am getting a similar error while trying to do training. I am still new to the YOLO models. Can you please help me with solving it?

@Mary14-design it seems like there might be an issue with the image link you've provided; it's not displaying correctly. However, I'm here to help you with your training issue. Could you please provide more details about the error message you're encountering during training with YOLOv5? This will help me understand the problem better and assist you accordingly. If you can copy and paste the error message or describe the issue in more detail, that would be great.