Test-Time Augmentation (TTA) Tutorial
glenn-jocher opened this issue
📚 This guide explains how to use Test Time Augmentation (TTA) during testing and inference for improved mAP and Recall with YOLOv5 🚀. UPDATED 25 September 2022.
Before You Start
Clone repo and install requirements.txt in a Python>=3.7.0 environment, including PyTorch>=1.7. Models and datasets download automatically from the latest YOLOv5 release.
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
Test Normally
Before trying TTA we want to establish a baseline performance to compare to. This command tests YOLOv5x on COCO val2017 at image size 640 pixels. yolov5x.pt is the largest and most accurate model available. Other options are yolov5s.pt, yolov5m.pt and yolov5l.pt, or your own checkpoint from training a custom dataset, ./weights/best.pt. For details on all available models please see our README table.
$ python val.py --weights yolov5x.pt --data coco.yaml --img 640 --half
Output:
val: data=./data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True
YOLOv5 🚀 v5.0-267-g6a3ee7c torch 1.9.0+cu102 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)
Fusing layers...
Model Summary: 476 layers, 87730285 parameters, 0 gradients
val: Scanning '../datasets/coco/val2017' images and labels...4952 found, 48 missing, 0 empty, 0 corrupted: 100% 5000/5000 [00:01<00:00, 2846.03it/s]
val: New cache created: ../datasets/coco/val2017.cache
Class Images Labels P R mAP@.5 mAP@.5:.95: 100% 157/157 [02:30<00:00, 1.05it/s]
all 5000 36335 0.746 0.626 0.68 0.49
Speed: 0.1ms pre-process, 22.4ms inference, 1.4ms NMS per image at shape (32, 3, 640, 640) # <--- baseline speed
Evaluating pycocotools mAP... saving runs/val/exp/yolov5x_predictions.json...
...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.504 # <--- baseline mAP
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.688
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.546
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.351
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.551
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.644
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.382
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.681 # <--- baseline mAR
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.524
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.735
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.826
Test with TTA
Append --augment to any existing val.py command to enable TTA, and increase the image size by about 30% for improved results. Note that inference with TTA enabled will typically take about 2-3X the time of normal inference, as the images are left-right flipped and processed at 3 different resolutions, with the outputs merged before NMS. Part of the speed decrease is simply due to larger image sizes (832 vs 640), while part is due to the actual TTA operations.
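As a rough sketch of the sizing rule above, the helper below grows a base image size by a percentage and rounds the result up to a multiple of the model stride. The function name tta_image_size, the stride of 32 and the round-up behaviour are assumptions made for illustration and are not part of val.py; only the "about 30%" figure comes from this guide.

```python
def tta_image_size(base_size: int, increase_pct: int = 30, stride: int = 32) -> int:
    """Grow base_size by increase_pct percent, rounded up to a multiple of stride.

    stride=32 is an assumption for illustration, not a value taken from val.py.
    """
    scaled_times_100 = base_size * (100 + increase_pct)
    # Integer ceiling division avoids floating-point rounding surprises.
    steps = -(-scaled_times_100 // (100 * stride))
    return steps * stride


print(tta_image_size(640))  # 832
print(tta_image_size(416))  # 544
```

Under these assumptions a 640 pixel baseline maps to 832, the value passed to --img in the command that follows.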
$ python val.py --weights yolov5x.pt --data coco.yaml --img 832 --augment --half
Output:
val: data=./data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=832, conf_thres=0.001, iou_thres=0.6, task=val, device=, single_cls=False, augment=True, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True
YOLOv5 🚀 v5.0-267-g6a3ee7c torch 1.9.0+cu102 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)
Fusing layers...
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Model Summary: 476 layers, 87730285 parameters, 0 gradients
val: Scanning '../datasets/coco/val2017' images and labels...4952 found, 48 missing, 0 empty, 0 corrupted: 100% 5000/5000 [00:01<00:00, 2885.61it/s]
val: New cache created: ../datasets/coco/val2017.cache
Class Images Labels P R mAP@.5 mAP@.5:.95: 100% 157/157 [07:29<00:00, 2.86s/it]
all 5000 36335 0.718 0.656 0.695 0.503
Speed: 0.2ms pre-process, 80.6ms inference, 2.7ms NMS per image at shape (32, 3, 832, 832) # <--- TTA speed
Evaluating pycocotools mAP... saving runs/val/exp2/yolov5x_predictions.json...
...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.516 # <--- TTA mAP
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.701
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.562
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.361
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.564
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.656
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.388
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.640
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.696 # <--- TTA mAR
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.553
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.744
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.833
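To put the two runs side by side, the snippet below computes the gains and the slowdown from the numbers printed in the logs above. The split of the slowdown into an image-size part and a TTA part assumes that inference cost grows linearly with pixel count, which is only a rough approximation and is not claimed anywhere in this guide.

```python
# Values copied from the baseline and TTA logs in this guide.
baseline = {"mAP": 0.504, "mAR": 0.681, "inference_ms": 22.4, "img": 640}
tta = {"mAP": 0.516, "mAR": 0.696, "inference_ms": 80.6, "img": 832}

map_gain = tta["mAP"] - baseline["mAP"]
mar_gain = tta["mAR"] - baseline["mAR"]
slowdown = tta["inference_ms"] / baseline["inference_ms"]

# Assumption: cost scales with the number of pixels per image.
pixel_ratio = (tta["img"] / baseline["img"]) ** 2
tta_only = slowdown / pixel_ratio

print(f"mAP gain: {map_gain:+.3f}")                # +0.012
print(f"mAR gain: {mar_gain:+.3f}")                # +0.015
print(f"total slowdown: {slowdown:.1f}x")          # 3.6x
print(f"from larger images: {pixel_ratio:.2f}x")   # 1.69x
print(f"from TTA ops (approx.): {tta_only:.1f}x")  # 2.1x
```

On this run, inference time per image grows by about 3.6x in total. Under the pixel-count assumption, roughly 1.7x of that comes from the larger image size and about 2.1x from the TTA operations themselves.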
Inference with TTA
detect.py TTA inference operates identically to val.py TTA: simply append --augment to any existing detect.py command:
$ python detect.py --weights yolov5s.pt --img 832 --source data/images --augment
Output:
detect: weights=['yolov5s.pt'], source=data/images, imgsz=832, conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=True, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False
YOLOv5 🚀 v5.0-267-g6a3ee7c torch 1.9.0+cu102 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)
Downloading https://github.com/ultralytics/yolov5/releases/download/v5.0/yolov5s.pt to yolov5s.pt...
100% 14.1M/14.1M [00:00<00:00, 81.9MB/s]
Fusing layers...
Model Summary: 224 layers, 7266973 parameters, 0 gradients
image 1/2 /content/yolov5/data/images/bus.jpg: 832x640 4 persons, 1 bus, 1 fire hydrant, Done. (0.029s)
image 2/2 /content/yolov5/data/images/zidane.jpg: 480x832 3 persons, 3 ties, Done. (0.024s)
Results saved to runs/detect/exp
Done. (0.156s)
PyTorch Hub TTA
TTA is automatically integrated into all YOLOv5 PyTorch Hub models, and can be accessed by passing augment=True at inference time.
import torch
# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s') # or yolov5m, yolov5x, custom
# Images
img = 'https://ultralytics.com/images/zidane.jpg' # or file, PIL, OpenCV, numpy, multiple
# Inference
results = model(img, augment=True) # <--- TTA inference
# Results
results.print() # or .show(), .save(), .crop(), .pandas(), etc.
Customize
You can customize the TTA ops applied in the YOLOv5 forward_augment() method here:
Lines 125 to 137 in 8c6f9e1
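For orientation before editing that method, here is a minimal, framework-free sketch of the idea described in this guide: run the detector on several scaled and left-right flipped copies of the image, map each prediction back to the original image coordinates, and merge everything before NMS. It is not the code from forward_augment(). The scale and flip schedule, the box layout and the names descale_box, tta_predict and toy_detect are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# (x_center, y_center, width, height, confidence), in pixels
Box = Tuple[float, float, float, float, float]

# Assumed TTA schedule: three scales, with a left-right flip on the second pass.
SCALES = (1.0, 0.83, 0.67)
FLIPS = (False, True, False)


def descale_box(box: Box, scale: float, flipped: bool, img_width: float) -> Box:
    """Map a box from an augmented image back to original image coordinates."""
    x, y, w, h, conf = box
    x, y, w, h = x / scale, y / scale, w / scale, h / scale  # undo the resize
    if flipped:
        x = img_width - x  # undo the left-right flip
    return (x, y, w, h, conf)


def tta_predict(detect: Callable[[float, bool], List[Box]], img_width: float) -> List[Box]:
    """Run the detector once per augmentation and merge the outputs (before NMS)."""
    merged: List[Box] = []
    for scale, flipped in zip(SCALES, FLIPS):
        for box in detect(scale, flipped):
            merged.append(descale_box(box, scale, flipped, img_width))
    return merged


def toy_detect(scale: float, flipped: bool) -> List[Box]:
    """Stand-in detector: one object at (100, 200), 50x80 px, in a 640 px wide image."""
    x = 640 - 100 if flipped else 100
    return [(x * scale, 200 * scale, 50 * scale, 80 * scale, 0.9)]


merged = tta_predict(toy_detect, img_width=640)
for box in merged:
    print([round(v, 2) for v in box])
```

With the toy detector, all three passes map back to approximately the same box, which is what lets NMS treat them as overlapping candidates for one object. Changing SCALES or FLIPS in this sketch corresponds loosely to customizing the TTA ops.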
Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
- Notebooks with free GPU:
- Google Cloud Deep Learning VM. See GCP Quickstart Guide
- Amazon Deep Learning AMI. See AWS Quickstart Guide
- Docker Image. See Docker Quickstart Guide
Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on MacOS, Windows, and Ubuntu every 24 hours and on every commit.
I have a question about the value under column P. Is it mAP@.60? (The default IoU threshold value is .60 in test.py.)
The default IoU threshold value is the NMS threshold, not the mAP threshold.
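To illustrate where an NMS IoU threshold acts, here is a simplified greedy NMS in plain Python. It is a sketch for explanation only and is not the implementation used by YOLOv5; the box format (x1, y1, x2, y2) and the function names iou and nms are assumptions.

```python
from typing import List, Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Sequence[float]], scores: List[float], iou_thres: float = 0.6) -> List[int]:
    """Keep the highest-scoring boxes, dropping any box that overlaps a kept one by more than iou_thres."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, iou_thres=0.6))  # [0, 2]
print(nms(boxes, scores, iou_thres=0.9))  # [0, 1, 2]
```

The first two boxes overlap with an IoU of about 0.82, so the lower-scoring one is suppressed at a threshold of 0.6 and kept at 0.9. The threshold decides which duplicate detections survive; it does not select the IoU level at which P or mAP is reported.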
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi, what are those 3 different resolutions that TTA uses? Are they selected randomly? Many thanks
@gizemtanriver they are reduced multiples of the input resolution, they are not fixed values.
@glenn-jocher Thanks! Also, where exactly is this happening in the code? I couldn't find it.
Line 103 in df0e408
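As a hedged illustration of "reduced multiples of the input resolution", the sketch below derives three sizes from one input size. The scale factors (1.0, 0.83, 0.67), the stride of 32 and the round-up to a stride multiple are assumptions for illustration and are not quoted from this thread; the function name tta_sizes is hypothetical.

```python
import math
from typing import List, Sequence


def tta_sizes(img_size: int, scales: Sequence[float] = (1.0, 0.83, 0.67), stride: int = 32) -> List[int]:
    """Return one image size per scale, each rounded up to a multiple of stride."""
    return [math.ceil(img_size * s / stride) * stride for s in scales]


print(tta_sizes(832))  # [832, 704, 576]
print(tta_sizes(640))  # [640, 544, 448]
```

Because the sizes are computed from --img, they change with the input resolution rather than being fixed values or random choices.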
Hi everyone. I retrained YOLOv5 on my custom dataset (2 classes).
After that, I tested my model with input image size (640, 640). Running the code with my last.pt model, the output has shape torch.Size([1, 25200, 7]).
But when I run the same test with last.onnx, the output shape is different: (3, 80, 80, 7).
Is that a bug? Can you help me solve this problem?
Hi There,
I tried test.py with TTA and enlarged the image size to 800 (trained at 640). I checked the saved output, which does not look correct. The bounding boxes in the label images are all wrong, and the pred images show wrong predicted bounding boxes too. When setting img-size back to 640, everything is correct again.
Namespace(augment=True, batch_size=32, conf_thres=0.001, data='dataset/custom.yaml', device='', exist_ok=False, img_size=800, iou_thres=0.6, name='exp', project='runs/test', save_conf=False, save_json=False, save_txt=False, single_cls=False, task='val', verbose=False, weights=['runs/train/exp/weights/best.pt'])
Using torch 1.7.0+cu110 CUDA:0 (GeForce RTX 3090, 24265MB)
Fusing layers...
Model Summary: 316 layers, 21488835 parameters, 0 gradients
Scanning 'dataset/eval/labels/A.cache' for images and labels... 3184 found, 0 missing, 0 empty, 0 corrupted: 100%|█████████████████| 3184/3184 [00:00<?, ?it/s]
Class Images Targets P R mAP@.5 mAP@.5:.95: 100%|██████████████████████████████████████████████████| 106/106 [00:06<00:00, 18.02it/s]
all 3.39e+03 1.59e+04 0.965 0.998 0.995 0.993
Speed: 1.4/0.08/1.5 ms inference/NMS/total per 800x800 image at batch-size 32
In addition, the folders or images listed under the test option of the data file custom.yaml do not seem to be used (the console output shows eval being used instead). My dataset is separated into 3 folders: train, eval and test, e.g.
# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: [dataset/train/images/A,
dataset/train/images/B]
val: [dataset/eval/images/A,
dataset/eval/images/B]
test: [dataset/test/imagesC]
P.S. The best.pt is a yolov5m model.
@ihmc3jn09hk please raise a bug report using the bug report template if you believe there is a reproducible issue, thank you.
Thank you for your reply. I don't know if it's related to the parameters I modified, e.g. hyp.scratch.yaml. Will do some more tests.
Hi @HoangTienDuc, I think you are reading the wrong output of the forward pass. The line below may be helpful:
pred = net.forward(ln)[3]
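One way to relate the two shapes reported above is simple arithmetic. The sketch below assumes three detection scales with strides 8, 16 and 32 and 3 anchors per scale; neither value is stated in this thread, and the function name rows_per_scale is hypothetical.

```python
from typing import List, Sequence


def rows_per_scale(img_size: int = 640, strides: Sequence[int] = (8, 16, 32), anchors: int = 3) -> List[int]:
    """Number of predictions produced at each detection scale (assumed strides and anchor count)."""
    return [anchors * (img_size // s) ** 2 for s in strides]


rows = rows_per_scale(640)
print(rows)       # [19200, 4800, 1200]
print(sum(rows))  # 25200

num_classes = 2
print(4 + 1 + num_classes)  # 7 values per prediction: box (4), objectness (1), class scores (2)
```

Under these assumptions, (3, 80, 80, 7) looks like the raw output of a single scale (3 anchors on an 80x80 grid, or 19200 predictions), while [1, 25200, 7] is the concatenation of all three scales. That would be consistent with the suggestion above to read a different output of the forward pass.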
Hi Glenn,
Why 832? From what I've noticed, you can make the img-size during augmented inference even larger for more accurate results. (In my case, I used img-size = 1664 during inference. The model is yolov5x and was trained on images at img-size = 640.) Is that correct, or am I overlooking something here? Thanks!
@Shaotran yes you can pass any --img that you want. Results will vary by dataset, among other things. For large image inference you may want to use a P6 model, like yolov5l6.pt or yolov5x6.pt.
Hi, how to use TTA with torch.hub.load(...) ?
@marcusdiy TTA is automatically integrated into all YOLOv5 PyTorch Hub models, and can be accessed by passing augment=True at inference time. I will update the TTA tutorial to make a note of this use case.
results = model(imgs) # inference
results = model(imgs, augment=True) # TTA inference
@glenn-jocher I would like to ask how the value 832 in python val.py --weights yolov5x.pt --data coco.yaml --img 832 --augment --half was chosen?
@Zengyf-CVer empirical results showed this was the largest that would produce improvements.
Append --augment to any existing val.py command to enable TTA, and increase the image size by about 30% for improved results.
#10312 pls check my pr - I have fixed this problem
does it work with segmentation?