[Bug]: Training of anomalib model on custom dataset is taking too long!
UTKARSH-VISCON opened this issue
Describe the bug
I am trying to train an anomalib model on my custom dataset, but it's taking too long (even after 3 days there were no results).
I am using the same code as provided in the anomalib docs:
from anomalib.data import Folder
from anomalib.models import Patchcore
from anomalib.engine import Engine
# Create the datamodule
datamodule = Folder(
    name="hazelnut_toy",
    root="datasets/hazelnut_toy",
    normal_dir="good",
    abnormal_dir="crack",
    task="classification",
)

# Setup the datamodule
datamodule.setup()

# Create the model and engine
model = Patchcore()
engine = Engine(task="classification")

# Train a Patchcore model on the given datamodule
engine.train(datamodule=datamodule, model=model)
Output screen (it's just stuck at this):
┌───┬───────────────────────┬──────────────────────────┬────────┬───────┐
│   │ Name                  │ Type                     │ Params │ Mode  │
├───┼───────────────────────┼──────────────────────────┼────────┼───────┤
│ 0 │ model                 │ PatchcoreModel           │ 643 K  │ train │
│ 1 │ _transform            │ Compose                  │ 0      │ train │
│ 2 │ normalization_metrics │ MetricCollection         │ 0      │ train │
│ 3 │ image_threshold       │ F1AdaptiveThreshold      │ 0      │ train │
│ 4 │ pixel_threshold       │ F1AdaptiveThreshold      │ 0      │ train │
│ 5 │ image_metrics         │ AnomalibMetricCollection │ 0      │ train │
│ 6 │ pixel_metrics         │ AnomalibMetricCollection │ 0      │ train │
└───┴───────────────────────┴──────────────────────────┴────────┴───────┘
Trainable params: 643 K
Non-trainable params: 0
Total params: 643 K
Total estimated model params size (MB): 2
Modules in train mode: 15
Modules in eval mode: 46
Dataset
Custom Dataset
Model
PatchCore
Steps to reproduce the behavior
- Installed Anomalib
- Used the anomalib repo from GitHub
- Ran the training code on the custom dataset.
OS information
- OS: [Windows 11]
- Python version: [3.10.0]
- Anomalib version: [1.1.0]
- PyTorch version: [2.2.2]
- CUDA/cuDNN version: [11.8]
- GPU models and configuration: [NVIDIA GeForce RTX 3050 Ti]
- Any other relevant information: [I'm using a custom dataset]
Expected behavior
The model should finish training in a reasonable amount of time.
Screenshots
No response
Pip/GitHub
pip
What version/branch did you use?
No response
Configuration YAML
# Import the datamodule
from anomalib.data import Folder
# Create the datamodule
datamodule = Folder(
    name="hazelnut_toy",
    root="datasets/hazelnut_toy",
    normal_dir="good",
    abnormal_dir="crack",
    task="classification",
)
# Setup the datamodule
datamodule.setup()
Logs
┌───┬───────────────────────┬──────────────────────────┬────────┬───────┐
│   │ Name                  │ Type                     │ Params │ Mode  │
├───┼───────────────────────┼──────────────────────────┼────────┼───────┤
│ 0 │ model                 │ PatchcoreModel           │ 643 K  │ train │
│ 1 │ _transform            │ Compose                  │ 0      │ train │
│ 2 │ normalization_metrics │ MetricCollection         │ 0      │ train │
│ 3 │ image_threshold       │ F1AdaptiveThreshold      │ 0      │ train │
│ 4 │ pixel_threshold       │ F1AdaptiveThreshold      │ 0      │ train │
│ 5 │ image_metrics         │ AnomalibMetricCollection │ 0      │ train │
│ 6 │ pixel_metrics         │ AnomalibMetricCollection │ 0      │ train │
└───┴───────────────────────┴──────────────────────────┴────────┴───────┘
Trainable params: 643 K
Non-trainable params: 0
Total params: 643 K
Total estimated model params size (MB): 2
Modules in train mode: 15
Modules in eval mode: 46
Code of Conduct
- I agree to follow this project's Code of Conduct
Hello, how big is your dataset, and what resolution are the images? Both of these factors affect training time.
I have a total of 90 images in my dataset (900x900 resolution)
Can you check whether it works with 256x256? Maybe there is a different problem, especially since the output screen is stuck.
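For reference, a minimal sketch of how that could look, assuming the Folder datamodule in your installed anomalib version accepts an image_size argument (check the Folder signature if this raises a TypeError):

from anomalib.data import Folder

# Same datamodule as in the report, but with the inputs downscaled from
# 900x900 to 256x256 before they reach the model.
# NOTE: image_size is assumed to be supported by the installed version.
datamodule = Folder(
    name="hazelnut_toy",
    root="datasets/hazelnut_toy",
    normal_dir="good",
    abnormal_dir="crack",
    task="classification",
    image_size=(256, 256),
)
datamodule.setup()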
@UTKARSH-VISCON, I don't think it is an Anomalib problem. Patchcore is computationally expensive and requires a lot of memory, especially during coreset sampling. As @abc-125 suggested, you could try reducing the image size to see if it helps.
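If a smaller image size alone is not enough, lowering Patchcore's coreset sampling ratio also shrinks the memory bank and speeds up the sampling step. A sketch, assuming the coreset_sampling_ratio parameter on the Patchcore constructor (it defaults to 0.1 in anomalib 1.x; 0.01 below is only an illustrative value, not a tuned one):

from anomalib.models import Patchcore
from anomalib.engine import Engine

# Keep only 1% of the extracted patch embeddings in the memory bank.
# A smaller ratio trades some accuracy for lower memory use and a
# faster greedy coreset-sampling step.
model = Patchcore(coreset_sampling_ratio=0.01)
engine = Engine(task="classification")
engine.train(datamodule=datamodule, model=model)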