openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.

Home Page: https://anomalib.readthedocs.io/en/latest/

[Bug]: Training of anomalib model on custom dataset is taking too long!

UTKARSH-VISCON opened this issue

Describe the bug

I am trying to train an Anomalib model on my custom dataset, but it's taking too long to train (even after 3 days there were no results).

I am using the same code as provided in the anomalib docs:

from anomalib.data import Folder
from anomalib.models import Patchcore
from anomalib.engine import Engine

# Create the datamodule
datamodule = Folder(
    name="hazelnut_toy",
    root="datasets/hazelnut_toy",
    normal_dir="good",
    abnormal_dir="crack",
    task="classification",
)

# Setup the datamodule
datamodule.setup()

# Create the model and engine
model = Patchcore()
engine = Engine(task="classification")

# Train a Patchcore model on the given datamodule
engine.train(datamodule=datamodule, model=model)

Output screen (it's just stuck at this):

┌───┬───────────────────────┬──────────────────────────┬────────┬───────┐
│   │ Name                  │ Type                     │ Params │ Mode  │
├───┼───────────────────────┼──────────────────────────┼────────┼───────┤
│ 0 │ model                 │ PatchcoreModel           │  643 K │ train │
│ 1 │ _transform            │ Compose                  │      0 │ train │
│ 2 │ normalization_metrics │ MetricCollection         │      0 │ train │
│ 3 │ image_threshold       │ F1AdaptiveThreshold      │      0 │ train │
│ 4 │ pixel_threshold       │ F1AdaptiveThreshold      │      0 │ train │
│ 5 │ image_metrics         │ AnomalibMetricCollection │      0 │ train │
│ 6 │ pixel_metrics         │ AnomalibMetricCollection │      0 │ train │
└───┴───────────────────────┴──────────────────────────┴────────┴───────┘
Trainable params: 643 K
Non-trainable params: 0
Total params: 643 K
Total estimated model params size (MB): 2
Modules in train mode: 15
Modules in eval mode: 46

Dataset

Custom Dataset

Model

PatchCore

Steps to reproduce the behavior

  1. Install Anomalib.
  2. Use the Anomalib repo from GitHub.
  3. Run the training code on a custom dataset.

OS information

OS information:

  • OS: [Windows 11]
  • Python version: [3.10.0]
  • Anomalib version: [1.1.0]
  • PyTorch version: [2.2.2]
  • CUDA/cuDNN version: [11.8]
  • GPU models and configuration: [NVIDIA GeForce RTX 3050 Ti]
  • Any other relevant information: [I'm using a custom dataset]

Expected behavior

The model should finish training.

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

No response

Configuration YAML

# Import the datamodule
from anomalib.data import Folder

# Create the datamodule
datamodule = Folder(
    name="hazelnut_toy",
    root="datasets/hazelnut_toy",
    normal_dir="good",
    abnormal_dir="crack",
    task="classification",
)

# Setup the datamodule
datamodule.setup()

Logs

┌───┬───────────────────────┬──────────────────────────┬────────┬───────┐
│   │ Name                  │ Type                     │ Params │ Mode  │
├───┼───────────────────────┼──────────────────────────┼────────┼───────┤
│ 0 │ model                 │ PatchcoreModel           │  643 K │ train │
│ 1 │ _transform            │ Compose                  │      0 │ train │
│ 2 │ normalization_metrics │ MetricCollection         │      0 │ train │
│ 3 │ image_threshold       │ F1AdaptiveThreshold      │      0 │ train │
│ 4 │ pixel_threshold       │ F1AdaptiveThreshold      │      0 │ train │
│ 5 │ image_metrics         │ AnomalibMetricCollection │      0 │ train │
│ 6 │ pixel_metrics         │ AnomalibMetricCollection │      0 │ train │
└───┴───────────────────────┴──────────────────────────┴────────┴───────┘
Trainable params: 643 K
Non-trainable params: 0
Total params: 643 K
Total estimated model params size (MB): 2
Modules in train mode: 15
Modules in eval mode: 46

Hello, how big is your dataset, and what resolution are the images? Both factors affect training time.

I have a total of 90 images in my dataset (900x900 resolution)

Can you check whether it works with 256x256 images? There may be a different problem, especially since the output screen is stuck.

@UTKARSH-VISCON, I don't think this is an Anomalib problem. Patchcore is computationally expensive and needs a lot of memory, especially during coreset sampling. As @abc-125 suggested, you could try reducing the image size to see if it helps.
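
To give a rough feel for why image size matters so much here, below is a back-of-the-envelope sketch. It is not Anomalib code, and the helper name embedding_bank_bytes is made up for illustration. It assumes Patchcore keeps one embedding per feature-map location before coreset sampling, a feature map at 1/8 of the input resolution, 1536 float32 values per embedding, and that all 90 images are used for training. The real numbers depend on the backbone, the layers, and any resizing applied by the model's transforms, so treat the result as an upper-bound estimate only.

```python
import math


def embedding_bank_bytes(num_images, image_size, stride=8, feature_dim=1536, bytes_per_value=4):
    """Estimate the patch-embedding count and its memory footprint.

    All defaults are assumptions for illustration, not values read from Anomalib:
    stride=8 (feature map at 1/8 resolution), feature_dim=1536, float32 storage.
    """
    side = math.ceil(image_size / stride)   # feature-map side length
    patches = num_images * side * side      # one embedding per location
    return patches, patches * feature_dim * bytes_per_value


for size in (900, 256):
    patches, nbytes = embedding_bank_bytes(num_images=90, image_size=size)
    print(f"{size}x{size}: {patches:,} embeddings, about {nbytes / 2**30:.2f} GiB before coreset sampling")
```

Under these assumptions, the number of embeddings grows with the square of the image side, so going from 900x900 to 256x256 cuts the data that coreset sampling has to process by more than a factor of ten.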