[Bug]: Training of anomalib model on custom dataset is taking too long!
UTKARSH-VISCON opened this issue
Describe the bug
I am trying to train an anomalib model on my custom dataset, but it's taking too long (even after 3 days there were no results).
I am using the same code as provided in the anomalib docs:
from anomalib.data import Folder
from anomalib.models import Patchcore
from anomalib.engine import Engine
# Create the datamodule
datamodule = Folder(
    name="hazelnut_toy",
    root="datasets/hazelnut_toy",
    normal_dir="good",
    abnormal_dir="crack",
    task="classification",
)

# Setup the datamodule
datamodule.setup()

# Create the model and engine
model = Patchcore()
engine = Engine(task="classification")

# Train a Patchcore model on the given datamodule
engine.train(datamodule=datamodule, model=model)
Output screen (it's just stuck at this):
┌───┬───────────────────────┬──────────────────────────┬────────┬───────┐
│   │ Name                  │ Type                     │ Params │ Mode  │
├───┼───────────────────────┼──────────────────────────┼────────┼───────┤
│ 0 │ model                 │ PatchcoreModel           │ 643 K  │ train │
│ 1 │ _transform            │ Compose                  │ 0      │ train │
│ 2 │ normalization_metrics │ MetricCollection         │ 0      │ train │
│ 3 │ image_threshold       │ F1AdaptiveThreshold      │ 0      │ train │
│ 4 │ pixel_threshold       │ F1AdaptiveThreshold      │ 0      │ train │
│ 5 │ image_metrics         │ AnomalibMetricCollection │ 0      │ train │
│ 6 │ pixel_metrics         │ AnomalibMetricCollection │ 0      │ train │
└───┴───────────────────────┴──────────────────────────┴────────┴───────┘
Trainable params: 643 K
Non-trainable params: 0
Total params: 643 K
Total estimated model params size (MB): 2
Modules in train mode: 15
Modules in eval mode: 46
Dataset
Custom Dataset
Model
PatchCore
Steps to reproduce the behavior
- Installed Anomalib
- Used the anomalib repo from GitHub
- Ran the training code on the custom dataset.
OS information
- OS: [Windows 11]
- Python version: [3.10.0]
- Anomalib version: [1.1.0]
- PyTorch version: [2.2.2]
- CUDA/cuDNN version: [11.8]
- GPU models and configuration: [NVIDIA GeForce RTX 3050 Ti]
- Any other relevant information: [I'm using a custom dataset]
Expected behavior
The model should finish training in a reasonable amount of time.
Screenshots
No response
Pip/GitHub
pip
What version/branch did you use?
No response
Configuration YAML
# Import the datamodule
from anomalib.data import Folder
# Create the datamodule
datamodule = Folder(
    name="hazelnut_toy",
    root="datasets/hazelnut_toy",
    normal_dir="good",
    abnormal_dir="crack",
    task="classification",
)
# Setup the datamodule
datamodule.setup()
Logs
┌───┬───────────────────────┬──────────────────────────┬────────┬───────┐
│   │ Name                  │ Type                     │ Params │ Mode  │
├───┼───────────────────────┼──────────────────────────┼────────┼───────┤
│ 0 │ model                 │ PatchcoreModel           │ 643 K  │ train │
│ 1 │ _transform            │ Compose                  │ 0      │ train │
│ 2 │ normalization_metrics │ MetricCollection         │ 0      │ train │
│ 3 │ image_threshold       │ F1AdaptiveThreshold      │ 0      │ train │
│ 4 │ pixel_threshold       │ F1AdaptiveThreshold      │ 0      │ train │
│ 5 │ image_metrics         │ AnomalibMetricCollection │ 0      │ train │
│ 6 │ pixel_metrics         │ AnomalibMetricCollection │ 0      │ train │
└───┴───────────────────────┴──────────────────────────┴────────┴───────┘
Trainable params: 643 K
Non-trainable params: 0
Total params: 643 K
Total estimated model params size (MB): 2
Modules in train mode: 15
Modules in eval mode: 46
Code of Conduct
- I agree to follow this project's Code of Conduct
Hello, how big is your dataset, and what resolution are the images? Both of these factors affect training time.
I have a total of 90 images in my dataset (900x900 resolution)
Can you check whether it works with 256x256? Maybe there is a different problem, especially since the output screen is stuck.
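For reference, a minimal sketch of how that could look, assuming the Folder datamodule in your installed anomalib version accepts an image_size argument (check the Folder signature if this raises a TypeError):

from anomalib.data import Folder

# Same datamodule as in the report, but with the inputs downscaled from
# 900x900 to 256x256 before they reach the model.
# NOTE: image_size is assumed to be supported by the installed version.
datamodule = Folder(
    name="hazelnut_toy",
    root="datasets/hazelnut_toy",
    normal_dir="good",
    abnormal_dir="crack",
    task="classification",
    image_size=(256, 256),
)
datamodule.setup()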
@UTKARSH-VISCON, I don't think it is an Anomalib problem. Patchcore is computationally expensive and requires a lot of memory, especially during coreset sampling. As @abc-125 suggested, you could try reducing the image size to see if it helps.
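If a smaller image size alone is not enough, lowering Patchcore's coreset sampling ratio also shrinks the memory bank and speeds up the sampling step. A sketch, assuming the coreset_sampling_ratio parameter on the Patchcore constructor (it defaults to 0.1 in anomalib 1.x; 0.01 below is only an illustrative value, not a tuned one):

from anomalib.models import Patchcore
from anomalib.engine import Engine

# Keep only 1% of the extracted patch embeddings in the memory bank.
# A smaller ratio trades some accuracy for lower memory use and a
# faster greedy coreset-sampling step.
model = Patchcore(coreset_sampling_ratio=0.01)
engine = Engine(task="classification")
engine.train(datamodule=datamodule, model=model)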