chenhaoxing / DiffusionInst

This repo is the code of the paper "DiffusionInst: Diffusion Model for Instance Segmentation" (ICASSP'24).

Training the model on custom dataset

huaweiping opened this issue

Hi, I have a custom 512x512 dataset with 2 channels (the third channel is set to zero) and I want to train the model on it. It is a COCO-like dataset and has been validated by detectron2. Everything looks fine except the training result. This is the result after 45000 iterations:

[07/04 10:44:50 d2.engine.defaults]: Evaluation results for my_dataset_val in csv format:
[07/04 10:44:50 d2.evaluation.testing]: copypaste: Task: bbox
[07/04 10:44:50 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[07/04 10:44:50 d2.evaluation.testing]: copypaste: 10.3947,25.5441,8.5027,10.7150,8.7438,nan
[07/04 10:44:50 d2.evaluation.testing]: copypaste: Task: segm
[07/04 10:44:50 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[07/04 10:44:50 d2.evaluation.testing]: copypaste: 3.2318,7.6850,1.9359,2.9650,7.0297,nan
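
For context, the dataset is registered through detectron2's standard COCO loader; a minimal sketch, with hypothetical annotation and image paths:

from detectron2.data.datasets import register_coco_instances

# Hypothetical paths; the annotations are in COCO instance format.
register_coco_instances("my_dataset_train", {}, "annotations/train.json", "images/train")
register_coco_instances("my_dataset_val", {}, "annotations/val.json", "images/val")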

The dataset only contains 2 classes, so I modified num_classes in the diffinst.coco.res50.yaml file and used diffinst.coco.res50.inst.yaml as the instance segmentation configuration file. This is the diffinst.coco.res50.yaml file:

_BASE_: "Base-DiffusionInst.yaml"
MODEL:
  WEIGHTS: 
  RESNETS:
    DEPTH: 50
    STRIDE_IN_1X1: False
  DiffusionInst:
    NUM_PROPOSALS: 500
    NUM_CLASSES: 2
DATASETS:
  TRAIN: ("my_dataset_train",)
  TEST:  ("my_dataset_val",)
SOLVER:
  STEPS: (350000, 420000) #(87500, 105000) #(350000, 420000)
  MAX_ITER: 450000 #112500 #450000
INPUT:
  CROP:
    ENABLED: True
  FORMAT: "RGB"

The image crop size is modified to 512x512 in the base config file. I also abandoned the pre-trained weights, since this dataset is far from the general objects in either ImageNet or COCO. Nothing else is significantly changed in the code or configuration files.
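
As a quick sanity check that the registration matches the config, the class list and dataset size can be printed; a minimal sketch, assuming the dataset names above:

from detectron2.data import DatasetCatalog, MetadataCatalog

# The number of registered classes should match
# MODEL.DiffusionInst.NUM_CLASSES (2) in the config above.
print(MetadataCatalog.get("my_dataset_train").thing_classes)

# Quick size check on the training split.
print(len(DatasetCatalog.get("my_dataset_train")), "training images")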

I believe I did something wrong but have no idea what. Can anyone help me figure it out?

Hi, @huaweiping
I think there are three things you can try:

  1. Visualize your predictions; it is the first thing I do when the performance is unreasonable (see the sketch after this list).
  2. Tune the learning rate; lr×10 or lr/10 is a good first step for debugging.
  3. Maybe load the pretrained weights and give it a try.
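
For the first suggestion, here is a minimal visualization sketch using detectron2's Visualizer. It assumes a trained checkpoint at a hypothetical path and that the repo exposes an add_diffusioninst_config helper for its custom config keys (the name is assumed from the DiffusionDet layout this repo builds on):

import cv2
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.engine import DefaultPredictor
from detectron2.utils.visualizer import Visualizer
from diffusioninst import add_diffusioninst_config  # helper name assumed

cfg = get_cfg()
add_diffusioninst_config(cfg)  # register DiffusionInst-specific config keys
cfg.merge_from_file("configs/diffinst.coco.res50.yaml")
cfg.MODEL.WEIGHTS = "output/model_final.pth"  # hypothetical checkpoint path

predictor = DefaultPredictor(cfg)
img = cv2.imread("val_sample.png")  # hypothetical validation image (BGR)
outputs = predictor(img)

# Visualizer expects RGB; flip the channel order, draw, and save.
v = Visualizer(img[:, :, ::-1], MetadataCatalog.get("my_dataset_val"))
vis = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2.imwrite("pred_vis.png", vis.get_image()[:, :, ::-1])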

Hi @zhangxgu

Thanks so much for your reply! I just put together a comparison, and the prediction looks a bit weird to me.

Here's the groundtruth:
[groundtruth image]

And this is the prediction:
[prediction image]

The dataset is about ocean eddies. I checked the code, and it seems the percentage shown above each bounding box is the prediction score. Between the two figures, only one object at the bottom-left of the prediction is correctly labeled as what it is supposed to be in the groundtruth.

The rest of the predicted features at the bottom look fine as far as the bounding boxes go, but I don't find segmentation masks there. Some predicted features in the top region are missing, but this happens with other architectures too, so it is probably fine.

Do I misunderstand the meaning of the score, or should I tweak NUM_PROPOSALS in the configuration file?
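
For reference, the scores and masks can be inspected directly; a minimal sketch, assuming the outputs dict from a detectron2 predictor and a hypothetical threshold:

# Keep only predictions above a hypothetical confidence threshold.
inst = outputs["instances"].to("cpu")
inst = inst[inst.scores > 0.3]

print(len(inst), "instances kept")
print(inst.pred_masks.sum(dim=(1, 2)))  # per-instance mask areas; 0 means an empty mask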

I'll try tweaking the learning rate and see if that helps.

Thanks