kreshuklab / spoco

PyTorch implementation of SPOCO

Training on a custom dataset

JaWeyl opened this issue · comments

Hi all,

I'd like to run your approach on a custom dataset that contains images (1024 x 1024) from agricultural fields captured by a UAV. Our task is to detect all plant instances in the field, which might be difficult due to overlapping instances.

We implemented a custom dataset parser and trained multiple models based on your approach. However, in tensorboard the results do not look very promising so far, since instances are not well separated.
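
For reference, here is roughly what our dataset parser produces (a simplified sketch with our own file layout and names, not code from the spoco repository; the real class additionally wires in the paired augmentations that, as far as we understand, the training pipeline needs for the consistency term):

import glob
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class UAVPlantDataset(Dataset):
    # Hypothetical sketch: loads 1024x1024 UAV RGB tiles together with
    # integer instance masks (0 = background / unlabeled).
    def __init__(self, root, phase='train'):
        self.image_paths = sorted(glob.glob(os.path.join(root, phase, 'images', '*.png')))
        self.label_paths = sorted(glob.glob(os.path.join(root, phase, 'masks', '*.png')))
        assert len(self.image_paths) == len(self.label_paths)

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # RGB image -> float tensor in [0, 1], CHW layout
        img = np.asarray(Image.open(self.image_paths[idx]).convert('RGB'), dtype=np.float32) / 255.0
        img = torch.from_numpy(img).permute(2, 0, 1)
        # instance mask: one integer id per pixel, 0 for background
        mask = torch.from_numpy(np.asarray(Image.open(self.label_paths[idx]), dtype=np.int64))
        return img, mask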

Currently I am trying to overfit on a set of 24 images (the same images for train, val, and test), where we provide all instances (but no background). I think this model should work first before we try to reduce the number of annotated instances.

We trained our first model based on the following setup:

python spoco_train.py \
    --spoco \
    --ds-name custom \
    --ds-path /export/data/SSIS/datasets/mydataset \
    --instance-ratio 0.4 \
    --batch-size 8  \
    --model-name UNet2D \
    --model-feature-maps 16 32 64 128 256 512 \
    --model-out-channels 8 \
    --learning-rate 0.001 \
    --weight-decay 0.00001 \
    --cos \
    --loss-delta-var 0.5 \
    --loss-delta-dist 2.0 \
    --loss-unlabeled-push 1.0 \
    --loss-instance-weight 1.0 \
    --loss-consistency-weight 1.0 \
    --kernel-threshold 0.5 \
    --checkpoint-dir /export/data/ckpts \
    --log-after-iters 500  --max-num-iterations 90000 

Please note that the --instance-ratio argument is ignored by our parser. Here are some results from tensorboard:

[tensorboard screenshot: 01-setup]
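
As an aside on the ignored --instance-ratio argument above: our understanding is that, in the sparse setting, it simply keeps a random fraction of the ground-truth instances and treats the rest as unlabeled. If we later add this to our parser, it would look roughly like the following (our own sketch, not spoco's implementation):

import numpy as np


def subsample_instances(mask, instance_ratio, rng=None):
    # Our own sketch of what --instance-ratio presumably does:
    # keep a random fraction of instance ids, relabel the rest as 0 (unlabeled).
    rng = rng if rng is not None else np.random.default_rng()
    ids = np.unique(mask)
    ids = ids[ids != 0]  # 0 is background / unlabeled
    n_keep = max(1, int(round(instance_ratio * len(ids))))
    keep = rng.choice(ids, size=n_keep, replace=False)
    return np.where(np.isin(mask, keep), mask, 0)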

Since the results did not look great so far, we trained another model with the following setup:

python spoco_train.py \
    --spoco \
    --ds-name custom \
    --ds-path /export/data/SSIS/datasets/mydataset \
    --instance-ratio 0.1 \
    --batch-size 6 \
    --model-name UNet2D \
    --model-feature-maps 16 32 64 128 256 512 \
    --learning-rate 0.0002 \
    --weight-decay 0.00001 \
    --cos \
    --loss-delta-var 0.5 \
    --loss-delta-dist 2.0 \
    --loss-unlabeled-push 1.0 \
    --loss-instance-weight 1.0 \
    --loss-consistency-weight 1.0 \
    --kernel-threshold 0.5 \
    --checkpoint-dir /export/data/ckpts \
    --log-after-iters 256 --max-num-iterations 80000

Again, here are some results from tensorboard:
[tensorboard screenshot: 02-setup]

Unfortunately, in the RGB visualization the instances are not well detected.

Thus, we gave it a try with a slightly different setup, setting the kernel threshold to 0.9, as follows:

python spoco_train.py \
    --spoco \
    --ds-name custom \
    --ds-path /export/data/SSIS/datasets/mydataset \
    --instance-ratio 0.1 \
    --batch-size 4 \
    --model-name UNet2D \
    --model-feature-maps 16 32 64 128 256 512 \
    --learning-rate 0.0002 \
    --weight-decay 0.00001 \
    --cos \
    --loss-delta-var 0.5 \
    --loss-delta-dist 2.0 \
    --loss-unlabeled-push 1.0 \
    --loss-instance-weight 1.0 \
    --loss-consistency-weight 1.0 \
    --kernel-threshold 0.9 \
    --checkpoint-dir /export/data/ckpts \
    --log-after-iters 256 --max-num-iterations 80000

Here the results already look better, but there are still some artifacts in the background:
[tensorboard screenshot: 03-setup]
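
For what it's worth, our (possibly wrong) reading of --kernel-threshold, based on the paper, is that the soft instance masks come from a Gaussian kernel around the anchor embedding, with the bandwidth sigma chosen so that the kernel value equals the threshold at distance delta-var from the anchor. If that is correct, raising the threshold from 0.5 to 0.9 widens the kernel considerably, which would explain why it changes the results so much:

import math

def kernel_sigma(delta_var, threshold):
    # our assumption: bandwidth such that exp(-delta_var**2 / (2 * sigma**2)) == threshold
    return math.sqrt(-delta_var ** 2 / (2.0 * math.log(threshold)))

print(kernel_sigma(0.5, 0.5))  # ~0.42
print(kernel_sigma(0.5, 0.9))  # ~1.09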

Initially, we thought that something might be wrong in our dataset parser. Thus, we converted the CVPPP dataset into our custom format, passed it to our custom parser, and trained a model based on your approach. However, here the results look quite good. Consequently, the custom dataset parser should be fine.
[tensorboard screenshot: custom-cvpp]

Based on your experience with the model architecture, is there any hyperparameter you would suggest changing to improve the overall performance? I'd appreciate any comments on this.

Dear @JaWeyl,

thank you for your interest in our project. You said that you have 24 images with full ground truth, right? Then, first of all, I would suggest training in a fully supervised fashion. This can be achieved by removing the --spoco flag (and also setting the unlabeled-push and consistency weights to 0). Also, please use a lower learning rate for this type of image (e.g. 0.0002):

python spoco_train.py \
    --ds-name custom \
    --ds-path /export/data/SSIS/datasets/mydataset \
    --batch-size 8  \
    --model-name UNet2D \
    --model-feature-maps 16 32 64 128 256 512 \
    --model-out-channels 8 \
    --learning-rate 0.0002 \
    --weight-decay 0.00001 \
    --cos \
    --loss-delta-var 0.5 \
    --loss-delta-dist 2.0 \
    --loss-unlabeled-push 0.0 \
    --loss-instance-weight 1.0 \
    --loss-consistency-weight 0.0 \
    --kernel-threshold 0.5 \
    --checkpoint-dir /export/data/ckpts \
    --log-after-iters 500  --max-num-iterations 90000 

Let's see if the network is able to learn in this configuration first.

As for the sparse training later on, I'd suggest lowering the consistency weight significantly, to 0.1 or even 0.01. The consistency side loss is a really strong regularizer, which might cause the network to underfit (this seems to be happening in your case: the background collapses nicely, but the instances are smeared out instead of being separated in the embedding space). So for the sparse setting, I'd recommend something like:

python spoco_train.py \
    --spoco \
    --ds-name custom \
    --ds-path /export/data/SSIS/datasets/mydataset \
    --batch-size 8  \
    --model-name UNet2D \
    --model-feature-maps 16 32 64 128 256 512 \
    --model-out-channels 8 \
    --learning-rate 0.0002 \
    --weight-decay 0.00001 \
    --cos \
    --loss-delta-var 0.5 \
    --loss-delta-dist 2.0 \
    --loss-unlabeled-push 1.0 \
    --loss-instance-weight 1.0 \
    --loss-consistency-weight 0.01 \
    --kernel-threshold 0.5 \
    --checkpoint-dir /export/data/ckpts \
    --log-after-iters 500  --max-num-iterations 90000 

Good luck!