Any pretrained models for train set only?

Question

96lives opened this issue 5 years ago · comments

Do you have any models trained only on the training set (without the validation set)?
I cannot reproduce your results because I lack the hardware, but I want to test something on the validation set.
I would really appreciate it if you could release weights trained only on the train set.

Thanks,

Chris Choy · Answer 1 · Fri Nov 15 2019 05:05:32 GMT+0800 (China Standard Time)

Hi,

These are the weights I trained on the ScanNet.v2 official train split with batch_size 10 for 120k iterations. (This run did not use all the augmentations that I recently added to this repository, so if you train from scratch using the provided code, you could get a better number.)
https://node1.chrischoy.org/data/publications/minknet/MinkUNet34C-train-conv1-5.pth

Note that the conv1 kernel size is 5, which can be set using --conv1_kernel_size 5.

With no rotation average, it gets: Score: 89.274, mIOU 72.163, mAP 76.032, mAcc 80.167.

The training log is

Chris Choy · Answer 2 · Fri Nov 15 2019 05:28:57 GMT+0800 (China Standard Time)

Ah sorry, there is a problem with the weights. If you load them, you will see that the final iteration they were trained to is 27k, not 120k.

The training died at 27k. I resumed it with a shell script that had a bug, and the weights were saved to the default path, which I then overwrote with the trainval training.

The validation mIoU at 27k is 68, which matches the mIoU in the log, but let me resume the training with the fixed resume script and update the weights. This will take some time.

Dongsu Zhang · Answer 3 · Wed Nov 20 2019 21:20:54 GMT+0800 (China Standard Time)

I tested your train-set-only pretrained model, and I don't think it works as expected.
I tested it with the code from MinkowskiNet/indoor.py (just replacing weights.pth with this pretrained model), and most of the scenes are predicted as wall.

Could you check whether this is truly the right model?

Chris Choy · Answer 4 · Wed Nov 20 2019 21:43:31 GMT+0800 (China Standard Time)

Hmm, it looks pretty normal to me with indoor.py.

I am pretty sure you did not load the weights correctly.

Download the indoor.py that works: indoor.py

python indoor.py --weights ./MinkUNet34C-train-conv1-5.pth --conv1_kernel_size 5
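For readers wondering how the two flags in the command above might be consumed, here is a minimal sketch using Python's argparse. The parser is hypothetical: the real indoor.py is not shown in this thread, and the defaults used here (weights.pth, kernel size 3) are assumptions chosen only for illustration.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the indoor.py command-line interface.
    parser = argparse.ArgumentParser(description="Illustrative indoor.py-style CLI")
    # Path to the checkpoint file; the default name is an assumption.
    parser.add_argument("--weights", type=str, default="weights.pth")
    # Kernel size of the first convolution; the default of 3 is an assumption.
    parser.add_argument("--conv1_kernel_size", type=int, default=3)
    return parser


# Parse the same arguments as the command line quoted in the thread.
args = build_parser().parse_args(
    ["--weights", "./MinkUNet34C-train-conv1-5.pth", "--conv1_kernel_size", "5"]
)
print(args.weights, args.conv1_kernel_size)
```

If the flag were omitted in a script like this, the model would be built with the default first-layer kernel size, which would likely not match a checkpoint trained with kernel size 5.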

Update on training: it passed 60k.

Dongsu Zhang · Answer 5 · Wed Nov 20 2019 22:31:07 GMT+0800 (China Standard Time)

Thanks, I previously just used models from MinkowskiNet, and that must have been the problem.

Chris Choy · Answer 6 · Tue Dec 31 2019 15:05:32 GMT+0800 (China Standard Time)

After 120k iterations on training set only, without Hue-Saturation data augmentation, I get

Score: 89.145
mIOU 72.219
mAP 75.612
mAcc 80.402

without any rotation average. The weights are available at https://node1.chrischoy.org/data/publications/minknet/MinkUNet34C-train-conv1-5.pth same as before. I overwrote the weights.

Loss 0.125 (AVG: 0.633) Score 96.592 (AVG: 89.145)  mIOU 72.219 mAP 75.612 mAcc 80.402
IOU: 83.279 94.812 66.017 80.952 91.148 81.674 76.254 61.412 59.067 80.741 29.566 63.364 64.250 75.883 62.040 69.149 92.074 66.763 85.942 59.994
mAP: 79.035 95.880 64.081 75.730 90.594 84.860 73.914 69.861 69.303 73.107 53.554 55.471 70.109 75.624 69.667 85.196 94.587 84.841 81.085 65.735
mAcc: 94.768 98.107 77.437 84.524 95.473 92.140 85.702 73.113 70.879 90.644 36.039 77.042 79.362 81.735 67.309 75.389 94.378 77.455 92.376 64.167
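As a quick sanity check on the log above, the summary numbers appear to be plain unweighted means over the 20 per-class values. The sketch below re-derives them from the three per-class lines. The parsing helper is illustrative and not part of the repository.

```python
# Per-class lines copied from the log above.
log = """\
IOU: 83.279 94.812 66.017 80.952 91.148 81.674 76.254 61.412 59.067 80.741 29.566 63.364 64.250 75.883 62.040 69.149 92.074 66.763 85.942 59.994
mAP: 79.035 95.880 64.081 75.730 90.594 84.860 73.914 69.861 69.303 73.107 53.554 55.471 70.109 75.624 69.667 85.196 94.587 84.841 81.085 65.735
mAcc: 94.768 98.107 77.437 84.524 95.473 92.140 85.702 73.113 70.879 90.644 36.039 77.042 79.362 81.735 67.309 75.389 94.378 77.455 92.376 64.167
"""


def per_class_means(text):
    """Return {metric name: unweighted mean of its per-class values}."""
    means = {}
    for line in text.splitlines():
        name, _, values = line.partition(":")
        nums = [float(v) for v in values.split()]
        means[name.strip()] = sum(nums) / len(nums)
    return means


means = per_class_means(log)
for name, value in means.items():
    print(f"{name}: {value:.3f}")
```

The three means agree with the reported mIOU 72.219, mAP 75.612, and mAcc 80.402 to three decimals.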

Dongsu Zhang · Answer 7 · Tue Dec 31 2019 16:09:51 GMT+0800 (China Standard Time)

Thank you!!

Wu Xiaodong · Answer 8 · Sun Nov 28 2021 10:22:11 GMT+0800 (China Standard Time)

@chrischoy
Hi, do you know why the validation loss increases after around 30k iterations of training? And why does it keep increasing while the validation mIoU also increases steadily?
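This question is not answered in the thread. One common explanation, offered here only as a possibility, is that cross-entropy penalizes confident mistakes heavily, while argmax-based metrics such as accuracy or mIoU only count whether the top class is right. The toy sketch below uses made-up probabilities for a binary problem (not data from this model) to show both quantities rising together.

```python
import math


def mean_cross_entropy(p_true):
    """Mean negative log-likelihood, given the probability assigned to the true class."""
    return sum(-math.log(p) for p in p_true) / len(p_true)


def accuracy(p_true):
    """Binary case: a sample is correct when the true class gets more than 0.5."""
    return sum(p > 0.5 for p in p_true) / len(p_true)


# Made-up probabilities assigned to the true class for four samples.
early = [0.60, 0.60, 0.45, 0.45]  # cautious: two right, two narrowly wrong
late = [0.99, 0.99, 0.99, 0.01]   # confident: three right, one very wrong

for name, probs in [("early", early), ("late", late)]:
    print(name, round(mean_cross_entropy(probs), 3), accuracy(probs))
```

In this toy case the loss rises from about 0.65 to about 1.16 while accuracy rises from 0.50 to 0.75, because a single confidently wrong sample dominates the average loss.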

suyunzzz · Answer 9 · Sun Mar 20 2022 15:31:41 GMT+0800 (China Standard Time)

I used MinkowskiEngine v0.5.4 to train on ScanNet. Here are some logs; I don't know what the cause is.

 25%|██▍       | 75/301 [01:06<04:36,  1.23s/it]xiaokeai1-Z10PE-D8-WS 03/20 15:24:28 [train.py 155] ===> Epoch[4](1280/301): Loss 0.7165      LR: 9.904e-02   Score nan       Data time: 0.1023, Total iter time: 0.7513
xiaokeai1-Z10PE-D8-WS 03/20 15:24:28 [x2num.py 14] NaN or Inf found in input tensor.
 38%|███▊      | 115/301 [01:39<02:25,  1.28it/s]xiaokeai1-Z10PE-D8-WS 03/20 15:25:01 [train.py 155] ===> Epoch[4](1320/301): Loss 0.6117     LR: 9.901e-02   Score nan       Data time: 0.0923, Total iter time: 0.7172
xiaokeai1-Z10PE-D8-WS 03/20 15:25:01 [x2num.py 14] NaN or Inf found in input tensor.
 51%|█████▏    | 155/301 [02:13<02:19,  1.04it/s]xiaokeai1-Z10PE-D8-WS 03/20 15:25:34 [train.py 155] ===> Epoch[4](1360/301): Loss 0.6686     LR: 9.898e-02   Score nan       Data time: 0.0714, Total iter time: 0.6861
xiaokeai1-Z10PE-D8-WS 03/20 15:25:34 [x2num.py 14] NaN or Inf found in input tensor.
 65%|██████▍   | 195/301 [02:45<01:40,  1.05it/s]xiaokeai1-Z10PE-D8-WS 03/20 15:26:07 [train.py 155] ===> Epoch[4](1400/301): Loss 0.6486     LR: 9.895e-02   Score nan       Data time: 0.0870, Total iter time: 0.7185
xiaokeai1-Z10PE-D8-WS 03/20 15:26:07 [x2num.py 14] NaN or Inf found in input tensor.
 78%|███████▊  | 235/301 [03:20<00:57,  1.14it/s]xiaokeai1-Z10PE-D8-WS 03/20 15:26:42 [train.py 155] ===> Epoch[4](1440/301): Loss 0.5743     LR: 9.892e-02   Score nan       Data time: 0.0684, Total iter time: 0.6631
xiaokeai1-Z10PE-D8-WS 03/20 15:26:42 [x2num.py 14] NaN or Inf found in input tensor.
 91%|█████████▏| 275/301 [03:55<00:24,  1.08it/s]xiaokeai1-Z10PE-D8-WS 03/20 15:27:17 [train.py 155] ===> Epoch[4](1480/301): Loss 0.8295     LR: 9.889e-02   Score nan       Data time: 0.0765, Total iter time: 0.7041
xiaokeai1-Z10PE-D8-WS 03/20 15:27:17 [x2num.py 14] NaN or Inf found in input tensor.
 98%|█████████▊| 295/301 [04:13<00:04,  1.21it/s]xiaokeai1-Z10PE-D8-WS 03/20 15:27:35 [loss_builder.py 21] ===> using CrossEntropyLoss
xiaokeai1-Z10PE-D8-WS 03/20 15:27:35 [test.py 71] ===> Start testing
 98%|█████████▊| 296/301 [04:14<00:04,  1.14it/s]xiaokeai1-Z10PE-D8-WS 03/20 15:29:36 [test.py 50] 101/156: Data time: 0.0033, Iter time: 0.2327      Loss 0.767 (AVG: 0.509) Score 84.272 (AVG: nan)       mIOU 13.888 mAP 21.319 mAcc 19.236
IOU: 57.648 93.750 13.339 0.381 52.412 0.015 48.899 0.026 1.813 5.766 0.000 0.000 3.188 0.018 0.000 0.000 0.000 0.000 0.000 0.500
mAP: 41.651 62.139 24.649 10.320 39.675 16.823 50.905 19.029 17.082 40.510 0.794 16.221 25.673 27.179 2.356 3.322 11.255 6.191 4.431 6.178
mAcc: 96.176 99.484 30.268 0.398 78.839 0.015 66.431 0.026 2.155 5.841 0.000 0.000 4.542 0.021 0.000 0.000 0.000 0.000 0.000 0.519
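In the evaluation line above, the current score is finite (84.272) while the running average is nan. That is the pattern a running mean shows once a single NaN has been added to it. The sketch below illustrates this with a hypothetical AverageMeter class, which is an assumption and not the repository's implementation. It does not identify where the NaN comes from.

```python
import math


class AverageMeter:
    """Hypothetical running-mean tracker, for illustration only."""

    def __init__(self, skip_nonfinite=False):
        self.skip_nonfinite = skip_nonfinite
        self.total = 0.0
        self.count = 0

    def update(self, value):
        # Optionally ignore NaN/Inf so that one bad batch does not poison the mean.
        if self.skip_nonfinite and not math.isfinite(value):
            return
        self.total += value
        self.count += 1

    @property
    def avg(self):
        return self.total / self.count if self.count else float("nan")


plain = AverageMeter()
guarded = AverageMeter(skip_nonfinite=True)
# Made-up score sequence with one NaN in the middle.
for score in [80.0, float("nan"), 84.272]:
    plain.update(score)
    guarded.update(score)

print(plain.avg, guarded.avg)
```

Skipping non-finite values keeps the average readable but hides the underlying problem, so logging which batch produced the first NaN is usually the more useful step.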