xulabs / aitom

AI for tomography

Gum-Net Training not improving with demo data

kaysagit opened this issue · comments

Hi,

I was testing Gum-Net with the provided demo data set. After around 30 epochs and roughly 15 hours of training, I stopped the run because there was no improvement in the loss; please see the training logs below.

Before finetuning:
Rotation error: 1.7350925500744534 +/- 0.6650064011311111
Translation error: 8.442523177067761 +/- 3.44293514383784
----------
Training Iteration 0
4/4 [==============================] - 1784s 404s/step - loss: 0.8216
Training Iteration 1
4/4 [==============================] - 1781s 405s/step - loss: 0.8218
Training Iteration 2
4/4 [==============================] - 1774s 404s/step - loss: 0.8251
Training Iteration 3
4/4 [==============================] - 1788s 406s/step - loss: 0.8274
Training Iteration 4
4/4 [==============================] - 1783s 405s/step - loss: 0.8334
Training Iteration 5
4/4 [==============================] - 1782s 405s/step - loss: 0.8201
Training Iteration 6
4/4 [==============================] - 1777s 405s/step - loss: 0.8250
Training Iteration 7
4/4 [==============================] - 1797s 407s/step - loss: 0.8310
Training Iteration 8
4/4 [==============================] - 1787s 407s/step - loss: 0.8336
Training Iteration 9
4/4 [==============================] - 1784s 406s/step - loss: 0.8207
Training Iteration 10
4/4 [==============================] - 1787s 406s/step - loss: 0.8258
Training Iteration 11
4/4 [==============================] - 1779s 405s/step - loss: 0.8235
Training Iteration 12
4/4 [==============================] - 1784s 406s/step - loss: 0.8296
Training Iteration 13
4/4 [==============================] - 1773s 402s/step - loss: 0.8271
Training Iteration 14
4/4 [==============================] - 1773s 403s/step - loss: 0.8199
Training Iteration 15
4/4 [==============================] - 1785s 406s/step - loss: 0.8315
Training Iteration 16
4/4 [==============================] - 1789s 407s/step - loss: 0.8264
Training Iteration 17
4/4 [==============================] - 1777s 405s/step - loss: 0.8336
Training Iteration 18
4/4 [==============================] - 1774s 403s/step - loss: 0.8299
Training Iteration 19
4/4 [==============================] - 1790s 407s/step - loss: 0.8303
Training Iteration 20
4/4 [==============================] - 1784s 406s/step - loss: 0.8244
Training Iteration 21
4/4 [==============================] - 1786s 407s/step - loss: 0.8242
Training Iteration 22
4/4 [==============================] - 1789s 406s/step - loss: 0.8245
Training Iteration 23
4/4 [==============================] - 1782s 406s/step - loss: 0.8253
Training Iteration 24
4/4 [==============================] - 1789s 405s/step - loss: 0.8258
Training Iteration 25
4/4 [==============================] - 1784s 406s/step - loss: 0.8238
Training Iteration 26
4/4 [==============================] - 1782s 405s/step - loss: 0.8200
Training Iteration 27
4/4 [==============================] - 1779s 405s/step - loss: 0.8282
Training Iteration 28
4/4 [==============================] - 1780s 405s/step - loss: 0.8251
Training Iteration 29
2/4 [==============>...............] - ETA: 19:00 - loss: 0.8142

Do you have any suggestions or an explanation for why training on your demo dataset is not working? I did not change the source code.

Kind regards!

commented

I got a similar result. Could you tell me how to solve it? Thank you.

commented

This is my result:

Epoch 1/1
100/100 [==============================] - 1388s 14s/step - loss: 0.8265
Training Iteration 4
Epoch 1/1
100/100 [==============================] - 1387s 14s/step - loss: 0.8275
Training Iteration 5
Epoch 1/1
100/100 [==============================] - 1388s 14s/step - loss: 0.8268
Training Iteration 6
Epoch 1/1
100/100 [==============================] - 1388s 14s/step - loss: 0.8315
Training Iteration 7
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8324
Training Iteration 8
Epoch 1/1
100/100 [==============================] - 1390s 14s/step - loss: 0.8297
Training Iteration 9
Epoch 1/1
100/100 [==============================] - 1390s 14s/step - loss: 0.8312
Training Iteration 10
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8302
Training Iteration 11
Epoch 1/1
100/100 [==============================] - 1387s 14s/step - loss: 0.8302
Training Iteration 12
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8308
Training Iteration 13
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8249
Training Iteration 14
Epoch 1/1
100/100 [==============================] - 1390s 14s/step - loss: 0.8299
Training Iteration 15
Epoch 1/1
100/100 [==============================] - 1388s 14s/step - loss: 0.8318
Training Iteration 16
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8263
Training Iteration 17
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8288
Training Iteration 18
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8289
Training Iteration 19
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8290

I got a similar result. Could you tell me how to solve it? Thank you.

commented

What we observed is that on the low-SNR dataset (using the pre-trained model), the loss sometimes does not decrease, but if you output the transformation error before and after fine-tuning, the alignment is still improving. Hopefully this helps!
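
For anyone hitting the same thing, here is a minimal sketch of what printing the transformation error before and after fine-tuning could look like. The 6-parameter layout (three rotation angles followed by three translations normalized by the volume size), the `image_size` scaling, and the function name `transformation_error` are all assumptions for illustration, not the repository's actual evaluation code:

```python
import numpy as np

def transformation_error(y_true, y_pred, image_size=32):
    # y_true / y_pred: (N, 6) arrays of rigid-transform parameters,
    # assumed here to be 3 rotation angles followed by 3 translations
    # normalized to [-1, 1] (hypothetical layout for illustration).
    ang_true, ang_pred = y_true[:, :3], y_pred[:, :3]
    rot_err = np.abs(ang_true - ang_pred).sum(axis=1)

    # Rescale the normalized offsets to voxels, then take the
    # Euclidean distance between true and predicted translations.
    t_true = y_true[:, 3:] * image_size / 2.0
    t_pred = y_pred[:, 3:] * image_size / 2.0
    trans_err = np.linalg.norm(t_true - t_pred, axis=1)

    return (rot_err.mean(), rot_err.std()), (trans_err.mean(), trans_err.std())

# Call once before and once after fine-tuning with the same ground truth
# (gt_params and preds below are placeholders for your own arrays):
# (rm, rs), (tm, ts) = transformation_error(gt_params, preds)
# print('Rotation error:', rm, '+/-', rs, 'Translation error:', tm, '+/-', ts)
```

If the rotation and translation numbers drop between the two calls even while the loss hovers around 0.82, the fine-tuning is improving the alignment as described above.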