ZheC / Realtime_Multi-Person_Pose_Estimation

Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)


Output heatmap is zeros

Xonxt opened this issue · comments

Hi, I'm trying to train a structure similar to OpenPose, but with only two additional stages, no PAF branch, a depth image (from RealSense) as input, and only 3 keypoints (I only need both hands and the face for my application, so the output is a (batch, h/8, w/8, 3) stack). The training is done with Keras. I've got only around ~5000 training images, but with some extensive augmentation. The ground truth is appropriately scaled heatmaps with Gaussian peaks in place of keypoints.
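For reference, ground-truth heatmaps like the ones described here can be rendered with a simple Gaussian-peak routine. This is a minimal numpy sketch of the idea; the function name and the sigma value are my own illustration, not taken from the original code:

```python
import numpy as np

def make_heatmaps(keypoints, out_h, out_w, sigma=2.0):
    """Render one Gaussian peak per keypoint onto an (out_h, out_w, K) stack.

    keypoints: list of (x, y) in output-map coordinates; None for missing points.
    """
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    maps = np.zeros((out_h, out_w, len(keypoints)), dtype=np.float32)
    for k, pt in enumerate(keypoints):
        if pt is None:
            continue  # keypoint not annotated: leave this channel all zeros
        x, y = pt
        # Gaussian peak centred on the keypoint, value 1.0 at the centre
        maps[:, :, k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps
```

Making `sigma` larger (as suggested later in this thread) widens each peak, which gives the network a denser training signal than a near-delta target.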

I notice that the output heatmap is basically always just a matrix of zeros immediately after the first 2-3 iterations. Nothing on it. Up until now I've only had enough patience to let it run for about ~1500 iterations (more than a day) on two GPUs, and the loss basically always stays around the same value, with the output being just zeros.

Do you think I might be doing something fundamentally wrong, or do I just have to have enough patience to wait for 300'000+ iterations just like in your original implementation?

By the way, when you talk about iterations, do you mean epochs or epochs*steps_per_epoch?

Hi, I am facing a similar issue with almost the same setting as yours. Did you solve the problem after that?

Hi,
yeah, I switched to the SGD optimizer (with an initial learning rate of 2e-5 and a ReduceLROnPlateau callback), made the Gaussian peaks for the keypoints a bit larger, added many more training samples, and generally let the training run longer. I was able to see acceptable results after about a thousand epochs.

My settings:

# imports:
from keras import backend as K
from keras.optimizers import SGD
from keras.callbacks import ReduceLROnPlateau

# optimizer:
sgd = SGD(lr=2e-05, decay=0.0, momentum=0.9, nesterov=False)

# loss function (half of the summed squared error):
def _heat_loss(y_true, y_pred):
    return K.sum(K.square(y_true - y_pred)) / 2

# compiling the model:
model.compile(optimizer=sgd, loss=_heat_loss)

# callback: multiply the learning rate by `gamma` when the loss plateaus
reduce_learning_rate = ReduceLROnPlateau(monitor='loss', factor=gamma, patience=50, verbose=1)

Hope that helps.
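For what it's worth, the loss in those settings is just half of the summed squared error over the whole output stack. A quick numpy sanity check (my own illustration, not from the repo):

```python
import numpy as np

def heat_loss(y_true, y_pred):
    # numpy equivalent of the Keras loss above:
    # half of the sum of squared differences over all elements.
    return np.sum(np.square(y_true - y_pred)) / 2

gt = np.zeros((1, 4, 4, 3))    # all-zero ground-truth heatmaps
pred = np.ones((1, 4, 4, 3))   # predictions off by 1 everywhere
# 1*4*4*3 = 48 elements, each squared difference is 1 -> loss = 48 / 2 = 24
print(heat_loss(gt, pred))  # 24.0
```

Note that because the loss is summed rather than averaged, its magnitude scales with the batch and heatmap size, which is worth keeping in mind when picking the learning rate.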

Hi @Xonxt,

Thank you so much! You saved my day.

I switched from Adam to SGD and reduced the kernel size from 7 to 3 in all convolutional layers. Everything started to work.