keras-team / keras

Deep Learning for humans

Home Page: http://keras.io/

The loss becomes negative

FiReTiTi opened this issue

Hi,
I just ran a CNN built with Keras on a big training set, and I got weird loss values at each epoch (see below):
66496/511502 [==>...........................] - ETA: 63s - loss: 8.2800
66528/511502 [==>...........................] - ETA: 63s - loss: -204433556137039776.0000

345664/511502 [===================>..........] - ETA: 23s - loss: 8.3174
345696/511502 [===================>..........] - ETA: 23s - loss: -39342531075525840.0000

214080/511502 [===========>..................] - ETA: 41s - loss: 8.3406
214112/511502 [===========>..................] - ETA: 41s - loss: -63520753730220536.0000

How is that possible? The loss suddenly becomes too big, and the value seems to go beyond what the double encoding can represent?
Is there a way to avoid it?

Regards,

Do you really think that's enough information for anyone to be able to answer your question?

The loss is just a scalar that you are trying to minimize. It's not necessarily supposed to be positive! For instance, a cosine proximity loss will usually be negative (it tries to make the proximity as high as possible by minimizing a negative scalar).
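
To illustrate (a minimal sketch, not from the original thread; the model and data are made up): with the built-in 'cosine_proximity' objective, Keras minimizes the negative cosine similarity, so a well-fitting model legitimately reports a loss close to -1.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Toy regression model trained with a loss that is negative by design.
model = Sequential()
model.add(Dense(4, input_dim=8))
model.compile(loss='cosine_proximity', optimizer='sgd')

X = np.random.rand(16, 8).astype('float32')
y = np.random.rand(16, 4).astype('float32')
# The returned loss is -cos(prediction, target), so it is negative
# whenever predictions and targets point in the same direction.
print(model.evaluate(X, y, verbose=0))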

Hi, thank you for your answers.
The last layer is a dense layer with a sigmoid, so the value should not be negative:
model.add(Dense(1))
model.add(Activation('sigmoid'))
What really surprises me is that from one batch to the next there is such a drop.
@the-moliver: which information would you need?

What is your training objective, binary_crossentropy or others?
And if you are using a GPU to train your network, the data type should be float32 (a Theano restriction).

Hi, thank you for your help.
Yes, the training objective is binary_crossentropy.
And yes, all my data are already float32, I made sure of that.

@FiReTiTi
I don't think binary_crossentropy can return negative values.
In Theano backend,

crossentropy(t,o) = -(t*log(o) + (1 - t)*log(1 - o)).

t and o are both in the range [0, 1], which makes the whole expression non-negative.
Your output log looks more like an overflow.
Maybe you can check your data? Or you can extract a portion of the data to test whether the model still gives a negative loss.
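
A quick numerical check of that formula (a NumPy sketch, not part of the original reply) confirms it stays non-negative as long as both the target and the prediction are in [0, 1]:

import numpy as np

def crossentropy(t, o, eps=1e-7):
    # Elementwise binary cross-entropy, as in the formula above.
    o = np.clip(o, eps, 1 - eps)
    return -(t * np.log(o) + (1 - t) * np.log(1 - o))

t = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
o = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
print(crossentropy(t, o))  # every entry is >= 0 for targets and outputs in [0, 1]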

66496/511502 [==>...........................] - ETA: 63s - loss: 8.2800
66528/511502 [==>...........................] - ETA: 63s - loss: -204433556137039776.0000

If the loss cannot be negative, does that mean it exceeds the encoding limits and then wraps around into negative values?

I am still working on the same data, and here is another weird thing:
Epoch 140/3000
5s - loss: 0.5968 - val_loss: 0.4191
Epoch 141/3000
5s - loss: 0.5974 - val_loss: 0.4556
Epoch 142/3000
5s - loss: 0.5979 - val_loss: 0.4382
Epoch 143/3000
5s - loss: 6.0467 - val_loss: 11.1324
Epoch 144/3000
5s - loss: 7.7176 - val_loss: 11.1324
Epoch 145/3000
5s - loss: 7.7176 - val_loss: 11.1324
Epoch 146/3000
5s - loss: 7.7176 - val_loss: 11.1324
And nothing changes during the next 2850 epochs; the values stay perfectly identical.

@FiReTiTi please give more information about your model if you want help on that.

In your last case, your optimiser is likely stuck in a local minimum. That could explain why it remains identical during all the following iterations.

Here is the model:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import SGD

model = Sequential()
model.add(Convolution2D(8, 7, 7, border_mode='valid', input_shape=(1, 31, 31), activation='tanh'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(16, 5, 5, border_mode='valid', activation='tanh'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(32, 3, 3, border_mode='valid', activation='relu'))
model.add(Convolution2D(65, 1, 1, border_mode='valid', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(23))
model.add(Activation('tanh'))
model.add(Dropout(0.1))
model.add(Dense(11))
model.add(Activation('sigmoid'))
model.add(Dropout(0.1))
model.add(Dense(1))
model.add(Activation('sigmoid'))
optimizer = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=optimizer)
model.fit(dataset, labels, batch_size=batch_size, nb_epoch=nb_epoch, shuffle=True, validation_split=0.1, verbose=2)

The dataset contains 70 000 images of size 31x31.
What I don't understand is why there is this sudden jump of the loss and then the loss is stuck.

@FiReTiTi May I ask how you solved the negative loss problem? I ran into the same problem; my loss is a custom loss built from a bunch of MSE terms, so it shouldn't be negative either. It looks like this:

Epoch 1/15
   32/33102 [..............................] - ETA: 15552s - loss: -88028794.750
   48/33102 [..............................] - ETA: 11176s - loss: -1419246161.8
   64/33102 [..............................] - ETA: 8987s - loss: -13590295485.3
   80/33102 [..............................] - ETA: 7674s - loss: -107586018455.
   96/33102 [..............................] - ETA: 6797s - loss: -661847078867.
  112/33102 [..............................] - ETA: 6172s - loss: -3960883097561
  128/33102 [..............................] - ETA: 5702s - loss: -3047999712303
  144/33102 [..............................] - ETA: 5337s - loss: -2318531227797
  320/33102 [..............................] - ETA: 3720s - loss: -1825597231654244712448.0000

I haven't, it still happens from time to time :-(
Using smaller NNs seems to reduce the phenomenon.

@FiReTiTi Thanks for your reply. I found my problem: I use a custom loss and accidentally passed y_pred and y_true in the wrong order to my loss function, so maybe it's not the same cause in your case.
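
For reference (a minimal sketch with a made-up loss, not the poster's actual one): Keras always calls a custom loss as loss(y_true, y_pred), in that order. Swapping the two is harmless for a symmetric loss such as plain MSE, but changes the result for anything asymmetric.

from keras import backend as K

def weighted_log_loss(y_true, y_pred):
    # Correct argument order: y_true first, y_pred second.
    # If the two were swapped inside the body, the value (and its sign)
    # could change, because this expression is not symmetric in its arguments.
    return -K.mean(y_true * K.log(K.clip(y_pred, K.epsilon(), 1.0)), axis=-1)

# Assumes `model` is an already-built Keras model, as in the snippets above.
model.compile(loss=weighted_log_loss, optimizer='sgd')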

@sunshineatnoon So you were using a non-symmetric loss function, like cross-entropy?

Here is my loss function, it's a bunch of squared values.

Cool for you!
It happened to me with a binary_crossentropy :-(

@FiReTiTi In that case, I think it's more likely an overflow. Do you use Theano? Maybe you can try NanGuardMode in Theano to see if it gives you any errors or warnings. I googled a lot last night and found that NaNs or Infs might cause this kind of error, such as this one.
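
For anyone unfamiliar with it (a minimal standalone sketch, not tied to the poster's model): NanGuardMode is a Theano compilation mode that raises an error as soon as a NaN, an Inf, or a very large value flows through the graph, instead of letting it silently corrupt the loss.

import numpy as np
import theano
import theano.tensor as T
from theano.compile.nanguardmode import NanGuardMode

x = T.matrix('x')
y = T.log(x)  # log(0) would silently produce -inf without the guard

f = theano.function(
    [x], y,
    mode=NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True))

f(np.zeros((2, 2), dtype='float32'))  # raises an error instead of returning -inf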

That's also my opinion. Thanks for the tips, I will test them when it occurs again.

My loss is negative, what does that mean? I am using tensorflow backend.

Epoch 1/10
2536/2536 [==============================] - 584s - loss: -7.7728 - acc: 0.2492 - val_loss: -7.9712 - val_acc: 0.2500

My code is here for reference:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.preprocessing.image import ImageDataGenerator

import numpy as np

model = Sequential()

model.add(Convolution2D(3, 3, 32, border_mode='valid', dim_ordering='tf', input_shape=(150, 200, 3)))
model.add(Activation('relu'))
model.add(Convolution2D(3, 3, 32))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Convolution2D(64, 3, 3, border_mode='valid'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(Dense(1))
model.add(Activation('sigmoid'))

train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'Data/Train',            # this is the target directory
    target_size=(150, 200),  # all images will be resized to 150x200
    batch_size=32,
    class_mode='binary')     # since we use binary_crossentropy loss, we need binary labels

validation_generator = test_datagen.flow_from_directory(
    'Data/Validation',
    target_size=(150, 200),
    batch_size=32,
    class_mode='binary')

model.fit_generator(train_generator, samples_per_epoch=2536, nb_epoch=10, validation_data=validation_generator, nb_val_samples=800)

model.save_weights('thesis.h5')

@zach-nervana FYI
Some possible reasons are listed in this StackOverflow question.

Hello everyone,

As we all know, the KLD loss cannot be negative. I am training a regression model and I get negative values.
Here is my model:

model

from keras.applications.vgg16 import VGG16
from keras.layers import Conv2D, MaxPool2D, Flatten, Activation
from keras.models import Model
from keras.optimizers import Adam

base_model = VGG16(input_shape=(360, 480, 3), weights='imagenet', include_top=False)
x = base_model.layers[-2].output
x = MaxPool2D(pool_size=(2, 2), padding='same', strides=(1, 1), name='block5_pool')(x)
x = Conv2D(32, (7, 7), activation='relu', padding='same', name='block5_conv5')(x)
x = Conv2D(8, (7, 7), activation='relu', padding='same', name='block5_conv6')(x)
x = Conv2D(1, (7, 7), activation='relu', padding='same', name='block5_conv7')(x)
x = Flatten(name='flatten')(x)
prediction = Activation('softmax')(x)  # the problem comes in here!
model = Model(inputs=base_model.input, outputs=prediction)

compile

adam = Adam(lr=1e-5, beta_1=0.9, beta_2=0.999, epsilon=1e-8, decay=0.0)
model.compile(optimizer=adam, loss='kld', metrics=['accuracy'])

The problem is: if I add a softmax layer at the end of the model, the loss is positive, which is fine, but it is around 32, which is really big. But if I remove the softmax layer, the loss becomes negative.

For the input and output: the inputs are images, which I normalize to 0-1, and the labels are also in 0-1.
My point is, this is a regression model, so I do not want to add a softmax layer at the end of the model, but then the loss becomes negative, which is not right. Does anyone have an idea how to solve this?
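
To illustrate why the softmax matters here (a NumPy sketch, assuming the Keras 'kld' loss roughly computes sum(y_true * log(y_true / y_pred)) with values clipped into (epsilon, 1]): the KL divergence is only guaranteed non-negative when both y_true and y_pred are proper probability distributions, which is exactly what the final softmax enforces.

import numpy as np

def kld(y_true, y_pred, eps=1e-7):
    # KL divergence in the same spirit as the Keras 'kld' objective.
    y_true = np.clip(y_true, eps, 1.0)
    y_pred = np.clip(y_pred, eps, 1.0)
    return np.sum(y_true * np.log(y_true / y_pred))

y_true = np.array([0.2, 0.3, 0.5])                 # a valid distribution
print(kld(y_true, np.array([0.1, 0.4, 0.5])))      # ~0.05, non-negative
print(kld(y_true, np.array([0.9, 1.0, 2.0])))      # ~-1.0, negative: y_pred is not a distribution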

@FiReTiTi Did you solve your problem? I had a similar problem. I used Theano as backend and binary_crossentropy as the loss function; during training, acc, val_acc, loss, and val_loss never changed from epoch to epoch, and the loss value was very high, about 8. I used 4000 training samples and 1000 validation samples.
this is my model:

import keras
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense

inputs_x=Input(shape=(1,65,21))
x=Conv2D(64,(3,3),padding='same',data_format='channels_first',activation='relu',use_bias=True)(inputs_x)
x=Conv2D(64,(3,3),padding='same',data_format='channels_first',activation='relu',use_bias=True)(x)
x=MaxPooling2D(pool_size=(2,2),strides=(2,2))(x)

x=Conv2D(32,(5,5),padding='same',data_format='channels_first',activation='relu',use_bias=True)(x)
x=Conv2D(16,(5,5),padding='valid',data_format='channels_first',activation='relu',use_bias=True)(x)
x=MaxPooling2D(pool_size=(2,2),strides=(2,2))(x)

x=Dropout(0.55)(x)
x=Flatten()(x)

inputs_y=Input(shape=(1,32,21))
y=Conv2D(32,(2,2),padding='same',data_format='channels_first',activation='relu',use_bias=True)(inputs_y)
y=Conv2D(32,(2,2),padding='same',data_format='channels_first',activation='relu',use_bias=True)(y)
y=MaxPooling2D(pool_size=(2,2),strides=(2,2))(y)

y=Conv2D(32,(4,4),padding='same',data_format='channels_first',activation='relu',use_bias=True)(y)
y=Conv2D(8,(4,4),padding='valid',data_format='channels_first',activation='relu',use_bias=True)(y)
y=MaxPooling2D(pool_size=(2,2),strides=(2,2))(y)

y=Dropout(0.60)(y)
y=Flatten()(y)

merged_input=keras.layers.concatenate([x,y],axis=-1)

z=Dense(16,activation='softmax')(merged_input)
z=Dense(8,activation='softmax')(z)
z=Dense(4,activation='softmax')(z)

outp=Dense(1,activation='softmax')(z)

model=Model(inputs=[inputs_x,inputs_y],outputs=outp)
model.compile(loss='binary_crossentropy',
optimizer='sgd',
metrics=['accuracy'])

history=model.fit(x=[train_inputs_x,train_inputs_y],y=train_label,batch_size=32,
                  epochs=30,validation_split=0.2,shuffle=True)

Any ideas for this problem?

No. It looks like an overflow problem that did not happen when I reduced the size of my model.

Have you tried switching to TensorFlow as the backend? Things seem more stable for me since I switched to TensorFlow.

Ok, I will try switching the backend. Thanks

@FiReTiTi Did you try to normalize your input? Inappropriate normalization of the input may lead to a gradient explosion problem.

@fregocap Yes, the inputs are normalized.

I had the same problem with a negative binary cross-entropy loss.
My model ended with

model.add(Dense(1))
model.add(Activation('sigmoid'))

The problem in my case was that the labels coming from the generator were not 0 and 1 but several classes (0, 1, 2, ..., 6) instead. The model unexpectedly did not fail, but produced a negative loss.

The solution is to use Dense(n_classes, activation='softmax').
Just be careful with what you are doing.

When the binary cross-entropy loss comes out negative, it is because the true values are not in [0, 1]. In my case I was using [-1, 1]. The model does not fail, but it produces negative values.
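
Plugging a -1 target into the formula quoted earlier in the thread shows this directly (a small NumPy check, with made-up numbers):

import numpy as np

def crossentropy(t, o):
    # Same binary cross-entropy expression as quoted above.
    return -(t * np.log(o) + (1 - t) * np.log(1 - o))

print(crossentropy(1.0, 0.9))    # ~0.105: non-negative for targets in [0, 1]
print(crossentropy(-1.0, 0.1))   # ~-2.09: negative because the target is -1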

Thanks.

I got a negative loss when training an autoencoder on image data, normalizing the images to zero mean and unit std (so half of the data values are negative) and using the binary_crossentropy loss. Later I figured out that this happens because binary_crossentropy only works as a regression loss when the inputs are between 0 and 1, but in my case the inputs are also negative.
http://neuralnetworksanddeeplearning.com/chap3.html

The answer is easy in my opinion. Your data are not between 0 and 1; they are between 0 and 255. Just divide your ground-truth data by 255 and the results will be positive.
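
In code that is just (a one-line sketch, assuming the targets live in a hypothetical y_train array of 8-bit image values):

y_train = y_train.astype('float32') / 255.0  # scale ground-truth values into [0, 1] before fitting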

Thanks Hamed. You are right

Quite a long time ago I also ran into this issue; I fixed it by changing the optimizer from Adam back to the default RMSprop.

I think it can also be the result of a learning rate that is too high in some cases; the weights might become too large for TensorFlow to work properly. Sometimes when I see the loss growing, I try decreasing the learning rate and it works.
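
Concretely (a hedged sketch with made-up values, assuming an already-built model as in the snippets above), that just means recompiling with a smaller learning rate:

from keras.optimizers import RMSprop

# Default RMSprop uses lr=0.001; dropping it by an order of magnitude
# often stops the loss from blowing up.
optimizer = RMSprop(lr=1e-4)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])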