keras-team / keras

Deep Learning for humans

Home Page: http://keras.io/

The loss becomes negative

FiReTiTi opened this issue

Hi,
I just ran a CNN built with Keras on a big training set, and I got weird loss values at each epoch (see below):
66496/511502 [==>...........................] - ETA: 63s - loss: 8.2800
66528/511502 [==>...........................] - ETA: 63s - loss: -204433556137039776.0000

345664/511502 [===================>..........] - ETA: 23s - loss: 8.3174
345696/511502 [===================>..........] - ETA: 23s - loss: -39342531075525840.0000

214080/511502 [===========>..................] - ETA: 41s - loss: 8.3406
214112/511502 [===========>..................] - ETA: 41s - loss: -63520753730220536.0000

How is that possible? The loss suddenly becomes too big, and the value seems to go beyond what the double encoding can represent?
Is there a way to avoid it?

Regards,

Do you really think that's enough information for anyone to be able to answer your question?

The loss is just a scalar that you are trying to minimize. It's not necessarily supposed to be positive! For instance, a cosine proximity loss will usually be negative (it tries to make the proximity as high as possible by minimizing a negative scalar).
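
To illustrate (a minimal sketch, not from the original thread; the model and data are made up): with the built-in 'cosine_proximity' objective, Keras minimizes the negative cosine similarity, so a well-fitting model legitimately reports a loss close to -1.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Toy regression model trained with a loss that is negative by design.
model = Sequential()
model.add(Dense(4, input_dim=8))
model.compile(loss='cosine_proximity', optimizer='sgd')

X = np.random.rand(16, 8).astype('float32')
y = np.random.rand(16, 4).astype('float32')
# The returned loss is -cos(prediction, target), so it is negative
# whenever predictions and targets point in the same direction.
print(model.evaluate(X, y, verbose=0))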

Hi, thank you for your answers.
The last layer is a dense layer with a sigmoid, so the value should not be negative:
model.add(Dense(1))
model.add(Activation('sigmoid'))
What really surprises me is that from one batch to the next there is such a drop.
@the-moliver: which information would you need?

What is your training objective, binary_crossentropy or others?
And if you are using a GPU to train your network, the data type should be float32 (a Theano restriction).

Hi, thank you for your help.
Yes, the training objective is binary_crossentropy.
And yes, all my data are already float32, I made sure of that.

@FiReTiTi
I don't think binary_crossentropy can return negative values.
In Theano backend,

crossentropy(t,o) = -(t*log(o) + (1 - t)*log(1 - o)).

t and o are both in the range [0, 1], which makes the whole expression non-negative.
Your output log looks more like an overflow.
Maybe you can check your data? Or you can extract a portion of the data to test whether the model still gives a negative loss.
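
A quick numerical check of that formula (a NumPy sketch, not part of the original reply) confirms it stays non-negative as long as both the target and the prediction are in [0, 1]:

import numpy as np

def crossentropy(t, o, eps=1e-7):
    # Elementwise binary cross-entropy, as in the formula above.
    o = np.clip(o, eps, 1 - eps)
    return -(t * np.log(o) + (1 - t) * np.log(1 - o))

t = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
o = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
print(crossentropy(t, o))  # every entry is >= 0 for targets and outputs in [0, 1]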

66496/511502 [==>...........................] - ETA: 63s - loss: 8.2800
66528/511502 [==>...........................] - ETA: 63s - loss: -204433556137039776.0000

If the loss cannot be negative, does that mean it exceeds the encoding limits and then wraps around into negative values?

I am still working on the same data, and here is another weird thing:
Epoch 140/3000
5s - loss: 0.5968 - val_loss: 0.4191
Epoch 141/3000
5s - loss: 0.5974 - val_loss: 0.4556
Epoch 142/3000
5s - loss: 0.5979 - val_loss: 0.4382
Epoch 143/3000
5s - loss: 6.0467 - val_loss: 11.1324
Epoch 144/3000
5s - loss: 7.7176 - val_loss: 11.1324
Epoch 145/3000
5s - loss: 7.7176 - val_loss: 11.1324
Epoch 146/3000
5s - loss: 7.7176 - val_loss: 11.1324
And nothing changes during the next 2850 epochs; the values stay perfectly identical.

@FiReTiTi please give more information about your model if you want help on that.

In your last case, your optimiser is likely stuck in a local minimum. That could explain why it remains identical during all the following iterations.

Here is the model:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import SGD

model = Sequential()
model.add(Convolution2D(8, 7, 7, border_mode='valid', input_shape=(1, 31, 31), activation='tanh'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(16, 5, 5, border_mode='valid', activation='tanh'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(32, 3, 3, border_mode='valid', activation='relu'))
model.add(Convolution2D(65, 1, 1, border_mode='valid', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(23))
model.add(Activation('tanh'))
model.add(Dropout(0.1))
model.add(Dense(11))
model.add(Activation('sigmoid'))
model.add(Dropout(0.1))
model.add(Dense(1))
model.add(Activation('sigmoid'))
optimizer = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=optimizer)
model.fit(dataset, labels, batch_size=batch_size, nb_epoch=nb_epoch, shuffle=True, validation_split=0.1, verbose=2)

The dataset contains 70 000 images of size 31x31.
What I don't understand is why there is this sudden jump of the loss and then the loss is stuck.

@FiReTiTi May I ask how you solved the negative loss problem? I ran into the same problem; my loss is a custom loss built from a bunch of MSE terms, so it shouldn't be negative either. It looks like this:

Epoch 1/15
   32/33102 [..............................] - ETA: 15552s - loss: -88028794.750
   48/33102 [..............................] - ETA: 11176s - loss: -1419246161.8
   64/33102 [..............................] - ETA: 8987s - loss: -13590295485.3
   80/33102 [..............................] - ETA: 7674s - loss: -107586018455.
   96/33102 [..............................] - ETA: 6797s - loss: -661847078867.
  112/33102 [..............................] - ETA: 6172s - loss: -3960883097561
  128/33102 [..............................] - ETA: 5702s - loss: -3047999712303
  144/33102 [..............................] - ETA: 5337s - loss: -2318531227797
  320/33102 [..............................] - ETA: 3720s - loss: -1825597231654244712448.0000

I haven't, it still happens from time to time :-(
Using smaller NNs seems to reduce the phenomenon.

@FiReTiTi Thanks for your reply. I found my problem: I use a custom loss and accidentally passed y_pred and y_true in the wrong order to my loss function, so maybe it's not the same cause in your case.
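
For reference (a minimal sketch with a made-up loss, not the poster's actual one): Keras always calls a custom loss as loss(y_true, y_pred), in that order. Swapping the two is harmless for a symmetric loss such as plain MSE, but changes the result for anything asymmetric.

from keras import backend as K

def weighted_log_loss(y_true, y_pred):
    # Correct argument order: y_true first, y_pred second.
    # If the two were swapped inside the body, the value (and its sign)
    # could change, because this expression is not symmetric in its arguments.
    return -K.mean(y_true * K.log(K.clip(y_pred, K.epsilon(), 1.0)), axis=-1)

# Assumes `model` is an already-built Keras model, as in the snippets above.
model.compile(loss=weighted_log_loss, optimizer='sgd')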

@sunshineatnoon So you were using a non-symmetric loss function, like cross-entropy?

Here is my loss function, it's a bunch of squared values.

Cool for you!
It happened to me with a binary_crossentropy :-(

@FiReTiTi In that case, I think it's more likely an overflow. Do you use Theano? Maybe you can try NanGuardMode in Theano to see if it gives you any errors or warnings. I googled a lot last night and found that NaNs or Infs might cause this kind of error, such as this one.
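
For anyone unfamiliar with it (a minimal standalone sketch, not tied to the poster's model): NanGuardMode is a Theano compilation mode that raises an error as soon as a NaN, an Inf, or a very large value flows through the graph, instead of letting it silently corrupt the loss.

import numpy as np
import theano
import theano.tensor as T
from theano.compile.nanguardmode import NanGuardMode

x = T.matrix('x')
y = T.log(x)  # log(0) would silently produce -inf without the guard

f = theano.function(
    [x], y,
    mode=NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True))

f(np.zeros((2, 2), dtype='float32'))  # raises an error instead of returning -inf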

That's also my opinion. Thanks for the tips, I will test them when it occurs again.

My loss is negative, what does that mean? I am using tensorflow backend.

Epoch 1/10
2536/2536 [==============================] - 584s - loss: -7.7728 - acc: 0.2492 - val_loss: -7.9712 - val_acc: 0.2500

My code is here for reference:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.preprocessing.image import ImageDataGenerator

import numpy as np

model = Sequential()

model.add(Convolution2D(3, 3, 32, border_mode='valid', dim_ordering='tf', input_shape=(150, 200, 3)))
model.add(Activation('relu'))
model.add(Convolution2D(3, 3, 32))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Convolution2D(64, 3, 3, border_mode='valid'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(Dense(1))
model.add(Activation('sigmoid'))

train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'Data/Train',            # this is the target directory
    target_size=(150, 200),  # all images will be resized to 150x200
    batch_size=32,
    class_mode='binary')     # since we use binary_crossentropy loss, we need binary labels

validation_generator = test_datagen.flow_from_directory(
    'Data/Validation',
    target_size=(150, 200),
    batch_size=32,
    class_mode='binary')

model.fit_generator(train_generator, samples_per_epoch=2536, nb_epoch=10, validation_data=validation_generator, nb_val_samples=800)

model.save_weights('thesis.h5')

@zach-nervana FYI
Some possible reasons are listed in this StackOverflow question.

Hello everyone,

As we all know, the KLD loss cannot be negative. I am training a regression model and I get negative values.
Here is my model:

model

from keras.applications.vgg16 import VGG16
from keras.layers import Conv2D, MaxPool2D, Flatten, Activation
from keras.models import Model
from keras.optimizers import Adam

base_model = VGG16(input_shape=(360, 480, 3), weights='imagenet', include_top=False)
x = base_model.layers[-2].output
x = MaxPool2D(pool_size=(2, 2), padding='same', strides=(1, 1), name='block5_pool')(x)
x = Conv2D(32, (7, 7), activation='relu', padding='same', name='block5_conv5')(x)
x = Conv2D(8, (7, 7), activation='relu', padding='same', name='block5_conv6')(x)
x = Conv2D(1, (7, 7), activation='relu', padding='same', name='block5_conv7')(x)
x = Flatten(name='flatten')(x)
prediction = Activation('softmax')(x)  # the problem comes in here!
model = Model(inputs=base_model.input, outputs=prediction)

compile

adam = Adam(lr=1e-5, beta_1=0.9, beta_2=0.999, epsilon=1e-8, decay=0.0)
model.compile(optimizer=adam, loss='kld', metrics=['accuracy'])

The problem is: if I add a softmax layer at the end of the model, the loss is positive, which is fine, but it is around 32, which is really big. But if I remove the softmax layer, the loss becomes negative.

For the input and output: the inputs are images, which I normalize to 0-1, and the labels are also in 0-1.
My point is, this is a regression model, so I do not want to add a softmax layer at the end of the model, but then the loss becomes negative, which is not right. Does anyone have an idea how to solve this?
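
To illustrate why the softmax matters here (a NumPy sketch, assuming the Keras 'kld' loss roughly computes sum(y_true * log(y_true / y_pred)) with values clipped into (epsilon, 1]): the KL divergence is only guaranteed non-negative when both y_true and y_pred are proper probability distributions, which is exactly what the final softmax enforces.

import numpy as np

def kld(y_true, y_pred, eps=1e-7):
    # KL divergence in the same spirit as the Keras 'kld' objective.
    y_true = np.clip(y_true, eps, 1.0)
    y_pred = np.clip(y_pred, eps, 1.0)
    return np.sum(y_true * np.log(y_true / y_pred))

y_true = np.array([0.2, 0.3, 0.5])                 # a valid distribution
print(kld(y_true, np.array([0.1, 0.4, 0.5])))      # ~0.05, non-negative
print(kld(y_true, np.array([0.9, 1.0, 2.0])))      # ~-1.0, negative: y_pred is not a distribution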

@FiReTiTi Did you solve your problem? I had a similar problem. I used Theano as backend and binary_crossentropy as the loss function; during training, acc, val_acc, loss, and val_loss never changed from epoch to epoch, and the loss value was very high, about 8. I used 4000 training samples and 1000 validation samples.
this is my model:

import keras
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense

inputs_x=Input(shape=(1,65,21))
x=Conv2D(64,(3,3),padding='same',data_format='channels_first',activation='relu',use_bias=True)(inputs_x)
x=Conv2D(64,(3,3),padding='same',data_format='channels_first',activation='relu',use_bias=True)(x)
x=MaxPooling2D(pool_size=(2,2),strides=(2,2))(x)

x=Conv2D(32,(5,5),padding='same',data_format='channels_first',activation='relu',use_bias=True)(x)
x=Conv2D(16,(5,5),padding='valid',data_format='channels_first',activation='relu',use_bias=True)(x)
x=MaxPooling2D(pool_size=(2,2),strides=(2,2))(x)

x=Dropout(0.55)(x)
x=Flatten()(x)

inputs_y=Input(shape=(1,32,21))
y=Conv2D(32,(2,2),padding='same',data_format='channels_first',activation='relu',use_bias=True)(inputs_y)
y=Conv2D(32,(2,2),padding='same',data_format='channels_first',activation='relu',use_bias=True)(y)
y=MaxPooling2D(pool_size=(2,2),strides=(2,2))(y)

y=Conv2D(32,(4,4),padding='same',data_format='channels_first',activation='relu',use_bias=True)(y)
y=Conv2D(8,(4,4),padding='valid',data_format='channels_first',activation='relu',use_bias=True)(y)
y=MaxPooling2D(pool_size=(2,2),strides=(2,2))(y)

y=Dropout(0.60)(y)
y=Flatten()(y)

merged_input=keras.layers.concatenate([x,y],axis=-1)

z=Dense(16,activation='softmax')(merged_input)
z=Dense(8,activation='softmax')(z)
z=Dense(4,activation='softmax')(z)

outp=Dense(1,activation='softmax')(z)

model=Model(inputs=[inputs_x,inputs_y],outputs=outp)
model.compile(loss='binary_crossentropy',
optimizer='sgd',
metrics=['accuracy'])

history=model.fit(x=[train_inputs_x,train_inputs_y],y=train_label,batch_size=32,
                  epochs=30,validation_split=0.2,shuffle=True)

Any ideas for this problem?

No. It looks like an overflow problem that did not happen when I reduced the size of my model.

Have you tried switching to TensorFlow as the backend? Things seem more stable for me since I switched to TensorFlow.

Ok, I will try switching the backend. Thanks

@FiReTiTi Did you try to normalize your input? Inappropriate normalization of the input may lead to a gradient explosion problem.

@fregocap Yes, the inputs are normalized.

I had the same problem with a negative binary cross-entropy loss.
My model ended with

model.add(Dense(1))
model.add(Activation('sigmoid'))

The problem in my case was that the labels coming from the generator were not 0 and 1 but several classes (0, 1, 2, ..., 6) instead. The model unexpectedly did not fail, but produced a negative loss.

The solution is to use Dense(n_classes, activation='softmax').
Just be careful with what you are doing.

When the binary cross-entropy loss comes out negative, it is because the true values are not in [0, 1]. In my case I was using [-1, 1]. The model does not fail, but it produces negative values.
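
Plugging a -1 target into the formula quoted earlier in the thread shows this directly (a small NumPy check, with made-up numbers):

import numpy as np

def crossentropy(t, o):
    # Same binary cross-entropy expression as quoted above.
    return -(t * np.log(o) + (1 - t) * np.log(1 - o))

print(crossentropy(1.0, 0.9))    # ~0.105: non-negative for targets in [0, 1]
print(crossentropy(-1.0, 0.1))   # ~-2.09: negative because the target is -1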

Thanks.

I got a negative loss when training an autoencoder on image data, normalizing the images to zero mean and unit std (so half of the data values are negative) and using the binary_crossentropy loss. Later I figured out that this happens because binary_crossentropy only works as a regression loss when the inputs are between 0 and 1, but in my case the inputs are also negative.
http://neuralnetworksanddeeplearning.com/chap3.html

The answer is easy in my opinion. Your data are not between 0 and 1; they are between 0 and 255. Just divide your ground-truth data by 255 and the results will be positive.
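
In code that is just (a one-line sketch, assuming the targets live in a hypothetical y_train array of 8-bit image values):

y_train = y_train.astype('float32') / 255.0  # scale ground-truth values into [0, 1] before fitting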

Thanks Hamed. You are right

Quite a long time ago I also ran into this issue; I fixed it by changing the optimizer from Adam back to the default RMSprop.

I think it can also be the result of a learning rate that is too high in some cases; the weights might become too large for TensorFlow to work properly. Sometimes when I see the loss growing, I try decreasing the learning rate and it works.
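
Concretely (a hedged sketch with made-up values, assuming an already-built model as in the snippets above), that just means recompiling with a smaller learning rate:

from keras.optimizers import RMSprop

# Default RMSprop uses lr=0.001; dropping it by an order of magnitude
# often stops the loss from blowing up.
optimizer = RMSprop(lr=1e-4)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])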