fchollet / deep-learning-with-python-notebooks

Jupyter notebooks for the code samples of the book "Deep Learning with Python"


IMDB example, why we have 1 neuron in the last layer

Kuaranir opened this issue

In the case of the IMDB example, why did we initialize the last layer with only 1 neuron, even though we have two classes, positive and negative reviews:

model.add(layers.Dense(1, activation='sigmoid'))

How are the true labels coded in the example you are referring to? I assume they are coded as 0 or 1, not one-hot as [1, 0] (class 1) or [0, 1] (class 2), right?

Then essentially a single output is enough, because a sigmoid output below 0.5 can be deemed a negative prediction, and one above 0.5 a positive prediction.
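
To make this concrete, here is a minimal, self-contained sketch of that setup. The data below is random dummy input standing in for the book's multi-hot encoded IMDB vectors, not the real dataset:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Dummy stand-ins for the 10,000-dimensional multi-hot review vectors
# and 0/1 labels used in the book's IMDB example.
x = np.random.randint(0, 2, size=(200, 10000)).astype('float32')
y = np.random.randint(0, 2, size=(200,)).astype('float32')

model = keras.Sequential([
    layers.Dense(16, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(1, activation='sigmoid'),  # single output: P(class 1)
])
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',  # expects labels coded as 0 or 1
              metrics=['accuracy'])
model.fit(x, y, epochs=1, batch_size=32, verbose=0)

# Each prediction is one probability; threshold at 0.5 to pick the class.
probs = model.predict(x, verbose=0)      # shape (200, 1), values in [0, 1]
labels = (probs > 0.5).astype('int32')   # 1 = positive, 0 = negative
```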

I thought we should set as many neurons in the output layer as there are classes. At least that's what I heard in DL courses...

That's correct in general, but for a binary classification problem, a single output neuron with sigmoid activation is sufficient.

Hi @Kuaranir, for binary classification problems, if you use a sigmoid activation then you use one neuron in the output layer. The reason is that with sigmoid activation your network predicts a single probability: the probability of success (i.e. of class 1). The probability of failure is then simply (1 - probability of success), so a single unit is enough.

If you want to predict a probability for each class explicitly, you can change the activation function to softmax and use 2 units (neurons), as sketched below. Hope that helps :)
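
For comparison, a minimal sketch of that two-unit softmax variant, again with dummy data standing in for the IMDB vectors; note the loss changes to sparse_categorical_crossentropy for integer labels:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Same dummy multi-hot vectors and integer 0/1 labels as in the sketch above.
x = np.random.randint(0, 2, size=(200, 10000)).astype('float32')
y = np.random.randint(0, 2, size=(200,))

# Two-unit softmax head: one probability per class, summing to 1.
model = keras.Sequential([
    layers.Dense(16, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(2, activation='softmax'),  # [P(negative), P(positive)]
])
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',  # integer 0/1 labels
              metrics=['accuracy'])
model.fit(x, y, epochs=1, batch_size=32, verbose=0)

probs = model.predict(x, verbose=0)   # shape (200, 2), rows sum to 1
labels = probs.argmax(axis=-1)        # index of the more probable class
```

Both heads solve the same problem; the sigmoid version is just the softmax version with the redundant second output folded into (1 - p).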

@pkienle thanks)