Questions about Multi-Label and output activation function...

Question

JessicaKuo opened this issue 6 years ago · comments

Thanks a lot for offering such a good tool for multi-label text classification. It's very helpful for my research.
Because I am new to neural networks and multi-label classification, there are some parts I don't fully understand when using the model. I used the CNN model with all parameters left at their defaults, and I know the CNN model behind magpie is based on Kim, Yoon. "Convolutional neural networks for sentence classification."

1. How does the CNN model deal with multi-label classification? I didn't find any description of multi-label classification in the Kim, Yoon paper, and I am not sure whether I missed something.

2. What is the output activation function used in the CNN model in magpie?
Originally I thought it was softmax, because the output is probability scores and softmax output is used in Kim, Yoon. "Convolutional neural networks for sentence classification."
But I saw this code in models.py in magpie:
outputs = Dense(output_length, activation='sigmoid')(flattened)
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['top_k_categorical_accuracy'],
)
I also read some related articles saying that softmax with cross-entropy is appropriate for multi-class classification (though adding a threshold can also make it multi-label), and that sigmoid with binary_crossentropy is suitable for multi-label classification.

So I am confused: is the output activation function used in magpie softmax or sigmoid?

Thanks for your patience!

Jan Stypka · Answer 1 · Fri May 18 2018 02:08:52 GMT+0800 (China Standard Time)

The network described in the paper works fine for multi-label classification. There is no softmax layer at the end of the network, so we can treat the labels independently.
The output is simply a sigmoid activation function, as you noticed. Softmax guarantees that all label probabilities sum to one, which does not make sense for multi-label classification, where e.g. two labels should both be allowed to have probabilities > 0.5.
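
To make the difference concrete, here is a minimal NumPy sketch (not magpie's code). The logits and the 0.5 threshold are made-up values for illustration only. It shows that softmax scores compete with each other and sum to one, while sigmoid scores are computed per label and can exceed 0.5 for several labels at once.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the outputs sum to 1.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def sigmoid(logits):
    # Each score is squashed independently into (0, 1).
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical raw scores (logits) for three labels.
logits = np.array([2.0, 1.5, -1.0])

softmax_probs = softmax(logits)
sigmoid_probs = sigmoid(logits)

# Indices of labels whose score exceeds 0.5 (the threshold is an assumption).
softmax_labels = np.flatnonzero(softmax_probs > 0.5)
sigmoid_labels = np.flatnonzero(sigmoid_probs > 0.5)

print("softmax:", softmax_probs, "sum =", softmax_probs.sum(), "labels:", softmax_labels)
print("sigmoid:", sigmoid_probs, "sum =", sigmoid_probs.sum(), "labels:", sigmoid_labels)
```

With these example logits, only the first label clears the threshold under softmax, whereas the first two labels both clear it under sigmoid.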

Hope that helps @JessicaKuo !

Jessica10105009 · Answer 2 · Fri May 18 2018 14:18:58 GMT+0800 (China Standard Time)

Ok, now I understand.
Thanks for your explanation!

Prateek Joshi · Answer 3 · Fri Oct 05 2018 09:41:35 GMT+0800 (China Standard Time)

Hi @JessicaKuo

May I know which dataset you are using for multi-label classification?

Jessica10105009 · Answer 4 · Sat Oct 06 2018 17:20:21 GMT+0800 (China Standard Time)

@prateekjoshi565
The dataset I used is SOAP, prescription, and disease information of outpatients from one hospital.