HSLCY / ABSA-BERT-pair

Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence (NAACL 2019)

Home Page: https://www.aclweb.org/anthology/N19-1035

Question about imbalanced classes / labels in sentihood dataset

frankaging opened this issue · comments

commented

Hi, thanks for making your codebase public. I walked through your preprocessing steps and the modeling part, and I could not find a place where you weight the different classes or labels. In the Sentihood dataset there are far more "none" cases for sentiment polarity. Did you down-weight "none"? Similarly, for the binary models there are far more "yes" examples, since the many "none" targets all produce "yes" cases.

As a follow-up on the binary models: because there are so many "yes" cases for "none", the positive score for "none" is biased and can always end up larger than the other two. Did this happen when you trained your model? Does the model just output "none"? Thanks.
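For reference, a minimal sketch of one way such class imbalance could be handled, namely passing inverse-frequency class weights to the loss. This is not the repository's code; the label set, the class counts, and the weighting scheme are assumptions for illustration only.

```python
# Sketch only: class-weighted loss for an imbalanced label set (hypothetical numbers).
import torch
import torch.nn as nn

# Hypothetical label set for the Sentihood auxiliary-sentence task.
labels = ["none", "positive", "negative"]

# Made-up training-split counts illustrating the "none"-heavy imbalance.
counts = torch.tensor([9000.0, 2000.0, 1000.0])

# Inverse-frequency weights, normalized so they average to 1.
weights = counts.sum() / (len(counts) * counts)

loss_fn = nn.CrossEntropyLoss(weight=weights)

# logits: (batch, num_labels); targets: indices into `labels`.
logits = torch.randn(4, len(labels))
targets = torch.tensor([0, 0, 1, 2])
loss = loss_fn(logits, targets)
print(loss.item())
```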

Yes, the constructed labels are imbalanced. BERT's generalization ability is strong and it can learn the key information, so the model does not just output "none". However, when the number of aspects is larger, we may reduce the number of "none" cases when constructing the data.
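A minimal sketch of the kind of "none" reduction mentioned above, i.e. keeping only a fraction of the constructed "none" pairs. This is not the repository's preprocessing code; the tuple format, the `keep_ratio` value, and the function name are assumptions for illustration.

```python
# Sketch only: downsample constructed "none" auxiliary-sentence pairs (hypothetical format).
import random

def downsample_none(pairs, keep_ratio=0.3, seed=42):
    """Keep all non-"none" pairs, but only a fraction of the "none" ones.

    `pairs` is assumed to be a list of (sentence, auxiliary_sentence, label) tuples.
    """
    rng = random.Random(seed)
    kept = []
    for sentence, aux, label in pairs:
        if label != "none" or rng.random() < keep_ratio:
            kept.append((sentence, aux, label))
    return kept

# Example: a mostly-"none" set of constructed pairs gets thinned out.
pairs = [("text", "aux", "none")] * 8 + [("text", "aux", "positive")] * 2
print(len(downsample_none(pairs)))  # roughly 2 + 0.3 * 8 pairs on average
```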