minzwon / sota-music-tagging-models


Only one class present in y_true. ROC AUC score is not defined in that case.

ppjjee opened this issue · comments

Hi,
The model runs, but an error occurs when calculating the ROC-AUC score. It says that there is only one class in y_true, and I don't know what the problem is.
I ran the fcn and musicnn models with the MTG-Jamendo dataset (autotagging-moodtheme subset), and both gave the same error.
The error message is below.
(I configured the environment according to the requirements.)
Could you help me solve this problem?


Traceback (most recent call last):
  File "main.py", line 59, in <module>
    main(config)
  File "main.py", line 37, in main
    solver.train()
  File "/nas2/epark/sota-music-tagging-models/training/solver.py", line 182, in train
    best_metric = self.validation(best_metric, epoch)
  File "/nas2/epark/sota-music-tagging-models/training/solver.py", line 258, in validation
    roc_auc, pr_auc, loss = self.get_validation_score(epoch)
  File "/nas2/epark/sota-music-tagging-models/training/solver.py", line 316, in get_validation_score
    roc_auc, pr_auc = self.get_auc(est_array, gt_array)
  File "/nas2/epark/sota-music-tagging-models/training/solver.py", line 244, in get_auc
    roc_aucs = metrics.roc_auc_score(gt_array, est_array, average=None)
  File "/home/epark/anaconda3/envs/music/lib/python3.7/site-packages/sklearn/metrics/_ranking.py", line 375, in roc_auc_score
    sample_weight=sample_weight)
  File "/home/epark/anaconda3/envs/music/lib/python3.7/site-packages/sklearn/metrics/_base.py", line 120, in _average_binary_score
    sample_weight=score_weight)
  File "/home/epark/anaconda3/envs/music/lib/python3.7/site-packages/sklearn/metrics/_ranking.py", line 221, in _binary_roc_auc_score
    raise ValueError("Only one class present in y_true. ROC AUC score "
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

Hi,
This error occurs when a column of y_true contains only a single class. For example, with y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 0]] and labels ['rock', 'jazz', 'metal'], the last column ('metal') is always zero. That means only one class (0, in this case) is present for the tag 'metal', so its ROC-AUC score is undefined.
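To make the explanation above concrete, here is a minimal pure-Python sketch (not the repo's or scikit-learn's implementation) of ROC-AUC for one tag, defined as the probability that a randomly chosen positive is ranked above a randomly chosen negative; when only one class is present, one of the two groups is empty and the score is undefined, which is exactly why the error is raised:

```python
from itertools import product

def roc_auc(y_true, y_score):
    """ROC-AUC: probability a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        # Mirrors scikit-learn's behavior for a single-class column.
        raise ValueError("Only one class present in y_true. "
                         "ROC AUC score is not defined in that case.")
    pairs = list(product(pos, neg))
    # Count correctly ranked pairs; ties count as half.
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return correct / len(pairs)

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An all-zero column such as y_true = [0, 0, 0, 0] has no positives, so the denominator of this ratio would be zero and the function raises instead.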

Go to line 315 in solver.py and add these two lines:
print(gt_array.shape)
print(gt_array.sum(axis=0))
Then check whether any of the per-tag counts is zero.
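As a quick illustration with a toy ground-truth matrix (hypothetical values standing in for gt_array, songs as rows and tags as columns), the per-tag positive counts reveal which tag columns are all zero:

```python
# Toy ground-truth matrix; the third tag is never positive.
gt = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
]
pos_counts = [sum(col) for col in zip(*gt)]               # positives per tag
zero_tags = [i for i, c in enumerate(pos_counts) if c == 0]  # all-zero columns
print(pos_counts)  # [2, 2, 0]
print(zero_tags)   # [2]
```

Any index listed in zero_tags would trigger the ValueError when its ROC-AUC is computed.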

By the way, when did this error occur? Was it after the first training epoch?

Thank you for your answer!
and yes, I got this error after the first training epoch.

The result of checking the shape and sum of ground truth is as follows.

(3802, 59)
[ 0 78 147 0 32 58 104 119 74 70 34 118 231 69 123 90 34 207 357 233 167 26 187 187 23 26 74 392 33 35 36 0 136 212 161 111 0 368 75 113 65 84 64 47 278 41 118 147 37 53 128 36 93 38 91 71 49 132 139]

I can see that a few indices are zero.
Now I understand why the error occurred, but I still don't know how to fix it. Could you give me some advice on how to solve the problem?

Many thanks!

I see. So you are not using the data loader provided by this repo (top-50 tags of MTG-Jamendo), right? It looks like you are using the mood/theme subset of the MTG-Jamendo dataset. In that case, you need to discard the labels that are always negative, at least when you calculate the ROC-AUC score.

pos_counter = gt_array.sum(axis=0)  # count positive appearances per tag
zero_indices = np.where(pos_counter == 0)[0]  # find indices of all-zero tags
gt_array = np.delete(gt_array, zero_indices, axis=1)  # remove all-zero tag columns from ground truth
est_array = np.delete(est_array, zero_indices, axis=1)  # remove all-zero tag columns from predictions
roc_auc, pr_auc = self.get_auc(est_array, gt_array)

This will work, although the resulting AUC scores will not take the always-negative tags into account.
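The same column-dropping repair can be sketched in dependency-free pure Python (hypothetical helper name drop_all_zero_tags and toy data, not code from the repo), which may help readers follow the logic of the numpy version:

```python
def drop_all_zero_tags(gt, est):
    """Remove tag columns that are never positive in the ground truth."""
    keep = [j for j in range(len(gt[0])) if any(row[j] for row in gt)]
    gt_kept = [[row[j] for j in keep] for row in gt]
    est_kept = [[row[j] for j in keep] for row in est]
    return gt_kept, est_kept

gt = [[1, 0, 0], [0, 1, 0]]            # third tag is always negative
est = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.4]]
gt2, est2 = drop_all_zero_tags(gt, est)
print(gt2)   # [[1, 0], [0, 1]]
print(est2)  # [[0.9, 0.2], [0.3, 0.8]]
```

After the all-zero column is removed, every remaining column contains both classes, so per-tag ROC-AUC is well defined.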

Thanks for your big help! Now I can compute the ROC-AUC score without any errors.

I'm using the mood/theme subset of the MTG-Jamendo dataset, and all issues are now solved. The code you provided works perfectly!

Thank you very much!

Awesome! Enjoy your research!

Best,
Minz