Question: about the dataset
zh794390558 opened this issue
- dataset
```python
test_data, test_label, valid_data, valid_label, Valid_label, Test_label, pernums_test, pernums_valid = load_data()
```
What are Valid_label, Test_label, pernums_test, and pernums_valid used for?
Could you share a set of hyperparameters that reproduces the results in your paper?
- evaluation
Class imbalance: from the code, it looks like 300 sentences are sampled per class.
However, in my evaluation every sample tends to be predicted as class 0. Do you know what could cause this?
----------segment metrics---------------
Best valid_UA: 0.2666
Best valid_WA: 0.09396
Valid Confusion Matrix:["ang","sad","hap","neu"]
[[ 34 0 0 0]
[107 0 0 6]
[ 70 0 0 2]
[207 0 0 10]]
----------segment metrics---------------
*****************************************************************
310
Epoch: 310
Valid cost: 1.52
Valid_UA: 0.2666
Valid_WA: 0.09396
Best valid_UA: 0.2666
Best valid_WA: 0.09396
Valid Confusion Matrix:["ang","sad","hap","neu"]
[[ 18 0 0 0]
[ 67 0 0 6]
[ 54 0 0 2]
[141 0 0 10]]
Test_UA: 0.2592
Test_WA: 0.0695
Test Confusion Matrix:["ang","sad","hap","neu"]
[[ 13 0 0 0]
[ 58 0 0 2]
[ 50 0 0 0]
[131 0 0 5]]
*****************************************************************
Below is the output printed during training:
----------segment metrics---------------
valid_UA: 0.4773
valid_WA: 0.3968
Valid Confusion Matrix:["ang","sad","hap","neu"]
[[28 0 6 0]
[ 3 59 22 29]
[28 11 18 15]
[63 34 52 68]]
----------segment metrics---------------
After epoch:9, step: 310, loss on training batch is 0.44, accuracy is 0.900.
train_UA: 0.8941
train_WA: 0.9
Confusion Matrix:["ang","sad","hap","neu"]
[[ 6 0 1 1]
[ 0 15 0 1]
[ 0 0 7 0]
[ 0 0 1 8]]
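For reference, the UA/WA numbers in these logs are consistent with UA being unweighted accuracy (mean per-class recall) and WA being weighted accuracy (overall accuracy). A minimal sketch under that assumption (the helper name is mine, not the repo's):

```python
import numpy as np

def ua_wa(conf):
    """Compute UA (mean per-class recall) and WA (overall accuracy) from a
    confusion matrix whose rows are true labels and columns are predictions."""
    conf = np.asarray(conf, dtype=np.float64)
    ua = np.mean(np.diag(conf) / conf.sum(axis=1))  # unweighted accuracy
    wa = np.trace(conf) / conf.sum()                # weighted (overall) accuracy
    return ua, wa

# The second valid confusion matrix above reproduces the reported numbers:
conf = [[18,  0, 0,  0],
        [67,  0, 0,  6],
        [54,  0, 0,  2],
        [141, 0, 0, 10]]
print(ua_wa(conf))  # ~ (0.2666, 0.09396), matching Valid_UA / Valid_WA
```

The matrices themselves show the collapse: nearly every sample lands in the "ang" column, so UA is dominated by the one class with perfect recall.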
1. Test_label is the emotion label of each test utterance: an utterance longer than 3 seconds may be cut into two segments, and pernums_test records how many sub-segments each test utterance contains.
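In other words, pernums_test lets segment-level predictions be pooled back to utterance level before computing utterance metrics. A hypothetical sketch of that pooling (the averaging scheme and all names are my own, not the repo's):

```python
import numpy as np

def utterance_predictions(seg_probs, pernums):
    """Pool segment-level class posteriors back to utterance level.

    seg_probs: (num_segments, num_classes) softmax outputs, with each
               utterance's segments stored contiguously.
    pernums:   number of segments per utterance (e.g. pernums_test).
    """
    preds, start = [], 0
    for n in pernums:
        # Average the posteriors of all segments cut from one utterance,
        # then pick the most likely emotion for the whole utterance.
        preds.append(int(np.argmax(seg_probs[start:start + n].mean(axis=0))))
        start += n
    return np.array(preds)

# Example: a 2-segment utterance followed by a 1-segment utterance.
probs = np.array([[0.6, 0.2, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.1, 0.7, 0.1]])
print(utterance_predictions(probs, [2, 1]))  # -> [0 2]
```

The pooled predictions would then be scored against Test_label (one label per utterance) rather than against the segment-level labels.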
2. In IEMOCAP, anger is the hardest emotion to distinguish. For the paper I oversampled the two emotion classes with the fewest samples once. This code cannot fully reproduce my paper: the framework is the same, but the data processing has changed.
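To illustrate the oversampling step described above (a sketch of my own, not the paper's unreleased preprocessing), here is a minimal NumPy version that duplicates every sample of the two least-frequent classes once:

```python
import numpy as np

def oversample_two_smallest(data, labels):
    """Append one extra copy of each sample belonging to the two
    least-frequent classes. Illustrative sketch only."""
    classes, counts = np.unique(labels, return_counts=True)
    minority = classes[np.argsort(counts)[:2]]  # the two smallest classes
    mask = np.isin(labels, minority)
    data = np.concatenate([data, data[mask]], axis=0)
    labels = np.concatenate([labels, labels[mask]], axis=0)
    return data, labels
```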
Hi, I used Google Translate on your answer. Regarding "the data processing has changed":
Does this mean the open-sourced code is not the same as the original paper's code?
I can't find the data-processing technique described in your paper.
Can you tell us what the changed processing is?