About a DialogueRNN performance problem
Columbine21 opened this issue
There are some differences in the results because the training strategy is somewhat different. The earlier DialogueRNN model was trained in a two-stage setup, which generally gives slightly better results. For the new models in this repo, we have implemented all models end-to-end to make training and all the different evaluation strategies more flexible.
We observed some variance in the results for the GloVe models in the end-to-end setup. We discuss these observations in Section 5, page 12 of our paper. For this reason, we ran each model more than 20 times and averaged the results in Tables 3 and 4.
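As a side note, reporting a metric averaged over repeated runs, as described above, is usually done by recording each run's score and then computing the mean (often with the spread alongside). A minimal sketch, using made-up scores rather than the actual results from the paper:

```python
# Hedged sketch: summarizing a metric over repeated training runs.
# The scores below are hypothetical examples, not real results.
import statistics

# Hypothetical W-Avg F1 scores from repeated end-to-end runs of one model.
run_scores = [62.1, 63.4, 61.8, 62.9, 63.0]

mean = statistics.mean(run_scores)
std = statistics.stdev(run_scores)  # sample standard deviation
print(f"W-Avg F1 over {len(run_scores)} runs: {mean:.2f} +/- {std:.2f}")
```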
Thanks a lot. By the way, what about the earlier DialogueRNN model (trained in the two-stage setup)? Would you mind sharing its W-Avg F1 score on the IEMOCAP dataset?
For the earlier two-stage DialogueRNN model, the W-Avg F1 score is 62.75 on IEMOCAP.
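For readers unfamiliar with the metric, the weighted-average (W-Avg) F1 quoted here is commonly computed as the per-class F1 scores weighted by class support. A minimal sketch with made-up labels (not actual IEMOCAP predictions), using scikit-learn:

```python
# Hedged sketch: computing a weighted-average F1 score.
# Labels and predictions below are hypothetical, not real model output.
from sklearn.metrics import f1_score

# Hypothetical per-utterance emotion labels encoded as integers.
y_true = [0, 1, 2, 1, 0, 3, 2, 1]
y_pred = [0, 1, 1, 1, 0, 3, 2, 2]

# 'weighted' averages per-class F1 scores, weighting each class by its
# support (number of true instances), which is what "W-Avg F1" denotes.
w_avg_f1 = f1_score(y_true, y_pred, average="weighted")
print(f"W-Avg F1: {w_avg_f1:.4f}")
```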
Thanks for your reply! I read your survey. It really helps.