thu-spmi / damd-multiwoz

Task-Oriented Dialog Systems that Consider Multiple Appropriate Responses under the Same Context, AAAI 2020.

DAMD evaluation

hpsun1109 opened this issue

Hi, thanks for releasing the source code.
However, the log below shows a much higher result than the paper reports. I don't understand what differs between the two settings.

INFO:root:[CTR] match: 92.6 success: 79.1 bleu: 19.5

Hi,

The higher result comes from using the ground-truth belief state to search the database. It is comparable to line 7 in Table 2 of our paper (slightly higher, since the paper reports the average score over 5 runs). To reproduce the result of line 12, set "bspn_mode='bspn'" and "use_true_bspn_for_ctr_eval=False".
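
As a rough illustration of the two settings, here is a minimal Python sketch. Only the names bspn_mode and use_true_bspn_for_ctr_eval and their values come from this thread; the EvalSettings class and both helper functions are hypothetical and are not part of the repository's code.

```python
from dataclasses import dataclass


@dataclass
class EvalSettings:
    """Hypothetical container; only the two field names come from the thread."""
    bspn_mode: str
    use_true_bspn_for_ctr_eval: bool


def end_to_end_settings() -> EvalSettings:
    # Values quoted in the reply: no ground-truth belief state for DB search.
    return EvalSettings(bspn_mode="bspn", use_true_bspn_for_ctr_eval=False)


def describe(settings: EvalSettings) -> str:
    # Report which belief state drives the database search during evaluation.
    source = "ground-truth" if settings.use_true_bspn_for_ctr_eval else "generated"
    return (
        f"database search uses the {source} belief state "
        f"(bspn_mode={settings.bspn_mode!r})"
    )


print(describe(end_to_end_settings()))
```

How these values are actually passed to the evaluation script depends on the repository's own configuration mechanism, which this sketch does not reproduce.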

Thanks for the information.