AIPHES / emnlp19-moverscore

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

Getting the same results for n_gram=1, 2, 3?

priyamtejaswin opened this issue

Hi folks,

Thanks for releasing the code and for making the API easy to use.
Changing the n_gram value does not seem to change the scores, and I'm wondering if I'm doing something wrong.

I'm using the code provided in the README:

```python
from moverscore_v2 import get_idf_dict, word_mover_score
from collections import defaultdict

# references: gold summaries; translations: model summaries (lists of strings)
idf_dict_hyp = get_idf_dict(translations)  # alternative: defaultdict(lambda: 1.)
idf_dict_ref = get_idf_dict(references)  # alternative: defaultdict(lambda: 1.)

scores = word_mover_score(references, translations, idf_dict_ref, idf_dict_hyp,
                          stop_words=[], n_gram=1, remove_subwords=True)
```

I get identical scores whether n_gram is 1, 2, or 3.
My dataset is the Gigaword summarization dev set:

  • 189K samples
  • The "references" are the gold/target summaries
  • The "translations" are the model generated summaries

Thanks a lot for your interest. In moverscore_v2.py, n-gram matching and p-means are skipped by design for speed and simplicity, so the n_gram argument has no effect there. The full version is in moverscore.py, but it takes longer to run.
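To make the reply concrete, here is a minimal plain-Python sketch of the two ingredients it says moverscore_v2.py skips. It follows a reading of the MoverScore paper, not the actual code in moverscore.py: each n-gram embedding is assumed to be the IDF-weighted sum of its word vectors, with the sum of those IDF values as its weight, and layer embeddings are assumed to be pooled with power means for p = -inf, 1, +inf (per-dimension min, mean and max). The names `ngram_embeddings`, `power_mean_pool` and `normalize` are hypothetical, and the Earth Mover Distance step that consumes these weighted embeddings is not shown.

```python
# Hypothetical sketch, NOT the code in moverscore.py: illustrates n-gram
# embeddings and power-mean pooling as assumed from the paper's description.
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def power_mean_pool(layers: Sequence[Vector]) -> Vector:
    """Concatenate per-dimension min, mean and max over several layer vectors.

    These are the power means for p = -inf, 1 and +inf; this choice of p
    values is an assumption made for illustration.
    """
    dims = list(zip(*layers))  # one tuple of layer values per embedding dimension
    mins = [min(d) for d in dims]
    means = [sum(d) / len(d) for d in dims]
    maxs = [max(d) for d in dims]
    return mins + means + maxs


def ngram_embeddings(
    tokens: Sequence[str],
    vectors: Sequence[Vector],
    idf: Dict[str, float],
    n: int,
) -> List[Tuple[Vector, float]]:
    """Return (embedding, weight) for every contiguous n-gram.

    Assumed formulation: the embedding is the IDF-weighted sum of the word
    vectors in the window, and the weight is the sum of their IDF values.
    """
    out: List[Tuple[Vector, float]] = []
    for start in range(len(tokens) - n + 1):
        emb = [0.0] * len(vectors[0])
        weight = 0.0
        for k in range(start, start + n):
            # Unseen tokens get IDF 1.0, mirroring defaultdict(lambda: 1.)
            w = idf.get(tokens[k], 1.0)
            weight += w
            emb = [e + w * v for e, v in zip(emb, vectors[k])]
        out.append((emb, weight))
    return out


def normalize(weights: Sequence[float]) -> List[float]:
    """Turn n-gram weights into a probability mass (all zeros if they sum to 0)."""
    total = sum(weights)
    return [w / total for w in weights] if total else [0.0] * len(weights)


tokens = ["the", "cat", "sat"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
idf = {"the": 0.5, "cat": 2.0, "sat": 1.0}
bigrams = ngram_embeddings(tokens, vectors, idf, n=2)
# "the cat" -> ([0.5, 2.0], 2.5), "cat sat" -> ([1.0, 3.0], 3.0)
```

With n=1 each "n-gram" is just a single IDF-scaled word vector, so a version that never builds larger windows (as the reply describes for moverscore_v2.py) would return the same score regardless of n_gram.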