NTDXYG / ComFormer

Code and data for the paper "ComFormer: Code Comment Generation via Transformer and Fusion Method-based Hybrid Code Representation", accepted at DSA 2021.

question about evaluating model

youngstudent2 opened this issue · comments

Hi, how can I evaluate the model on the test set?
In train.py, test.csv is loaded but never used, and the demo in use.py does not seem suitable for batch generation.
So I passed the parameters from use.py into BartModel and called its eval_model method to evaluate the test set. The test set is from Hybrid-DeepCom, preprocessed with utils.transformer as mentioned in #7. The model I used is the pretrained 'NTUYG/ComFormer'.
However, the result is:

INFO:bart_model:{'eval_loss': 5.0416509765625, 'getListRouge': 0.5193519200892092}

The ROUGE score is about 10 points lower than the one reported in the paper.
Is this evaluation method correct?
Thanks in advance for your answer.
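For context, ROUGE-L is an F-score built on the longest common subsequence between the generated and reference comments. A minimal self-contained sketch of the metric being compared here (not necessarily identical to the repo's getListRouge implementation, which may tokenize or average differently):

```python
def rouge_l_f1(hypothesis, reference, beta=1.2):
    """ROUGE-L F-score from the longest common subsequence of token lists.

    A self-contained sketch; beta weights recall over precision as in the
    common summarization setting.
    """
    hyp, ref = hypothesis.split(), reference.split()
    # dynamic-programming table for LCS length
    dp = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i, h in enumerate(hyp, 1):
        for j, r in enumerate(ref, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if h == r else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(hyp), lcs / len(ref)
    return (1 + beta ** 2) * prec * rec / (rec + beta ** 2 * prec)
```

Averaging this score over every (hypothesis, reference) pair in test.csv gives a corpus-level number comparable to the one in the log above.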

Please wait a little while. I found that the current version of DeepCom's dataset partitioning seems to be inconsistent with the previous version.
I will retrain the model and upload it.

I'll just reply in Chinese..
I re-downloaded the latest DeepCom data and looked into the problem you raised in #7. The code files provided by DeepCom have already been normalized (e.g. numbers replaced with NUM), so they can be processed directly with the following code:

def transformer(code):
    # split camelCase identifiers into snake_case tokens for the code sequence
    code_seq = ' '.join([hump2underline(i) for i in code.split()])
    # parse the code into an AST and linearize it with a structure-based traversal (SBT)
    ast = get_ast(code)
    sbt = get_sbt_structure(ast)
    return code_seq, sbt
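The hump2underline, get_ast, and get_sbt_structure helpers come from the repo's utils module. For readers without the source at hand, hump2underline is a camelCase-to-snake_case splitter along these lines (a sketch, not necessarily the repo's exact implementation):

```python
import re

def hump2underline(token):
    """Split a camelCase identifier into underscore-separated lowercase tokens.

    Sketch implementation: insert an underscore before any uppercase letter
    that follows a lowercase letter or digit, then lowercase the result.
    """
    return re.sub(r'(?<=[a-z0-9])([A-Z])', r'_\1', token).lower()
```

Applied token-by-token, this turns e.g. getMaxValue into get_max_value, so identifiers share subword tokens with natural-language comments.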

Then I looked at the dataset again, and it seems the way it is partitioned has changed. The dataset used in this paper was downloaded in July 2020, and the latest version uses a different split. I am already retraining on this latest dataset; once training finishes I will release the model and the generated results.

Training is done. I have uploaded the results, and the model is still uploading. The results are as follows:
Bleu_1: 0.564457
Bleu_2: 0.521086
Bleu_3: 0.488375
Bleu_4: 0.461608
METEOR: 0.411969
ROUGE_L: 0.595989
CIDEr: 4.095886
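For readers reproducing these numbers: BLEU-n is the geometric mean of modified n-gram precisions with a brevity penalty. Papers in this area typically use the standard coco-caption evaluation scripts, which also smooth and aggregate over the whole corpus; a minimal single-pair sketch of the underlying formula:

```python
import math
from collections import Counter

def bleu(hypothesis, reference, max_n=4):
    """BLEU for a single sentence pair: geometric mean of modified n-gram
    precisions (uniform weights up to max_n) times a brevity penalty.

    A sketch of the formula only; corpus-level scorers pool n-gram counts
    over all pairs before taking the ratio.
    """
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # clipped overlap: each hypothesis n-gram counts at most as often as in the reference
        overlap = sum((hyp_ngrams & ref_ngrams).values())
        total = max(sum(hyp_ngrams.values()), 1)
        if overlap == 0:
            return 0.0
        log_prec += math.log(overlap / total) / max_n
    # brevity penalty discourages overly short hypotheses
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_prec)
```

Bleu_1 through Bleu_4 above correspond to max_n = 1..4 computed at the corpus level.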