Question about evaluating the model
youngstudent2 opened this issue · comments
Hi, how do I evaluate the model on the test set? In train.py, test.csv is loaded but never used, and the demo in use.py does not seem suitable for batch generation. So I passed the parameters from use.py into BartModel and called that class's eval_model function to evaluate the test set. The test set is from Hybrid-DeepCom and was preprocessed with utils.transformer, as mentioned in #7. The model I used is the pretrained 'NTUYG/ComFormer'. However, the result is

INFO:bart_model:{'eval_loss': 5.0416509765625, 'getListRouge': 0.5193519200892092}

The ROUGE score is about 10.0 points lower than the one reported in the paper. Is this method OK? Thanks in advance for your answer.
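For a rough sanity check of the number above: sentence-level ROUGE-L is computed from the longest common subsequence between hypothesis and reference. A minimal pure-Python sketch (illustrative only; function names are mine, not the repo's getListRouge implementation):

```python
def lcs_len(a, b):
    # classic dynamic-programming longest common subsequence length
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(hyp, ref, beta=1.2):
    # ROUGE-L F-score over whitespace tokens, with the usual beta weighting
    hyp, ref = hyp.split(), ref.split()
    lcs = lcs_len(hyp, ref)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(hyp), lcs / len(ref)
    return ((1 + beta ** 2) * p * r) / (r + beta ** 2 * p)
```

Comparing the model's generated comments against the references this way should land in the same ballpark as the reported eval score, though the repo's exact tokenization may differ.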
Please wait a little while. I found that the current version of the DeepCom dataset seems to be partitioned differently from the previous version. I will retrain the model and upload it.
I'll just reply in Chinese.
I re-downloaded the latest DeepCom data and looked into the problem you raised in issue #7. The code files provided by DeepCom have already had operations such as replacing numbers with NUM applied, so you can process them directly with the following code:
def transformer(code):
    # split camelCase identifiers and join tokens into a flat code sequence
    code_seq = ' '.join([hump2underline(i) for i in code.split()])
    # parse the code into an AST and linearize it with structure-based traversal (SBT)
    ast = get_ast(code)
    sbt = get_sbt_structure(ast)
    return code_seq, sbt
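The hump2underline helper is not shown in the snippet; judging by its name it converts camelCase identifiers into underscore-separated lowercase tokens. A minimal sketch of that assumed behavior (not the repo's actual implementation):

```python
import re

def hump2underline(token):
    # insert '_' before an uppercase letter that follows a lowercase letter
    # or digit, then lowercase everything: getMaxValue -> get_max_value
    # (behavior assumed from the helper's name, not taken from the repo)
    return re.sub(r'(?<=[a-z0-9])([A-Z])', r'_\1', token).lower()
```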
Then I looked at the dataset again, and it seems the way it is split has changed. The dataset used in this paper was downloaded in July 2020, and the latest version is partitioned differently. I am already retraining on this latest dataset; once training finishes I will release the model and the generated results.
Training is done. I have uploaded the results, and the model is still uploading. The results are as follows:
Bleu_1: 0.564457
Bleu_2: 0.521086
Bleu_3: 0.488375
Bleu_4: 0.461608
METEOR: 0.411969
ROUGE_L: 0.595989
CIDEr: 4.095886
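For readers unfamiliar with these metrics: BLEU-n is a geometric mean of n-gram precisions times a brevity penalty. A simplified single-sentence sketch (the numbers above come from the standard evaluation scripts, not from this code):

```python
from collections import Counter
import math

def bleu_n(hyp, ref, max_n=4):
    # uniform-weight BLEU up to max_n with a brevity penalty
    # (illustrative single-sentence version without smoothing)
    hyp, ref = hyp.split(), ref.split()
    log_p = 0.0
    for n in range(1, max_n + 1):
        h = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        match = sum(min(c, r[g]) for g, c in h.items())  # clipped n-gram matches
        total = max(sum(h.values()), 1)
        if match == 0:
            return 0.0  # no smoothing: any zero precision zeroes the score
        log_p += math.log(match / total) / max_n
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))  # penalize short hypotheses
    return bp * math.exp(log_p)
```

Without smoothing, any hypothesis shorter than four tokens or with no 4-gram overlap scores zero, which is why real evaluation scripts use smoothed, corpus-level variants.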