rikdz / GraphWriter

Code for "Text Generation from Knowledge Graphs with Graph Transformers"


Code Doesn't Reproduce Results from the Paper

HadiZayer opened this issue · comments

I ran the code with the options enabled as described in the paper, and trained the model with the title encoding (with and without ELMo), but the results weren't as good as the ones reported. The outputs show word repetition and faulty grammar. What needs to be changed to match the results in the paper?

Hello,

    I am also running this code, but with PyTorch 1.1.0 and the latest torchtext I hit the error "RawField object has no attribute 'is_target'". Did you have this issue? If so, how did you solve it? Thank you!


In lastDataset.py, after each assignment of a field to data.RawField(), you need to set the is_target attribute to False.
For example:
after ds.fields["rawent"] = data.RawField(), add another line with ds.fields["rawent"].is_target = False
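A defensive variant of the fix above (a sketch; mark_raw_fields_not_target is a name I made up, not something in the repository) sets the attribute on every field that lacks it, which covers all of the data.RawField() assignments in lastDataset.py at once:

```python
def mark_raw_fields_not_target(fields):
    """RawField objects created under older torchtext lack the is_target
    attribute that newer torchtext versions read during batching; add it,
    defaulting to False, wherever it is missing."""
    for field in fields.values():
        if field is not None and not hasattr(field, "is_target"):
            field.is_target = False
```

You would call mark_raw_fields_not_target(ds.fields) once, after all fields have been assigned.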

Oh, got it! Many thanks for your help!

In lastDataset.py there is a line calling ds.getBatch(), but the 'dataset' object has no attribute 'getBatch'. How did you solve this?

When I run it I also get repeated words and grammar problems. Why is that?

Could anyone please tell me how to test the model after training? Many thanks.


Regarding Q1 ("In pargs.py, what is the meaning of the parameters args.sparse and args.plan? What does it mean when they are True?"):
args.plan enables a plan-and-write style model that I have not finished developing yet. args.sparse enables a sparse graph transformer that is also not yet developed. Please leave both flags set to False.

Regarding Q2 ("In the preprocessed *.tsv files, what does the final part of one example mean? For example, in the following example, what does the sequence "2 14 7 22 27 18 -1 15 5 8 18 -1 3 0 10 26 18 -1 1 11 20 21 18 -1 6 13 23 18 -1 18 -1 4 9 19 24 25 18 -1" represent?"):
This is used for the plan-and-write model. Please ignore this part of the input.

I have updated the repository, and others have reported that they can reproduce the paper's results with the current codebase. Please reopen if the problem persists.

Hello. I trained the model and generated results on the training set, and it works well. However, when I ran the command "python generator.py -data=preprocessed.test.tsv -save=saved_models/19.vloss-3.570731.lr-0.1", I got the following error:

Traceback (most recent call last):
  File "generator.py", line 71, in <module>
    m.load_state_dict(cpt)
  File "/home/desmon/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 839, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for model:
	size mismatch for emb.weight: copying a param with shape torch.Size([11738, 500]) from checkpoint, the shape in current model is torch.Size([1914, 500]).
	size mismatch for out.weight: copying a param with shape torch.Size([11738, 1000]) from checkpoint, the shape in current model is torch.Size([1914, 1000]).
	size mismatch for out.bias: copying a param with shape torch.Size([11738]) from checkpoint, the shape in current model is torch.Size([1914]).
	size mismatch for le.seqenc.lemb.weight: copying a param with shape torch.Size([53343, 500]) from checkpoint, the shape in current model is torch.Size([6173, 500]).

It seems the model sizes don't match, even though I didn't change any part of the code. Can anyone tell me how to fix it?


Ok. I know it. Thank you!

I tried to reproduce the experimental results of the paper using the original settings in the codebase, but the outputs don't look good. Has anyone reproduced the results from the paper? Could you share the corresponding parameter settings? Thanks!

This is the test set results I ran:
Bleu_1: 22.002345819897354
Bleu_2: 12.539165110352608
Bleu_3: 7.4823137365735235
Bleu_4: 4.562304118176371
METEOR: 8.151402224569598
ROUGE_L: 16.02610836089507

@DesmonDay

I have reproduced the experimental results of this paper using the original dataset and settings, and I got the same scores as the paper. Based on my testing, you may need to modify generate.py to write candidate.dat and refrence.dat, and then use eval.py to evaluate BLEU and METEOR.
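For anyone attempting the write-out step described above, here is a minimal sketch of what it amounts to (write_eval_files and the one-sentence-per-line format are my assumptions, not the repository's code; the file names candidate.dat and refrence.dat are taken from the comment above):

```python
def write_eval_files(hypotheses, references,
                     cand_path="candidate.dat", ref_path="refrence.dat"):
    """Write one generated sentence and one gold sentence per line,
    aligned by index, so an external scorer (e.g. eval.py) can read
    the candidate and reference files in parallel."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must be aligned")
    with open(cand_path, "w") as cand, open(ref_path, "w") as ref:
        for hyp, gold in zip(hypotheses, references):
            cand.write(hyp.strip() + "\n")
            ref.write(gold.strip() + "\n")
```

You would collect the decoder outputs and the gold abstracts while iterating over the test set, write both files, and then point the BLEU/METEOR scorer at them.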

Can anyone provide complete commands so that I can reproduce the same result as the paper? Thanks!

Hello, in lastDataset.py the source code builds the vocabulary from the training data, but when you run generate.py it builds the vocabulary from the test data, so the vocabulary size changes. I think the data path should be changed to point at the training data when you run generate.py.
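That would explain the size-mismatch traceback earlier in this thread: the embedding matrix is sized by the vocabulary, and a vocabulary rebuilt on a different split has a different size. A toy illustration of the mechanism (build_vocab is a stand-in, not the repository's code):

```python
def build_vocab(corpus):
    """Index every distinct whitespace-separated token, reserving 0 for <unk>."""
    vocab = {"<unk>": 0}
    for sentence in corpus:
        for token in sentence.split():
            vocab.setdefault(token, len(vocab))
    return vocab

train = ["text generation from knowledge graphs", "graph transformers"]
test = ["knowledge graph text"]

# An embedding checkpointed against the train vocab has len(build_vocab(train))
# rows; rebuilding the vocab on the test split yields a different row count,
# so load_state_dict reports a shape mismatch exactly as in the traceback.
print(len(build_vocab(train)), len(build_vocab(test)))
```

The fix, as suggested above, is to build the vocabulary from the same (training) data used at training time, regardless of which split is being decoded.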


Hi, can you please clarify this? I have changed the path but am still getting the mismatch error. Which path do you mean exactly? Can you give a snippet of the code you changed to get it to work?