ZeroRin / BertGCN


Difficulty in Reproducing the Performance.

wangywUST opened this issue

I ran the command
python3 train_bert_gcn.py --dataset R8 --pretrained_bert_ckpt checkpoint/roberta-base_R8/checkpoint.pth -m 0.5
without changing the code.

But the test accuracy is less than 0.8. Is there anything that I missed?

I checked my logs and noticed that for the BERT+GCN experiments I used batch size 128 and switched to 64 for the GAT variants, but I don't think that would make a big difference in performance.
Does the pretrained roberta-base model match the reported performance?

Thanks for your reply!

I just ran the following commands sequentially:

python3 build_graph.py R8
python3 finetune_bert.py --dataset R8
python3 train_bert_gcn.py --dataset R8 --pretrained_bert_ckpt checkpoint/roberta-base_R8/checkpoint.pth -m 0.5

on a V100 GPU, without changing any line of the code. But the test accuracy is below 0.8 in every epoch.

By "pretrained roberta-base model", do you mean the one that I produce with the second command, or one provided by you? If the latter, may I know how to use it? Thank you very much for your time, and I hope you can try running the commands listed above.

I mean the one produced by finetune_bert.py. I want to know whether the problem occurs during the training of the BERT module or during the joint BERT+GCN training. I'll also test it myself.

I noticed that I accidentally removed scheduler.step() in finetune_bert.py when reformatting my code, so the LR scheduler was not running and the BERT module failed to converge under a high learning rate. I have fixed this bug, and it should work now.
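
For anyone hitting the same issue, here is a minimal sketch of where the call belongs in a typical PyTorch loop (illustrative names only, not the repo's exact code):

import torch

# Without scheduler.step(), the learning rate stays at its initial value
# for the whole run, which is what made training diverge here.
model = torch.nn.Linear(768, 2)   # stand-in for the fine-tuned classifier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30], gamma=0.1)

for epoch in range(60):
    x = torch.randn(8, 768)       # dummy batch
    loss = model(x).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()              # the accidentally-removed call; restoring it
                                  # lets the LR decay as intended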

Thanks for checking!

Based on your updated code, I ran the following commands sequentially:

python3 build_graph.py R8
python3 finetune_bert.py --dataset R8
python3 train_bert_gcn.py --dataset R8 --pretrained_bert_ckpt checkpoint/roberta-base_R8/checkpoint.pth -m 0.5

on a V100 GPU, without changing any line of the code. But the test accuracy is still below 0.8 in every epoch.

Did you try these commands and see the results? Thanks a lot!

After running train_bert_gcn.py for 1 epoch I got ~0.97, so I thought it should work as expected.
It is still important to know whether the problem occurs in the BERT training or in the joint training. Can you check the test accuracy of 'checkpoint/roberta-base_R8/checkpoint.pth'? You could also try running:
python train_bert_gcn.py --dataset R8 -m 0.5
without the pretrained initialization.
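
If it helps, here is a rough sketch for inspecting that checkpoint (the key names are an assumption about what the training script saves; print them first and adjust):

import torch

# Load the checkpoint on CPU and inspect what was actually saved.
ckpt = torch.load('checkpoint/roberta-base_R8/checkpoint.pth', map_location='cpu')
print(ckpt.keys())

# If it holds state dicts for the BERT encoder and the classifier head,
# load them back into the fine-tuning model and evaluate on the test split
# to get the standalone BERT accuracy, isolating it from the joint training.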

I checked our finetuned roberta model again and noticed that it was trained with an initial lr of 1e-4, not 1e-3. According to my experiments today, training roberta with an lr of 1e-3 results in a bad model for initialization. I updated the default params and re-ran everything from scratch; the result matches our reported performance. Hopefully this solves the problem.
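
If you would rather not pull the updated defaults, you can also pass the learning rate explicitly on the command line (assuming your copy of finetune_bert.py exposes a --bert_lr flag; check its argparse if the name differs):

python3 finetune_bert.py --dataset R8 --bert_lr 1e-4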

Thank you for your time!

python3 train_bert_gcn.py --dataset R8 -m 0.5

gives good performance.