LiyuanLucasLiu / Transformer-Clinic

Understanding the Difficulty of Training Transformers

Home Page: https://arxiv.org/abs/2004.08249

tmp_weight is not defined

sshleifer opened this issue

Hi,

In this line, the variable tmp_weight is not defined. How should it be set?

Another Q: what torch version did you use?
When I set tmp_weight=1.0 and run

GPUID=1
TOKEN_NUMBER=4096
UPDATE_FREQUENCE=1
CUDA_VISIBLE_DEVICES=$GPUID fairseq-train \
  $dbin/iwslt14.tokenized.de-en.joined_dict -s de -t en \
  --arch transformer_iwslt_de_en --share-all-embeddings \
  --user-dir radam_fairseq --optimizer radam \
  --clip-norm 0.0 --lr 7e-4 --lr-scheduler inverse_sqrt \
  --warmup-init-lr 1e-7 --warmup-updates 6000 --max-update 100000 \
  --dropout 0.3 --attention-dropout 0.1 --relu-dropout 0.1 \
  --weight-decay 0.0001 --criterion label_smoothed_cross_entropy \
  --label-smoothing 0.1 --save-dir iwslt14deen/iwslt-preln-1111 \
  --init-type adaptive-profiling --max-tokens $TOKEN_NUMBER \
  --update-freq $UPDATE_FREQUENCE --seed 1111 \
  --log-format simple --restore-file x.pt \
  --threshold-loss-scale 0.03125 \
  --log-interval 100

I get

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 104, 1536]], which is output 0 of AddBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Any advice?
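
For reference, this class of autograd error can be reproduced outside fairseq. A minimal standalone sketch (unrelated to this repo's code) that triggers the same message — pow saves its input for the backward pass, and the in-place add_ bumps that tensor's version counter:

import torch

x = torch.ones(3, requires_grad=True)
y = x + 1           # output 0 of AddBackward0, version 0
z = y.pow(2)        # pow saves its input y for the backward pass
y.add_(1)           # in-place update bumps y's version counter to 1
z.sum().backward()  # RuntimeError: ... is at version 1; expected version 0 instead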

I'm so sorry for this bug...

The tmp_weight should have been removed during a refactoring (#7); I just fixed this issue on the current master branch.
As to the torch version, I am using 1.5.0.
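
For readers landing on this issue: the method change in transformer_layer.py implements Admin's rescaled residual connection, x_i = x_{i-1} · ω_i + f_i(x_{i-1}). Below is a minimal sketch of that mechanism, reconstructed from the paper rather than copied from this repo (the class and attribute names are mine):

import torch
import torch.nn as nn

class AdminResidual(nn.Module):
    # Admin-style residual rescaling: out = residual * omega + f(residual).
    # omega is a per-dimension weight; with --init-type adaptive-profiling it
    # would be initialized from the profiled output variance of a forward pass.
    def __init__(self, dim):
        super().__init__()
        self.omega = nn.Parameter(torch.ones(dim))  # placeholder init, set by profiling

    def forward(self, residual, sublayer_out):
        return residual * self.omega + sublayer_out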

Awesome, thanks! I have IWSLT running with torch 1.6.0.
I was also wondering which files were changed from the initial fairseq besides transformer_layer.py.

If you know which commit/day you copied fairseq, that would also be helpful! April 20th, 2020 seems slightly off, but I'm not quite sure.

Glad it works!

The performance gain is not significant on IWSLT (due to the small dataset and shallow model).

This commit is the first commit that includes the fairseq folder, but that folder is the original implementation of Admin (extracted from my private repo) rather than a direct clone of the fairseq repo.

As to the changes, transformer_layer.py is the only file changed for the method itself; a few more files were changed to accommodate it. I did some checking and list most of the changed files below (this may omit something and need some debugging; a sketch after the list shows one way to double-check):

  • generate.py
  • fairseq/options.py
  • fairseq/trainer.py
  • fairseq/models/transformer.py
  • fairseq/modules/transformer_layer.py
  • fairseq/tasks/fairseq_task.py
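
One way to double-check this list is to diff the folder against a stock fairseq checkout. A small hypothetical helper, assuming both trees have been cloned side by side (the paths are placeholders):

import filecmp

# Recursively compare this repo's fairseq/ tree against an upstream checkout
# and print the files that differ.
cmp = filecmp.dircmp("Transformer-Clinic/fairseq", "fairseq-upstream/fairseq")
cmp.report_full_closure()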

Hope it helps, and Happy Thanksgiving :-)