mit-han-lab / lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention

Home Page: https://arxiv.org/abs/2004.11886

Error while evaluating model

kishorepv opened this issue · comments

Hi,

I tried evaluating with the provided checkpoint and get the following error:

root@jetson:/nlp/lite-transformer/lite-transformer# configs/wmt14.en-fr/test.sh /data/nlp/embed200/ 0 valid
Traceback (most recent call last):
  File "generate.py", line 192, in <module>
    cli_main()
  File "generate.py", line 188, in cli_main
    main(args)
  File "generate.py", line 32, in main
    task = tasks.setup_task(args)
  File "/nlp/lite-transformer/lite-transformer/fairseq/tasks/__init__.py", line 17, in setup_task
    return TASK_REGISTRY[args.task].setup_task(args, **kwargs)
  File "/nlp/lite-transformer/lite-transformer/fairseq/tasks/translation.py", line 166, in setup_task
    args.source_lang, args.target_lang = data_utils.infer_language_pair(paths[0])
  File "/nlp/lite-transformer/lite-transformer/fairseq/data/data_utils.py", line 24, in infer_language_pair
    for filename in os.listdir(path):
FileNotFoundError: [Errno 2] No such file or directory: 'data/binary/wmt14_en_fr'
Namespace(ignore_case=False, order=4, ref='/data/nlp/embed200//exp/valid_gen.out.ref', sacrebleu=False, sentence_bleu=False, sys='/data/nlp/embed200//exp/valid_gen.out.sys')
Traceback (most recent call last):
  File "score.py", line 88, in <module>
    main()
  File "score.py", line 84, in main
    score(f)
  File "score.py", line 78, in score
    print(scorer.result_string(args.order))
  File "/nlp/lite-transformer/lite-transformer/fairseq/bleu.py", line 127, in result_string
    return fmt.format(order, self.score(order=order), *bleup,
  File "/nlp/lite-transformer/lite-transformer/fairseq/bleu.py", line 103, in score
    return self.brevity() * math.exp(psum / order) * 100
  File "/nlp/lite-transformer/lite-transformer/fairseq/bleu.py", line 117, in brevity
    r = self.stat.reflen / self.stat.predlen
ZeroDivisionError: division by zero
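
Note: the second traceback is a knock-on failure. Because generate.py crashed before writing any hypotheses, score.py reads an empty system-output file, self.stat.predlen stays 0, and the brevity-penalty division in fairseq/bleu.py fails. A quick way to confirm, using the path from the Namespace line above:

# an empty (0-line) .sys file is exactly what drives predlen to 0
wc -l /data/nlp/embed200//exp/valid_gen.out.sys

Fix the FileNotFoundError first and the BLEU scoring will follow.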

Do we need the contents of the data/binary/wmt14_en_fr directory for evaluation?

Hi Kishore,
Thank you for asking! The preprocessed binary files for the test dataset are already included in the provided checkpoint tarball. To evaluate the checkpoint on the test set, move the test* and dict* files to data/binary/wmt14_en_fr (mkdir it if it does not exist) and run test.sh. If you would like to evaluate on the validation set, please run configs/wmt14.en-fr/prepare.sh to generate the preprocessed validation data.
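
In shell terms, the procedure described above is roughly the following (a sketch; the archive name and checkpoint path are placeholders, and exact file names may differ):

# extract the provided checkpoint tarball (archive name assumed for illustration)
tar -xzvf lite-transformer-checkpoint.tar.gz
# put the preprocessed binaries where generate.py expects them
mkdir -p data/binary/wmt14_en_fr
mv test* dict* data/binary/wmt14_en_fr/
# evaluate on the test split
configs/wmt14.en-fr/test.sh /path/to/checkpoint_dir 0 test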

I am closing this issue. If you have any follow-up questions, please feel free to re-open it.

I am getting the same issue while testing the model, even though the required test* and dict* files are already in place.

Could you (@Michaelvll) please help me test the trained checkpoint by resolving the error @kishorepv reported in the original issue?

Hi @tomshalini, could you please provide the command you used for testing?

Hello @Michaelvll,
I am using the command below for testing:

configs/wmt14.en-fr/test.sh '/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496/checkpoint_best.pt' 0 test

Could you try configs/wmt14.en-fr/test.sh '/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496/' 0 test? test.sh automatically appends checkpoint_best.pt, so the first argument should be the checkpoint directory rather than the .pt file.
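
For reference, this works because test.sh presumably invokes fairseq's generate.py along the lines below; this is an unverified sketch inferred from the script's arguments, not its actual contents:

# hypothetical line inside test.sh: $1 = checkpoint dir, $2 = GPU id, $3 = subset
CUDA_VISIBLE_DEVICES="$2" python generate.py data/binary/wmt14_en_fr \
    --path "$1"/checkpoint_best.pt --gen-subset "$3" --beam 5 --remove-bpe

With that convention, the first argument must be the directory containing checkpoint_best.pt, not the .pt file itself.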

Thank you @Michaelvll for your help. Now I am getting the error below, even though I am running on 2 GPUs.

Traceback (most recent call last):
  File "generate.py", line 192, in <module>
    cli_main()
  File "generate.py", line 188, in cli_main
    main(args)
  File "generate.py", line 106, in main
    hypos = task.inference_step(generator, models, sample, prefix_tokens)
  File "/home/shalinis/lite-transformer/fairseq/tasks/fairseq_task.py", line 246, in inference_step
    return generator.generate(models, sample, prefix_tokens=prefix_tokens)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/home/shalinis/lite-transformer/fairseq/sequence_generator.py", line 146, in generate
    encoder_outs = model.forward_encoder(encoder_input)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/home/shalinis/lite-transformer/fairseq/sequence_generator.py", line 539, in forward_encoder
    return [model.encoder(**encoder_input) for model in self.models]
  File "/home/shalinis/lite-transformer/fairseq/sequence_generator.py", line 539, in <listcomp>
    return [model.encoder(**encoder_input) for model in self.models]
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shalinis/lite-transformer/fairseq/models/transformer_multibranch_v2.py", line 314, in forward
    x = layer(x, encoder_padding_mask)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shalinis/lite-transformer/fairseq/models/transformer_multibranch_v2.py", line 693, in forward
    x, _ = self.self_attn(query=x, key=x, value=x, key_padding_mask=encoder_padding_mask)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shalinis/lite-transformer/fairseq/modules/multibranch.py", line 37, in forward
    x = branch(q.contiguous(), incremental_state=incremental_state)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shalinis/lite-transformer/fairseq/modules/dynamicconv_layer/dynamicconv_layer.py", line 131, in forward
    output = self.linear2(output)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 91, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/functional.py", line 1676, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: no kernel image is available for execution on the device
Namespace(ignore_case=False, order=4, ref='/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496//exp/test_gen.out.ref', sacrebleu=False, sentence_bleu=False, sys='/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496//exp/test_gen.out.sys')
Traceback (most recent call last):
  File "score.py", line 88, in <module>
    main()
  File "score.py", line 84, in main
    score(f)
  File "score.py", line 78, in score
    print(scorer.result_string(args.order))
  File "/home/shalinis/lite-transformer/fairseq/bleu.py", line 127, in result_string
    return fmt.format(order, self.score(order=order), *bleup,
  File "/home/shalinis/lite-transformer/fairseq/bleu.py", line 103, in score
    return self.brevity() * math.exp(psum / order) * 100
  File "/home/shalinis/lite-transformer/fairseq/bleu.py", line 117, in brevity
    r = self.stat.reflen / self.stat.predlen
ZeroDivisionError: division by zero

@Michaelvll, could you please help me resolve the above issue?
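
For anyone who lands here with the same RuntimeError: "no kernel image is available for execution on the device" generally means the CUDA kernels in use (PyTorch's own, and in this repo also the custom dynamicconv/lightconv extensions) were compiled for a compute capability that does not match the installed GPU; it is unrelated to how many GPUs are used. A hedged checklist, assuming the extension ships the usual fairseq-style setup.py:

# 1. check the GPU's compute capability as seen by PyTorch
python -c "import torch; print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))"

# 2. rebuild the custom convolution kernels for that architecture
#    (replace 7.0 with the capability printed above)
cd fairseq/modules/dynamicconv_layer
TORCH_CUDA_ARCH_LIST="7.0" python setup.py install

If plain PyTorch ops fail with the same error, the PyTorch build itself likely lacks binaries for that GPU and needs to be reinstalled for the matching CUDA architecture.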