awslabs / mlm-scoring

Python library & examples for Masked Language Model Scoring (ACL 2020)

Home Page: https://www.aclweb.org/anthology/2020.acl-main.240/

IndexError: too many indices for tensor of dimension 1

mfelice opened this issue

Hi there,

I'm using the PyTorch implementation with bert-base-uncased and I get the following error when the sentence contains only one token:

Traceback (most recent call last):
  File "bert.py", line 28, in <module>
    print(scorer.score_sentences(["Hello"]))
  File ".../mlm-scoring/src/mlm/scorers.py", line 167, in score_sentences
    return self.score(corpus, **kwargs)[0]
  File ".../mlm-scoring/src/mlm/scorers.py", line 757, in score
    out = out[list(range(split_size)), token_masked_ids]
IndexError: too many indices for tensor of dimension 1

It works fine with MXNet MLMs, but I need to use a community model from HuggingFace.

Thanks!

OK, I think I found the problem.

out = out[0].squeeze()

should be changed to:

out = torch.reshape(out[0], (out[0].shape[0], -1))

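squeeze() with no arguments removes every size-1 dimension, so with a one-token sentence it was also removing the batch dimension, which should be preserved for the two-index lookup that follows.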
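
For anyone who wants to see the difference in isolation, here is a minimal sketch. It assumes out[0] has shape (split_size, 1, vocab_size); that shape, the tiny vocabulary size, and the token id are assumptions chosen for illustration, not values taken from the library.

```python
import torch

vocab_size = 5   # hypothetical, tiny on purpose
split_size = 1   # a one-token sentence yields a single masked copy

# Stand-in for out[0], ASSUMED to have shape (split_size, 1, vocab_size).
logits = torch.arange(split_size * vocab_size, dtype=torch.float32)
logits = logits.reshape(split_size, 1, vocab_size)

# Hypothetical vocabulary id of the masked token, one per row.
token_masked_ids = torch.tensor([3])
rows = list(range(split_size))

# Original line: squeeze() drops every size-1 axis, including the batch axis.
squeezed = logits.squeeze()                               # shape (5,)

# Proposed fix: keep the batch axis, flatten everything else.
reshaped = torch.reshape(logits, (logits.shape[0], -1))   # shape (1, 5)

try:
    squeezed[rows, token_masked_ids]   # two indices into a 1-D tensor
    squeeze_failed = False
except IndexError:
    squeeze_failed = True

# One score per row, selected at the masked token's vocabulary id.
picked = reshaped[rows, token_masked_ids]
```

With split_size greater than 1 both versions give the same 2-D tensor, which would explain why the error only shows up for one-token sentences. If the middle axis really is always of size 1, out[0].squeeze(1) should be an equivalent fix.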

Hurray for publicly licensed software and donation of labour to the public good!