princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821

Invalid tensor shape

peregilk opened this issue

commented

I keep getting the following error at the end of the first epoch:
"RuntimeError: Input tensor at index 1 has invalid shape [22, 44], but expected [22, 46]". This happens on a custom dataset. However, the dataset is thoroughly cleaned and should be valid.

The error happens in: comm.py, line 231

Any idea what might be causing this?

commented

Deleting the last line in the dataset actually fixed the error in my case.

Do you know the cause of this problem? I ran into a similar error and don't know how to fix it. The dataset I used was wiki1m_for_simcse.txt.

commented

No, I did not dig into this. I had started debugging and just wanted to see whether the shape of the tensor changed when the dataset size changed, which would have given me a hint about what was wrong. That change alone made the error go away. It seems like the last batch is filled incorrectly, and the fix is most likely trivial. I am new to SimCSE; someone who knows the code could fix this more easily.
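One quick way to test the "last batch" guess is to work out how many examples land in the final batch. The following is only a minimal sketch: `last_batch_size` is a hypothetical helper, not part of SimCSE, and the example counts and batch size are made-up numbers. It also assumes the data loader keeps the final incomplete batch rather than dropping it. Whether a short final batch is the actual cause is not established in this thread.

```python
def last_batch_size(num_examples: int, batch_size: int) -> int:
    """Return how many examples end up in the final batch.

    Assumes the loader keeps the final incomplete batch (no drop-last).
    """
    if num_examples <= 0 or batch_size <= 0:
        raise ValueError("num_examples and batch_size must be positive")
    remainder = num_examples % batch_size
    # A remainder of zero means the final batch is a full one.
    return remainder if remainder else batch_size


# Hypothetical numbers: removing one line turns a 1-example final batch
# into a full batch, which could explain why deleting a line changes behavior.
print(last_batch_size(1025, 64))  # 1
print(last_batch_size(1024, 64))  # 64
```

If the final batch is much smaller than the others, that is a reasonable place to start looking, especially when the batch is then split across several GPUs.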

I have the same issue, but deleting the last line does not fix the error for me. Does anyone have a better solution?

"comm.py", line 235, in gather
return torch._C._gather(tensors, dim, destination)
RuntimeError: Input tensor at index 3 has invalid shape [32, 32], but expected [32, 34]
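For anyone trying to read this error: the gather step joins the per-GPU outputs along one dimension, so every other dimension has to match across GPUs. Here one replica produced a second dimension of 32 while the others were expected to have 34. Below is a small pure-Python sketch of that check. `find_gather_mismatches` is a hypothetical helper for illustration, not a PyTorch function, and the shapes for indices 0 to 2 are assumed (the error message only reports index 3 and the expected shape).

```python
from typing import List, Sequence, Tuple


def find_gather_mismatches(
    shapes: Sequence[Tuple[int, ...]], dim: int = 0
) -> List[Tuple[int, Tuple[int, ...]]]:
    """Return (index, shape) for each shape that cannot be joined with
    shapes[0] along `dim`, i.e. that differs in rank or in any other axis.
    """
    if not shapes:
        return []
    reference = tuple(shapes[0])
    mismatches = []
    for index, shape in enumerate(shapes[1:], start=1):
        shape = tuple(shape)
        compatible = len(shape) == len(reference) and all(
            a == b
            for axis, (a, b) in enumerate(zip(shape, reference))
            if axis != dim
        )
        if not compatible:
            mismatches.append((index, shape))
    return mismatches


# Assumed per-GPU output shapes consistent with the traceback above.
per_gpu_shapes = [(32, 34), (32, 34), (32, 34), (32, 32)]
print(find_gather_mismatches(per_gpu_shapes))  # [(3, (32, 32))]
```

Logging the per-replica output shapes and running them through a check like this can show which GPU is producing the odd-sized tensor.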

I hit the same issue with my custom dataset. Can anyone offer suggestions?

I found that this issue was resolved in #148. In my case, using only a single GPU works in the unsupervised setting.

I solved this by running the multi-GPU script. The problem was that I was running the single-GPU script on a machine with 4 GPUs, which caused the shape errors.

Thanks @haoliutj @Dicer-Zz for answering this! This is likely caused by running the single-GPU script on multiple GPUs. Please use the script that corresponds to the number of GPUs you are using.
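If you want to keep using the single-GPU script on a multi-GPU machine, one common approach is to hide the extra devices from the process with the `CUDA_VISIBLE_DEVICES` environment variable. This is a minimal sketch, not something taken from the SimCSE scripts: `restrict_visible_gpus` is a hypothetical helper, and the variable must be set before CUDA is initialized (in practice, before `torch` is imported or used).

```python
import os


def restrict_visible_gpus(gpu_ids: str = "0") -> str:
    """Expose only the listed GPU ids (comma-separated) to this process.

    Call this before importing torch so the CUDA runtime sees the setting.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu_ids
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Make only the first GPU visible; import torch and start training after this.
restrict_visible_gpus("0")
```

The same effect can be had from the shell by prefixing the launch command with `CUDA_VISIBLE_DEVICES=0`.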