"img should be PIL Image" when fine-tuning on MSR-VTT
bryant1410 opened this issue
I got the following error when trying to run python train.py --config configs/msrvtt_4f_i21k.json (as in the README):
File "***/base/base_dataset.py", line 107, in __getitem__
imgs = self.transforms(imgs)
File "***/envs/frozen/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 60, in __call__
img = t(img)
File "***/envs/frozen/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 195, in __call__
return F.resize(img, self.size, self.interpolation)
File "***/envs/frozen/lib/python3.7/site-packages/torchvision/transforms/functional.py", line 229, in resize
raise TypeError('img should be PIL Image. Got {}'.format(type(img)))
TypeError: img should be PIL Image. Got <class 'torch.Tensor'>
(I set up the env as described in the README)
It seems the frames are loaded as torch tensors, but the transforms expect a PIL Image:
frozen-in-time/base/base_dataset.py, lines 294 to 296 at e6fc946
frozen-in-time/base/base_dataset.py, lines 279 to 281 at e6fc946
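The mismatch is easy to reproduce outside the repo. A minimal sketch, assuming torchvision < 0.8 (where Resize only accepts PIL Images); the frame-stack shape here is illustrative, not copied from base_dataset.py:

```python
# Minimal sketch of the failure, assuming torchvision < 0.8, where
# transforms.Resize rejects tensor inputs. The shape below just mimics a
# stack of decoded video frames.
import torch
from torchvision import transforms

frames = torch.zeros(4, 256, 256, 3)  # stack of decoded frames as a tensor
resize = transforms.Resize(224)
resize(frames)  # raises: TypeError: img should be PIL Image. Got <class 'torch.Tensor'>
```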
If I add a transforms.ToPILImage() before (and a transforms.ToTensor() after) in here
frozen-in-time/data_loader/transforms.py, lines 18 to 23 at e6fc946
it still doesn't work, because ToPILImage expects a single image, not a stack of frames. It also makes me think the transforms won't work on multiple PIL images either.
Are these the wrong transforms, or am I missing something?
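For reference, a hypothetical per-frame wrapper like the one below is the kind of thing I was trying; the transform list and the (T, C, H, W) layout are assumptions, not the repo's actual pipeline:

```python
# Hypothetical per-frame workaround sketch (not the repo's actual fix): run a
# PIL-based pipeline on each frame separately and re-stack the results.
# Assumes `imgs` is a (T, C, H, W) float tensor in [0, 1]; adjust if the
# reader returns (T, H, W, C) uint8 frames instead.
import torch
from torchvision import transforms

frame_tfms = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

def apply_per_frame(imgs: torch.Tensor) -> torch.Tensor:
    # Each frame goes tensor -> PIL -> resized/cropped -> tensor, then the
    # per-frame results are stacked back into a (T, C, 224, 224) tensor.
    return torch.stack([frame_tfms(frame) for frame in imgs])
```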
Wait, I think it's because I ended up with an older torchvision version. Let me check that before you look into this issue.
Confirmed, it was an older torchvision version.
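For anyone hitting the same traceback: torchvision transforms accept tensor inputs from 0.8 onward (that version number is from the torchvision release notes, not from this repo's requirements), so a quick check like this distinguishes the two behaviours:

```python
# Quick check: on torchvision >= 0.8, Resize accepts (..., C, H, W) tensors
# directly, so tensor frames go through without any PIL conversion.
import torch
import torchvision
from torchvision import transforms

print(torchvision.__version__)
out = transforms.Resize(224)(torch.zeros(4, 3, 256, 256))
print(out.shape)  # torch.Size([4, 3, 224, 224]) on 0.8+; TypeError on older versions
```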