microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Home Page: https://aka.ms/GeneralAI


Inference code for LayoutLMv3 without Hugging Face.

lalitr994 opened this issue

Has anyone tried inference with a v3 (LayoutLMv3) model trained with the code from this repo, without Hugging Face? I am getting RuntimeError: The size of tensor a (397) must match the size of tensor b (200) at non-singleton dimension 3 for the image that I passed to the model.
@wolfshow

It is hard to locate the cause of the error and debug it without a stack trace. It would be helpful if you provided more information: your running command, your inference task, the exact error line, and the full error stack trace.

```
    batchd_cls = process_in_batches(data[start_pos: end_pos],image)
  File "form_test_new.py", line 129, in process_in_batches
    outputs = model(input_ids=batch_input_ids, bbox=bbox, attention_mask=batch_attention_mask,labels=None,images=image)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 1050, in forward
    images=images,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 949, in forward
    Wp=Wp,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 643, in forward
    rel_2d_pos=rel_2d_pos,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 442, in forward
    rel_2d_pos=rel_2d_pos,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 403, in forward
    rel_2d_pos=rel_2d_pos,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 331, in forward
    attention_scores = attention_scores + attention_mask
RuntimeError: The size of tensor a (397) must match the size of tensor b (200) at non-singleton dimension 3
Processing - 8 / 1363
^CTraceback (most recent call last):
  File "form_test_new.py", line 161, in <module>
    image, size = load_image(img_path)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/data/image_utils.py", line 26, in load_image
    image = torch.tensor(img_trans.apply_image(image).copy()).permute(2, 0, 1)  # copy to make it writeable
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/fvcore/transforms/transform.py", line 297, in <lambda>
    return lambda x: self._apply(x, name)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/fvcore/transforms/transform.py", line 291, in _apply
    x = getattr(t, meth)(x)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/detectron2/data/transforms/transform.py", line 121, in apply_image
    pil_image = Image.fromarray(img)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/PIL/Image.py", line 2945, in fromarray
    obj = obj.tobytes()
KeyboardInterrupt
```
@HYPJUDY

According to the message

```
attention_scores = attention_scores + attention_mask
RuntimeError: The size of tensor a (397) must match the size of tensor b (200) at non-singleton dimension 3
```

it seems that one of attention_scores or attention_mask does not take the image into account: 397 - 200 = 197 = 14 x 14 + 1, which is exactly the number of visual tokens (14 x 14 image patches plus one [CLS]-like visual token). For example, the error can occur if you input the image and the text together, so that attention_scores has a size of 397 at dimension 3, but you build attention_mask from the text alone and never extend it to cover the image tokens.
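If that is the cause, one way forward is to pad the text attention mask with ones for the visual positions before the forward call. Below is a minimal sketch, not this repo's API: the 224 x 224 input with 16 x 16 patches is an assumption (it is what yields 14 x 14 + 1 = 197 visual tokens, matching 397 - 200), and the names model, batch_input_ids, bbox, and image are taken from the snippet above.

```python
import torch

# Reproducing the mismatch: broadcasting a (1, 1, 1, 200) additive mask
# against (1, heads, 397, 397) attention scores fails at dimension 3.
scores = torch.zeros(1, 12, 397, 397)
text_only_mask = torch.zeros(1, 1, 1, 200)
# scores + text_only_mask  # RuntimeError: The size of tensor a (397) must
#                          # match the size of tensor b (200) at
#                          # non-singleton dimension 3

# Assumed geometry: 224 x 224 image, 16 x 16 patches -> 14 x 14 = 196 patch
# tokens, plus 1 [CLS]-like visual token = 197 visual tokens.
seq_len = 200             # length of input_ids / the text-only mask
num_visual = 14 * 14 + 1  # 197

batch_attention_mask = torch.ones(1, seq_len, dtype=torch.long)

# Append ones so the mask covers all 397 positions that attention_scores
# has when the image is passed along with the text.
visual_attention_mask = torch.ones(
    batch_attention_mask.size(0), num_visual,
    dtype=batch_attention_mask.dtype,
)
extended_mask = torch.cat([batch_attention_mask, visual_attention_mask], dim=1)
print(extended_mask.shape)  # torch.Size([1, 397])

# Hypothetical call, reusing the names from the snippet above:
# outputs = model(input_ids=batch_input_ids, bbox=bbox,
#                 attention_mask=extended_mask, labels=None, images=image)
```

(If the modeling code is supposed to extend the mask itself when images are passed, the same arithmetic still applies; the point is only that whatever mask reaches attention_scores + attention_mask must cover 397 positions, not 200.)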