microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Home Page: https://aka.ms/GeneralAI


Inference code for LayoutLMv3 without Hugging Face.

lalitr994 opened this issue

Has anyone tried inference with a v3 (LayoutLMv3) model trained with the code from this repo, without Hugging Face? I am getting RuntimeError: The size of tensor a (397) must match the size of tensor b (200) at non-singleton dimension 3 for the image that I passed to the model.
@wolfshow

It is hard to locate the cause of the error and debug it without a stack trace. It would be helpful if you provided more information: your running command, your inference task, the exact error line, and the full error stack trace.

```
    batchd_cls = process_in_batches(data[start_pos: end_pos],image)
  File "form_test_new.py", line 129, in process_in_batches
    outputs = model(input_ids=batch_input_ids, bbox=bbox, attention_mask=batch_attention_mask,labels=None,images=image)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 1050, in forward
    images=images,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 949, in forward
    Wp=Wp,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 643, in forward
    rel_2d_pos=rel_2d_pos,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 442, in forward
    rel_2d_pos=rel_2d_pos,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 403, in forward
    rel_2d_pos=rel_2d_pos,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 331, in forward
    attention_scores = attention_scores + attention_mask
RuntimeError: The size of tensor a (397) must match the size of tensor b (200) at non-singleton dimension 3
Processing - 8 / 1363
^CTraceback (most recent call last):
  File "form_test_new.py", line 161, in <module>
    image, size = load_image(img_path)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/data/image_utils.py", line 26, in load_image
    image = torch.tensor(img_trans.apply_image(image).copy()).permute(2, 0, 1)  # copy to make it writeable
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/fvcore/transforms/transform.py", line 297, in <lambda>
    return lambda x: self._apply(x, name)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/fvcore/transforms/transform.py", line 291, in _apply
    x = getattr(t, meth)(x)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/detectron2/data/transforms/transform.py", line 121, in apply_image
    pil_image = Image.fromarray(img)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/PIL/Image.py", line 2945, in fromarray
    obj = obj.tobytes()
KeyboardInterrupt
```
@HYPJUDY

According to the message

```
attention_scores = attention_scores + attention_mask
RuntimeError: The size of tensor a (397) must match the size of tensor b (200) at non-singleton dimension 3
```

it seems that one of attention_scores or attention_mask does not take the image into account: 397 - 200 = 197 = 14 x 14 + 1, which is exactly the number of visual tokens (14 x 14 image patches plus one [CLS]-like visual token). For example, the error can occur if you input the image and the text together, so that attention_scores has a size of 397 at dimension 3, but you build attention_mask from the text alone and never extend it to cover the image tokens.
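If that is the cause, one way forward is to pad the text attention mask with ones for the visual positions before the forward call. Below is a minimal sketch, not this repo's API: the 224 x 224 input with 16 x 16 patches is an assumption (it is what yields 14 x 14 + 1 = 197 visual tokens, matching 397 - 200), and the names model, batch_input_ids, bbox, and image are taken from the snippet above.

```python
import torch

# Reproducing the mismatch: broadcasting a (1, 1, 1, 200) additive mask
# against (1, heads, 397, 397) attention scores fails at dimension 3.
scores = torch.zeros(1, 12, 397, 397)
text_only_mask = torch.zeros(1, 1, 1, 200)
# scores + text_only_mask  # RuntimeError: The size of tensor a (397) must
#                          # match the size of tensor b (200) at
#                          # non-singleton dimension 3

# Assumed geometry: 224 x 224 image, 16 x 16 patches -> 14 x 14 = 196 patch
# tokens, plus 1 [CLS]-like visual token = 197 visual tokens.
seq_len = 200             # length of input_ids / the text-only mask
num_visual = 14 * 14 + 1  # 197

batch_attention_mask = torch.ones(1, seq_len, dtype=torch.long)

# Append ones so the mask covers all 397 positions that attention_scores
# has when the image is passed along with the text.
visual_attention_mask = torch.ones(
    batch_attention_mask.size(0), num_visual,
    dtype=batch_attention_mask.dtype,
)
extended_mask = torch.cat([batch_attention_mask, visual_attention_mask], dim=1)
print(extended_mask.shape)  # torch.Size([1, 397])

# Hypothetical call, reusing the names from the snippet above:
# outputs = model(input_ids=batch_input_ids, bbox=bbox,
#                 attention_mask=extended_mask, labels=None, images=image)
```

(If the modeling code is supposed to extend the mask itself when images are passed, the same arithmetic still applies; the point is only that whatever mask reaches attention_scores + attention_mask must cover 397 positions, not 200.)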