huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools

Home page: https://huggingface.co/docs/optimum/main/

Unable to export Table Transformer

aW3st opened this issue

System Info

Output of optimum-cli env:

- `optimum` version: 1.18.0
- `transformers` version: 4.39.1
- Platform: Linux-5.4.247-162.350.amzn2.x86_64-x86_64-with-glibc2.35
- Python version: 3.10.6
- Huggingface_hub version: 0.22.0
- PyTorch version (GPU?): 2.2.1+cu121 (cuda availabe: True)
- Tensorflow version (GPU?): not installed (cuda availabe: NA)

Output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   29C    P0    24W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Other relevant packages:
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.19.3
nvidia-nvjitlink-cu12    12.4.99
nvidia-nvtx-cu12         12.1.105
onnx                     1.16.0
onnxruntime-gpu          1.17.1 # installed for CUDA 12.x using method described here https://onnxruntime.ai/docs/install/#install-onnx-runtime-gpu-cuda-12x
torchvision              0.17.1

Who can help?

@michaelbenayoun, @JingyaHuang

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

optimum-cli export onnx --model microsoft/table-transformer-detection --device cuda ./table-transformer-onnx

Fails with the following logs:

Framework not specified. Using pt to export the model.
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.23k/1.23k [00:00<00:00, 8.37MB/s]
model.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 115M/115M [00:00<00:00, 218MB/s]
model.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 46.8M/46.8M [00:00<00:00, 369MB/s]
Some weights of the model checkpoint at microsoft/table-transformer-detection were not used when initializing TableTransformerForObjectDetection: ['model.backbone.conv_encoder.model.layer2.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer3.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer4.0.downsample.1.num_batches_tracked']
- This IS expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Automatic task detection to object-detection.
preprocessor_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 273/273 [00:00<00:00, 1.51MB/s]
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration. Please open a PR/issue to update `preprocessor_config.json` to use `image_processor_type` instead of `feature_extractor_type`. This warning will be removed in v4.40.
The `max_size` parameter is deprecated and will be removed in v4.26. Please specify in `size['longest_edge'] instead`.
/usr/local/lib/python3.10/dist-packages/transformers/models/detr/feature_extraction_detr.py:38: FutureWarning: The class DetrFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use DetrImageProcessor instead.
  warnings.warn(
Using the export variant default. Available variants are:
    - default: The default ONNX variant.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration. Please open a PR/issue to update `preprocessor_config.json` to use `image_processor_type` instead of `feature_extractor_type`. This warning will be removed in v4.40.
Using framework PyTorch: 2.2.1+cu121
/usr/local/lib/python3.10/dist-packages/transformers/models/table_transformer/modeling_table_transformer.py:558: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (batch_size * self.num_heads, target_len, source_len):
/usr/local/lib/python3.10/dist-packages/transformers/models/table_transformer/modeling_table_transformer.py:565: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (batch_size, 1, target_len, source_len):
/usr/local/lib/python3.10/dist-packages/transformers/models/table_transformer/modeling_table_transformer.py:589: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (batch_size * self.num_heads, target_len, self.head_dim):
2024-03-25 21:22:09.534273898 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 5 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-03-25 21:22:09.547945807 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-25 21:22:09.547967543 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Post-processing the exported models...
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.
Validating models in subprocesses...
Validating ONNX model table-transformer-onnx/model.onnx...
2024-03-25 21:22:12.846353552 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-25 21:22:12.846382785 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-25 21:22:13.775984477 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running MatMul node. Name:'/model/decoder/layers.0/self_attn/out_proj/MatMul' Status Message: matmul_helper.h:61 Compute MatMul dimension mismatch
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/convert.py", line 1207, in onnx_export_from_model
    validate_models_outputs(
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/convert.py", line 182, in validate_models_outputs
    raise exceptions[-1][1]
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/convert.py", line 165, in validate_models_outputs
    validate_model_outputs(
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/convert.py", line 233, in validate_model_outputs
    raise error
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running MatMul node. Name:'/model/decoder/layers.0/self_attn/out_proj/MatMul' Status Message: matmul_helper.h:61 Compute MatMul dimension mismatch

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/optimum/commands/optimum_cli.py", line 163, in main
    service.run()
  File "/usr/local/lib/python3.10/dist-packages/optimum/commands/export/onnx.py", line 261, in run
    main_export(
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/__main__.py", line 351, in main_export
    onnx_export_from_model(
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/convert.py", line 1230, in onnx_export_from_model
    raise Exception(
Exception: An error occured during validation, but the model was saved nonetheless at table-transformer-onnx. Detailed error: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running MatMul node. Name:'/model/decoder/layers.0/self_attn/out_proj/MatMul' Status Message: matmul_helper.h:61 Compute MatMul dimension mismatch.
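For completeness, the traceback shows the CLI is a thin wrapper over `main_export`, so the repro can also be written as a short Python sketch (parameter names assumed from `optimum.exporters.onnx.main_export`; the `task` is pinned here to what the CLI auto-detects):

from optimum.exporters.onnx import main_export

# Assumed Python equivalent of the failing CLI call above.
main_export(
    model_name_or_path="microsoft/table-transformer-detection",
    output="table-transformer-onnx",
    task="object-detection",  # the CLI auto-detects this task
    device="cuda",
)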

If I omit `--device cuda`, the export seems to work fine.
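For reference, the invocation that succeeds is the same command with the device flag dropped:

optimum-cli export onnx --model microsoft/table-transformer-detection ./table-transformer-onnx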

Expected behavior

I should be able to export a supported model (TableTransformer, in this case) to ONNX without errors.

Hi @aW3st, thank you for the detailed report. This is a bug in the ONNX Runtime CUDA EP, the same as microsoft/onnxruntime#18692. A workaround is to downgrade to `onnxruntime-gpu==1.14.1`. I invite you to open a new issue in the ORT repo, or post in the linked issue, to raise awareness.
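For reference, the downgrade is a plain pip pin (note that `onnxruntime-gpu` 1.14.1 is built against CUDA 11.x, so driver compatibility may vary):

pip install onnxruntime-gpu==1.14.1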

Alternatively, export and inference with ORT > 1.14 seem to work if we disable all ORT optimisations, as in the sketch below.
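A minimal sketch of loading the exported model with all graph optimizations disabled, using plain `onnxruntime` and the output directory from the repro above:

import onnxruntime as ort

# Work around the CUDA EP bug by disabling all graph optimizations.
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

session = ort.InferenceSession(
    "table-transformer-onnx/model.onnx",
    sess_options,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)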

@fxmarty I guess for now it would be better to disable the optimisation during validation? #1775

I just tested that the onnxruntime main branch works for this model (you need to build from source, as sketched after the log below, or wait for the ORT 1.18 release):

optimum-cli export onnx --model microsoft/table-transformer-detection --device cuda ./table-transformer-onnx

Framework not specified. Using pt to export the model.
Some weights of the model checkpoint at microsoft/table-transformer-detection were not used when initializing TableTransformerForObjectDetection: ['model.backbone.conv_encoder.model.layer4.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer3.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer2.0.downsample.1.num_batches_tracked']
- This IS expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Automatic task detection to object-detection.
preprocessor_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 273/273 [00:00<00:00, 1.18MB/s]
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
The `max_size` parameter is deprecated and will be removed in v4.26. Please specify in `size['longest_edge'] instead`.
/home/tlwu/anaconda3/envs/sdxl/lib/python3.10/site-packages/transformers/models/detr/feature_extraction_detr.py:38: FutureWarning: The class DetrFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use DetrImageProcessor instead.
  warnings.warn(
Using the export variant default. Available variants are:
    - default: The default ONNX variant.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
Using framework PyTorch: 2.1.2+cu121
/home/tlwu/anaconda3/envs/sdxl/lib/python3.10/site-packages/transformers/models/table_transformer/modeling_table_transformer.py:553: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (batch_size * self.num_heads, target_len, source_len):
/home/tlwu/anaconda3/envs/sdxl/lib/python3.10/site-packages/transformers/models/table_transformer/modeling_table_transformer.py:560: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (batch_size, 1, target_len, source_len):
/home/tlwu/anaconda3/envs/sdxl/lib/python3.10/site-packages/transformers/models/table_transformer/modeling_table_transformer.py:584: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (batch_size * self.num_heads, target_len, self.head_dim):
2024-03-26 17:28:00.176136112 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 5 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-03-26 17:28:00.225019563 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-26 17:28:00.225082902 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Post-processing the exported models...
Deduplicating shared (tied) weights...
Validating models in subprocesses...
Validating ONNX model table-transformer-onnx/model.onnx...
2024-03-26 17:28:06.344328182 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-26 17:28:06.344387673 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
        -[βœ“] ONNX model output names match reference model (pred_boxes, logits)
        - Validating ONNX Model output "logits":
                -[βœ“] (2, 15, 3) matches (2, 15, 3)
                -[x] values not close enough, max diff: 0.0033473968505859375 (atol: 1e-05)
        - Validating ONNX Model output "pred_boxes":
                -[βœ“] (2, 15, 4) matches (2, 15, 4)
                -[x] values not close enough, max diff: 0.00010737776756286621 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- logits: max diff = 0.0033473968505859375
- pred_boxes: max diff = 0.00010737776756286621.
 The exported model was saved at: table-transformer-onnx
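For anyone who wants the fix before the 1.18 release, a rough sketch of a from-source build with the CUDA EP enabled (paths and flags are assumptions; check the onnxruntime build docs for your setup):

git clone --recursive https://github.com/microsoft/onnxruntime
cd onnxruntime
# --use_cuda enables the CUDA EP; point --cuda_home/--cudnn_home at your install.
./build.sh --config Release --build_wheel --parallel --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/local/cuda
pip install build/Linux/Release/dist/onnxruntime_gpu-*.whl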

Awesome!