Unable to export Table Transformer
aW3st opened this issue
System Info
Output of optimum-cli env:
- `optimum` version: 1.18.0
- `transformers` version: 4.39.1
- Platform: Linux-5.4.247-162.350.amzn2.x86_64-x86_64-with-glibc2.35
- Python version: 3.10.6
- Huggingface_hub version: 0.22.0
- PyTorch version (GPU?): 2.2.1+cu121 (cuda availabe: True)
- Tensorflow version (GPU?): not installed (cuda availabe: NA)
Output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03 Driver Version: 470.182.03 CUDA Version: 12.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
| N/A 29C P0 24W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
Other relevant packages:
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.4.99
nvidia-nvtx-cu12 12.1.105
onnx 1.16.0
onnxruntime-gpu 1.17.1 # installed for CUDA 12.x using method described here https://onnxruntime.ai/docs/install/#install-onnx-runtime-gpu-cuda-12x
torchvision 0.17.1
Who can help?
@michaelbenayoun, @JingyaHuang
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
optimum-cli export onnx --model microsoft/table-transformer-detection --device cuda ./table-transformer-onnx
Fails with the following logs:
Framework not specified. Using pt to export the model.
config.json: 100%|████████████████████████████████████████| 1.23k/1.23k [00:00<00:00, 8.37MB/s]
model.safetensors: 100%|████████████████████████████████████████| 115M/115M [00:00<00:00, 218MB/s]
model.safetensors: 100%|████████████████████████████████████████| 46.8M/46.8M [00:00<00:00, 369MB/s]
Some weights of the model checkpoint at microsoft/table-transformer-detection were not used when initializing TableTransformerForObjectDetection: ['model.backbone.conv_encoder.model.layer2.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer3.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer4.0.downsample.1.num_batches_tracked']
- This IS expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Automatic task detection to object-detection.
preprocessor_config.json: 100%|████████████████████████████████████████| 273/273 [00:00<00:00, 1.51MB/s]
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration. Please open a PR/issue to update `preprocessor_config.json` to use `image_processor_type` instead of `feature_extractor_type`. This warning will be removed in v4.40.
The `max_size` parameter is deprecated and will be removed in v4.26. Please specify in `size['longest_edge'] instead`.
/usr/local/lib/python3.10/dist-packages/transformers/models/detr/feature_extraction_detr.py:38: FutureWarning: The class DetrFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use DetrImageProcessor instead.
warnings.warn(
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration. Please open a PR/issue to update `preprocessor_config.json` to use `image_processor_type` instead of `feature_extractor_type`. This warning will be removed in v4.40.
Using framework PyTorch: 2.2.1+cu121
/usr/local/lib/python3.10/dist-packages/transformers/models/table_transformer/modeling_table_transformer.py:558: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (batch_size * self.num_heads, target_len, source_len):
/usr/local/lib/python3.10/dist-packages/transformers/models/table_transformer/modeling_table_transformer.py:565: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (batch_size, 1, target_len, source_len):
/usr/local/lib/python3.10/dist-packages/transformers/models/table_transformer/modeling_table_transformer.py:589: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (batch_size * self.num_heads, target_len, self.head_dim):
2024-03-25 21:22:09.534273898 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 5 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-03-25 21:22:09.547945807 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-25 21:22:09.547967543 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Post-processing the exported models...
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.
Validating models in subprocesses...
Validating ONNX model table-transformer-onnx/model.onnx...
2024-03-25 21:22:12.846353552 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-25 21:22:12.846382785 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-25 21:22:13.775984477 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running MatMul node. Name:'/model/decoder/layers.0/self_attn/out_proj/MatMul' Status Message: matmul_helper.h:61 Compute MatMul dimension mismatch
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/convert.py", line 1207, in onnx_export_from_model
validate_models_outputs(
File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/convert.py", line 182, in validate_models_outputs
raise exceptions[-1][1]
File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/convert.py", line 165, in validate_models_outputs
validate_model_outputs(
File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/convert.py", line 233, in validate_model_outputs
raise error
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running MatMul node. Name:'/model/decoder/layers.0/self_attn/out_proj/MatMul' Status Message: matmul_helper.h:61 Compute MatMul dimension mismatch
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/optimum-cli", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/optimum/commands/optimum_cli.py", line 163, in main
service.run()
File "/usr/local/lib/python3.10/dist-packages/optimum/commands/export/onnx.py", line 261, in run
main_export(
File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/__main__.py", line 351, in main_export
onnx_export_from_model(
File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/convert.py", line 1230, in onnx_export_from_model
raise Exception(
Exception: An error occured during validation, but the model was saved nonetheless at table-transformer-onnx. Detailed error: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running MatMul node. Name:'/model/decoder/layers.0/self_attn/out_proj/MatMul' Status Message: matmul_helper.h:61 Compute MatMul dimension mismatch.
If I omit `--device cuda`, the export works fine.
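For reference, a sketch of that workaround: as far as I understand, `--device` only controls where the export and validation run; the exported `.onnx` file itself is device-independent, so a CPU export can still be served on GPU afterwards (e.g. via `onnxruntime.InferenceSession(path, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])`).

```shell
# Export on CPU by dropping --device cuda; the resulting ONNX file
# is device-independent and can still be run with the CUDA EP later.
optimum-cli export onnx \
  --model microsoft/table-transformer-detection \
  ./table-transformer-onnx
```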
Expected behavior
I should be able to export a supported model (TableTransformer, in this case) to ONNX without errors.
Hi @aW3st, thank you for the detailed report. This is a bug in the ONNX Runtime CUDA EP, the same as microsoft/onnxruntime#18692. A workaround is to downgrade to `onnxruntime-gpu==1.14.1`. I invite you to open a new issue in the ORT repo, or to post in the linked issue to raise awareness.
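The suggested downgrade as a setup fragment (version taken from the comment above; note that older `onnxruntime-gpu` releases may target a different CUDA toolkit than the one in your environment, so check compatibility):

```shell
# Pin onnxruntime-gpu to the last release reported unaffected by the
# CUDA EP bug; revisit once ORT 1.18 is released.
pip install "onnxruntime-gpu==1.14.1"
```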
I just tested that the onnxruntime main branch works for this model (you need to build from source, or wait for the ORT 1.18 release):
optimum-cli export onnx --model microsoft/table-transformer-detection --device cuda ./table-transformer-onnx
Framework not specified. Using pt to export the model.
Some weights of the model checkpoint at microsoft/table-transformer-detection were not used when initializing TableTransformerForObjectDetection: ['model.backbone.conv_encoder.model.layer4.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer3.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer2.0.downsample.1.num_batches_tracked']
- This IS expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Automatic task detection to object-detection.
preprocessor_config.json: 100%|████████████████████████████████████████| 273/273 [00:00<00:00, 1.18MB/s]
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
The `max_size` parameter is deprecated and will be removed in v4.26. Please specify in `size['longest_edge'] instead`.
/home/tlwu/anaconda3/envs/sdxl/lib/python3.10/site-packages/transformers/models/detr/feature_extraction_detr.py:38: FutureWarning: The class DetrFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use DetrImageProcessor instead.
warnings.warn(
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
Using framework PyTorch: 2.1.2+cu121
/home/tlwu/anaconda3/envs/sdxl/lib/python3.10/site-packages/transformers/models/table_transformer/modeling_table_transformer.py:553: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (batch_size * self.num_heads, target_len, source_len):
/home/tlwu/anaconda3/envs/sdxl/lib/python3.10/site-packages/transformers/models/table_transformer/modeling_table_transformer.py:560: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (batch_size, 1, target_len, source_len):
/home/tlwu/anaconda3/envs/sdxl/lib/python3.10/site-packages/transformers/models/table_transformer/modeling_table_transformer.py:584: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (batch_size * self.num_heads, target_len, self.head_dim):
2024-03-26 17:28:00.176136112 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 5 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-03-26 17:28:00.225019563 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-26 17:28:00.225082902 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Post-processing the exported models...
Deduplicating shared (tied) weights...
Validating models in subprocesses...
Validating ONNX model table-transformer-onnx/model.onnx...
2024-03-26 17:28:06.344328182 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-26 17:28:06.344387673 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
-[✓] ONNX model output names match reference model (pred_boxes, logits)
- Validating ONNX Model output "logits":
-[✓] (2, 15, 3) matches (2, 15, 3)
-[x] values not close enough, max diff: 0.0033473968505859375 (atol: 1e-05)
- Validating ONNX Model output "pred_boxes":
-[✓] (2, 15, 4) matches (2, 15, 4)
-[x] values not close enough, max diff: 0.00010737776756286621 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- logits: max diff = 0.0033473968505859375
- pred_boxes: max diff = 0.00010737776756286621.
The exported model was saved at: table-transformer-onnx
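For context on those `[x]` marks: the validation is an absolute-tolerance comparison, so the export "succeeds with a warning" when the max difference exceeds the default atol of 1e-5. A minimal sketch of that check (the exact validation code in optimum may differ; my recollection is that `optimum-cli export onnx` also accepts an `--atol` flag to loosen it, which is worth verifying):

```python
# Max absolute differences reported in the log above.
max_diffs = {
    "logits": 0.0033473968505859375,
    "pred_boxes": 0.00010737776756286621,
}

def passes(max_diff: float, atol: float) -> bool:
    """True if the largest |onnx_output - reference_output| is within tolerance."""
    return max_diff <= atol

# With the default atol of 1e-5, both outputs are flagged...
assert not passes(max_diffs["logits"], 1e-5)
assert not passes(max_diffs["pred_boxes"], 1e-5)
# ...but both would pass a looser tolerance such as 5e-3; whether that
# difference is acceptable depends on the downstream task.
assert all(passes(d, 5e-3) for d in max_diffs.values())
```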
Awesome!