huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools

[BUG] Mistral feature extraction export to ONNX is broken

michaelroyzen opened this issue

System Info

Optimum 1.17.1
Transformers 4.38.1
PyTorch 2.1.0

Who can help?



  • The official example scripts
  • My own modified scripts


  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

ONNX export for Mistral models is currently broken. Running optimum-cli export onnx --model Salesforce/SFR-Embedding-Mistral onnx/sfr-embedding-mistral with the latest versions of Optimum and Transformers fails with an error about a Trilu node:

Framework not specified. Using pt to export the model.
Loading checkpoint shards: 100%|██████████████████| 3/3 [00:02<00:00,  1.25it/s]
Automatic task detection to feature-extraction (possible synonyms are: default, mask-generation, sentence-similarity).
Using the export variant default. Available variants are:
    - default: The default ONNX variant.
Using framework PyTorch: 2.1.0+cu121
Overriding 1 configuration item(s)
	- use_cache -> False
/home/ubuntu/.local/lib/python3.8/site-packages/transformers/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if (input_shape[-1] > 1 or self.sliding_window is not None) and self.is_causal:
/home/ubuntu/.local/lib/python3.8/site-packages/transformers/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if past_key_values_length > 0:
/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/mistral/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/mistral/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/mistral/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/mistral/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
/home/ubuntu/.local/lib/python3.8/site-packages/sentence_transformers/models/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert gather_indices.shape == (bs, 1, hidden_dim)
Saving external data to one file...
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/optimum-cli", line 8, in <module>
  File "/home/ubuntu/.local/lib/python3.8/site-packages/optimum/commands/", line 163, in main
  File "/home/ubuntu/.local/lib/python3.8/site-packages/optimum/commands/export/", line 261, in run
  File "/home/ubuntu/.local/lib/python3.8/site-packages/optimum/exporters/onnx/", line 351, in main_export
  File "/home/ubuntu/.local/lib/python3.8/site-packages/optimum/exporters/onnx/", line 1152, in onnx_export_from_model
    _, onnx_outputs = export_models(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/optimum/exporters/onnx/", line 763, in export_models
  File "/home/ubuntu/.local/lib/python3.8/site-packages/optimum/exporters/onnx/", line 897, in export
    config.fix_dynamic_axes(output, device=device, input_shapes=input_shapes, dtype=dtype)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/optimum/exporters/onnx/", line 306, in fix_dynamic_axes
    session = InferenceSession(model_path.as_posix(), providers=providers, sess_options=session_options)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/onnxruntime/capi/", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/onnxruntime/capi/", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for Trilu(14) node with name '/0/auto_model/Trilu'
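
For triage, the last line of the traceback carries three useful pieces: the operator type (Trilu), the number in parentheses (14, which I take to be the operator version), and the node name. ONNX Runtime typically raises NOT_IMPLEMENTED in this form when none of the requested execution providers has a kernel registered that matches the node. Below is a minimal sketch that pulls those fields out of the message so they can be compared across Optimum versions. The names MissingKernel and parse_missing_kernel are hypothetical helpers for illustration, not part of Optimum or ONNX Runtime.

```python
import re
from typing import NamedTuple, Optional


class MissingKernel(NamedTuple):
    """Fields extracted from an ONNX Runtime NOT_IMPLEMENTED message (hypothetical helper)."""
    op_type: str
    version: int
    node_name: str


# Matches the message shape seen in the traceback above.
_PATTERN = re.compile(
    r"Could not find an implementation for (?P<op>\w+)\((?P<version>\d+)\) "
    r"node with name '(?P<name>[^']*)'"
)


def parse_missing_kernel(message: str) -> Optional[MissingKernel]:
    """Return the operator, version and node name, or None if the message has another shape."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return MissingKernel(match["op"], int(match["version"]), match["name"])


# The final line of the traceback reported in this issue.
error_line = (
    "onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: "
    "[ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation "
    "for Trilu(14) node with name '/0/auto_model/Trilu'"
)

info = parse_missing_kernel(error_line)
print(info)
```

The node name /0/auto_model/Trilu suggests the node sits inside the wrapped transformer module rather than in the pooling layer, which would be consistent with the attention-mask code that the TracerWarning lines point at, though the log alone does not prove that.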

Expected behavior

I'd expect the export to finish successfully. Funnily enough, it works with Optimum 1.14.1 and Transformers 4.35.2, and I'm not sure what changed since then. It seems that Optimum's Model Patcher, which previously fixed a similar issue for Falcon (#1391), is now gone.
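
To make it easier to tell which of the two reported environments a machine is in, here is a small sketch that reads the installed versions with the standard library and compares them against the working and broken combinations from this report. The names KNOWN_WORKING, REPORTED_BROKEN, installed_versions and classify are hypothetical and exist only for this illustration. Any other combination is labeled untested, since this issue says nothing about it.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Version pairs taken from this report; nothing else has been verified here.
KNOWN_WORKING = {"optimum": "1.14.1", "transformers": "4.35.2"}
REPORTED_BROKEN = {"optimum": "1.17.1", "transformers": "4.38.1"}


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up the installed version of each package; None means it is not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def classify(versions: Dict[str, Optional[str]]) -> str:
    """Label an environment relative to the two combinations named in this issue."""
    if versions == KNOWN_WORKING:
        return "known-working"
    if versions == REPORTED_BROKEN:
        return "reported-broken"
    return "untested"


current = installed_versions(KNOWN_WORKING)
print(current, classify(current))
```

If this prints anything other than known-working, the only combination this report confirms as exporting successfully is Optimum 1.14.1 with Transformers 4.35.2.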