Support scenarios where inputs_embeds is a model input
eres313 opened this issue · comments
Hi, while converting the model with extra_options = {"exclude_embeds": 1} I get the error below while loading the model:
# Inference
import onnxruntime_genai as og
import time
model = og.Model(output_dir)
RuntimeError: Error encountered while parsing './output_dir/genai_config.json' JSON Error: Unknown value: inputs_embeds at line 20 index 49
Here is the genai_config.json file:
{
"model": {
"bos_token_id": 1,
"context_length": 4096,
"decoder": {
"session_options": {
"log_id": "onnxruntime-genai",
"provider_options": [
{
"cuda": {
"enable_cuda_graph": "0"
}
}
]
},
"filename": "model.onnx",
"head_size": 96,
"hidden_size": 3072,
"inputs": {
"inputs_embeds": "inputs_embeds",
"attention_mask": "attention_mask",
"position_ids": "position_ids",
"past_key_names": "past_key_values.%d.key",
"past_value_names": "past_key_values.%d.value"
},
"outputs": {
"logits": "logits",
"present_key_names": "present.%d.key",
"present_value_names": "present.%d.value"
},
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 32
},
"eos_token_id": 32000,
"pad_token_id": 32000,
"type": "phi3",
"vocab_size": 32015
},
"search": {
"diversity_penalty": 0.0,
"do_sample": false,
"early_stopping": true,
"length_penalty": 1.0,
"max_length": 4096,
"min_length": 0,
"no_repeat_ngram_size": 0,
"num_beams": 1,
"num_return_sequences": 1,
"past_present_share_buffer": false,
"repetition_penalty": 1.0,
"temperature": 1.0,
"top_k": 1,
"top_p": 1.0
}
}
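The parse failure can be reproduced independently of the runtime: the "inputs" section names inputs_embeds where older onnxruntime-genai releases only recognized a fixed set of keys. The sketch below is a hypothetical stand-library re-creation of that behavior for illustration, not the actual (C++) config parser, and KNOWN_INPUTS is an assumed key set:

```python
import json

# Trimmed-down "inputs" section from the genai_config.json above.
config = json.loads("""
{
  "inputs": {
    "inputs_embeds": "inputs_embeds",
    "attention_mask": "attention_mask",
    "position_ids": "position_ids"
  }
}
""")

# Hypothetical key set mimicking a 0.2.0-era parser, which did not
# know about "inputs_embeds" and rejected it as an unknown value.
KNOWN_INPUTS = {"input_ids", "attention_mask", "position_ids",
                "past_key_names", "past_value_names"}

unknown = [k for k in config["inputs"] if k not in KNOWN_INPUTS]
print(unknown)  # -> ['inputs_embeds']
```

Newer releases (0.3.0-rc2 and later) accept this key, which is why upgrading resolves the error.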
eres313 currently onnxruntime-genai does not support executing a decoder-only model with input embeddings as the input.
Could you share some details on how you plan to retrieve the input embeddings for your use case before they are passed into onnxruntime-genai as model inputs? Do you intend to do this outside of onnxruntime?
Yes, my input embeddings are fixed and I don't want to recompute them every time, so this will happen outside of onnxruntime.
The model takes inputs_embeds and attention_mask as inputs to generate the text.
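One way to realize "compute the embeddings once, outside of onnxruntime" is to materialize the embedding lookup yourself and cache the result to disk. A minimal NumPy sketch, where the embedding matrix, its dimensions, and the token ids are small placeholder stand-ins for the real model weights (the config above uses vocab_size 32015 and hidden_size 3072):

```python
import numpy as np

# Placeholder dimensions; a real Phi-3 table would be (32015, 3072).
vocab_size, hidden_size = 100, 8
rng = np.random.default_rng(0)
embedding_matrix = rng.standard_normal((vocab_size, hidden_size),
                                       dtype=np.float32)

token_ids = np.array([[1, 45, 7]])           # placeholder prompt ids
inputs_embeds = embedding_matrix[token_ids]  # (batch, seq_len, hidden)

# Cache once; later runs load the file instead of repeating the lookup.
np.save("inputs_embeds.npy", inputs_embeds)
cached = np.load("inputs_embeds.npy")
assert cached.shape == (1, 3, hidden_size)
```

With a model exported via exclude_embeds, an array shaped like this is what would be fed to its inputs_embeds input once the runtime supports it.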
I also have the same issue while running the Phi-3 vision ONNX model.
henrywang0314 running the Phi-3 vision ONNX model should not give you that error. What version of onnxruntime-genai are you using? Could you share the output of pip list | grep onnxruntime-genai if you are on Linux?
Yes, my input embeddings are fixed, I don't want to recompute them every time, so it will be outside of onnxruntime.
OrkhanHI We can add support in onnxruntime-genai for scenarios where the input is inputs_embeds. For now, this is not supported.
Same issue with phi-3-vision when following the instructions here: https://onnxruntime.ai/docs/genai/tutorials/phi3-v.html
pip list | grep onnxruntime-genai
onnxruntime-genai 0.2.0
python phi3v.py -m cpu-int4-rtn-block-32-acc-level-4
Loading model...
Traceback (most recent call last):
File "/home/cbiebers/devel/onnx-phi-3-vision/phi3v.py", line 66, in
run(args)
File "/home/cbiebers/devel/onnx-phi-3-vision/phi3v.py", line 16, in run
model = og.Model(args.model_path)
RuntimeError: Error encountered while parsing 'cpu-int4-rtn-block-32-acc-level-4/genai_config.json' JSON Error: Unknown value: inputs_embeds at line 14 index 49
Please use the release candidate 0.3.0-rc2. Add --pre to your pip install command:
pip install --pre onnxruntime-genai