nlpxucan / WizardLM

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

export WizardCoder to ONNX

RyanChen1997 opened this issue · comments

I want to convert WizardCoder to ONNX.
Following https://huggingface.co/docs/optimum/onnxruntime/usage_guides/models, I wrote the code below to export it:

from optimum.onnxruntime import ORTModelForCausalLM
model_path = "../WizardCoder-15B-V1.0"
onnxModel = ORTModelForCausalLM.from_pretrained(model_path, export=True)

It reports the error below:

ValueError: Trying to export a gpt_bigcode model, that is a custom or unsupported architecture for the task text-generation, but no custom onnx configuration was passed as
`custom_onnx_configs`. Please refer to https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#custom-export-of-transformers-models for an
example on how to export custom models. For the task text-generation, the Optimum ONNX exporter supports natively the architectures: ['bart', 'blenderbot', 'blenderbot_small',
'bloom', 'codegen', 'gpt2', 'gptj', 'gpt_neo', 'gpt_neox', 'marian', 'mbart', 'opt', 'llama', 'pegasus'].

It looks like I need to implement an ONNXConfig.
Has anyone implemented that?

So, from a bird's-eye view, the error message states:

  1. You're using a gpt_bigcode model (which is true; you can verify it with the quick check after this list).

  2. It's apparently a 'custom' or 'unsupported' architecture for the task of text-generation.

  3. However, you failed to pass a custom configuration for your model (WizardCoder) to be exported with. Remember, WizardCoder is a fine-tuned version of StarCoder (which itself is derived from GPT-2, making it CausalLM/decoder-only, but it's also an augmented version of GPT-2, so it's not an exact architectural match).

  4. The error message then instructs you to look for custom_onnx_configs on the site link that you're given (https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#custom-export-of-transformers-models).

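If you want to confirm the first point for yourself, the architecture is recorded in the checkpoint's config.json and surfaces as model_type when you load it with AutoConfig. A quick check, assuming the same local path as in your original snippet:

from transformers import AutoConfig

# Same local checkpoint path as in the original export snippet
config = AutoConfig.from_pretrained("../WizardCoder-15B-V1.0")
print(config.model_type)  # -> "gpt_bigcode"
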
So let's take the error message at its word, visit the linked page, and see if we can find anything there about how to export custom models. Our first major hint on where to look is the custom_onnx_configs shoutout.

So let's visit the page, ctrl+f, type in "custom" and see where we get.

[screenshot: ctrl+f results for "custom" on the export guide page]

Lucky us, we didn't have to look too hard. Even luckier, there is a full-blown example of how to export a custom model with the proper arguments.

I re-posted that below for convenience:

from optimum.exporters.onnx import main_export
from optimum.exporters.onnx.model_configs import WhisperOnnxConfig
from transformers import AutoConfig

from optimum.exporters.onnx.base import ConfigBehavior
from typing import Dict

class CustomWhisperOnnxConfig(WhisperOnnxConfig):
    @property
    def outputs(self) -> Dict[str, Dict[int, str]]:
        common_outputs = super().outputs

        if self._behavior is ConfigBehavior.ENCODER:
            for i in range(self._config.encoder_layers):
                common_outputs[f"encoder_attentions.{i}"] = {0: "batch_size"}
        elif self._behavior is ConfigBehavior.DECODER:
            for i in range(self._config.decoder_layers):
                common_outputs[f"decoder_attentions.{i}"] = {
                    0: "batch_size",
                    2: "decoder_sequence_length",
                    3: "past_decoder_sequence_length + 1"
                }
            for i in range(self._config.decoder_layers):
                common_outputs[f"cross_attentions.{i}"] = {
                    0: "batch_size",
                    2: "decoder_sequence_length",
                    3: "encoder_sequence_length_out"
                }

        return common_outputs

    @property
    def torch_to_onnx_output_map(self):
        if self._behavior is ConfigBehavior.ENCODER:
            # The encoder export uses WhisperEncoder that returns the key "attentions"
            return {"attentions": "encoder_attentions"}
        else:
            return {}

model_id = "openai/whisper-tiny.en"
config = AutoConfig.from_pretrained(model_id)

custom_whisper_onnx_config = CustomWhisperOnnxConfig(
    config=config,
    task="automatic-speech-recognition",
)

encoder_config = custom_whisper_onnx_config.with_behavior("encoder")
decoder_config = custom_whisper_onnx_config.with_behavior("decoder", use_past=False)
decoder_with_past_config = custom_whisper_onnx_config.with_behavior("decoder", use_past=True)

custom_onnx_configs={
    "encoder_model": encoder_config,
    "decoder_model": decoder_config,
    "decoder_with_past_model": decoder_with_past_config,
}

main_export(
    model_id,
    output="custom_whisper_onnx",
    no_post_process=True,
    model_kwargs={"output_attentions": True},
    custom_onnx_configs=custom_onnx_configs
)

Okay, that gives us the basic template but this is for the Whisper model. We want to find something applicable to the gpt_bigcode model.

So let's go back to the top of the file and look at the imports:

from optimum.exporters.onnx import main_export
from optimum.exporters.onnx.model_configs import WhisperOnnxConfig
from transformers import AutoConfig

We can see above that the WhisperOnnxConfig comes from optimum.exporters.onnx.model_configs. So let's see if we can find that somewhere.

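(Quick aside: if you already have optimum installed, you can also skip the web search and just ask Python where that module lives on disk:)

import optimum.exporters.onnx.model_configs as model_configs

# Prints the path of the installed model_configs.py, which you can open directly
print(model_configs.__file__)
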
We'll head to Google for a classic search. Typing optimum.exporters.onnx.model_configs into the search bar (verbatim) yields the following for me:

[screenshot: Google search results linking to the Optimum exporters documentation on Hugging Face]

Okay, cool, let's visit that link on Hugging Face (it takes us to a different location than the link recommended in your error message).

Visiting that link, if we check the heading that reads, 'Implementing a Custom ONNX Configuration', we can see the following:

[screenshot: the "Implementing a Custom ONNX Configuration" section, with its green callout box]

The green callout box states, "A good way to implement a custom ONNX configuration is to look at the existing configuration implementations in the optimum/exporters/onnx/model_configs.py file."

Great. This is the exact file that we're looking for (remember, we started by hunting down the import statement that reads optimum.exporters.onnx.model_configs). See how that works? I'm not saying that to be condescending, just hoping to draw the connection between the import statement and the rest of the code so that you know where to look.

Now we just need to find that repository on GitHub somewhere. It isn't listed in that link above from HuggingFace, so I found the repository link for you here.

As stated, the file is located under optimum/exporters/onnx, so we just need to click through those directories in that order and we'll arrive at our destination. For the sake of time, you can find the model_configs.py file here.

The docstring at the top of the file notes that it provides "model specific ONNX configurations". More specifically, though, it gives architecture-level configs (vs. model-level ones). The difference here is that the model = WizardCoder, but the architecture = gpt_bigcode.

We know this because of the little hint given under that green callout box that reads: "When inheriting from a middle-end class, look for the one handling the same modality / category of models as the one you are trying to support."

From this point, let's ctrl+f for gpt_bigcode and we'll land at the sample configuration for exporting models of that architecture to ONNX.

You'll find the corresponding code at lines 272-308.

I've re-posted the code below for convenience's sake:

class GPTBigCodeDummyPastKeyValuesGenerator(DummyPastKeyValuesGenerator):
    def generate(self, input_name: str, framework: str = "pt"):
        past_key_value_shape = (
            self.batch_size,
            self.sequence_length,
            self.hidden_size // self.num_attention_heads * 2,
        )
        return [self.random_float_tensor(past_key_value_shape, framework=framework) for _ in range(self.num_layers)]


class GPTBigCodeOnnxConfig(TextDecoderOnnxConfig):
    DUMMY_INPUT_GENERATOR_CLASSES = (
        GPTBigCodeDummyPastKeyValuesGenerator,
    ) + TextDecoderOnnxConfig.DUMMY_INPUT_GENERATOR_CLASSES
    DUMMY_PKV_GENERATOR_CLASS = GPTBigCodeDummyPastKeyValuesGenerator
    NORMALIZED_CONFIG_CLASS = NormalizedConfigManager.get_normalized_config_class("gpt_bigcode")

    def add_past_key_values(self, inputs_or_outputs: Dict[str, Dict[int, str]], direction: str):
        if direction not in ["inputs", "outputs"]:
            raise ValueError(f'direction must either be "inputs" or "outputs", but {direction} was given')

        if direction == "inputs":
            decoder_sequence_name = "past_sequence_length"
            name = "past_key_values"
        else:
            decoder_sequence_name = "past_sequence_length + 1"
            name = "present"

        for i in range(self._normalized_config.num_layers):
            # No dim for `n_head` when using multi-query attention
            inputs_or_outputs[f"{name}.{i}.key_value"] = {
                0: "batch_size",
                1: decoder_sequence_name,
            }

    def flatten_past_key_values(self, flattened_output, name, idx, t):
        flattened_output[f"{name}.{idx}.key_value"] = t

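Wiring that into the Whisper-style template would look roughly like the sketch below. To be clear: this is untested, and it assumes optimum >= 1.11, where GPTBigCodeOnnxConfig actually ships in model_configs.py; on those versions gpt_bigcode is supported natively, so you may not need custom_onnx_configs at all. The "model" key is also an assumption on my part; the keys have to match the ONNX submodels your optimum version produces (older releases split the decoder into decoder_model / decoder_with_past_model).

from transformers import AutoConfig
from optimum.exporters.onnx import main_export
from optimum.exporters.onnx.model_configs import GPTBigCodeOnnxConfig

model_path = "../WizardCoder-15B-V1.0"  # local checkpoint, as in the original snippet

config = AutoConfig.from_pretrained(model_path)

# Reuse the stock gpt_bigcode config; use_past=True exports the KV-cache variant
onnx_config = GPTBigCodeOnnxConfig(config=config, task="text-generation", use_past=True)

main_export(
    model_path,
    output="wizardcoder_onnx",
    task="text-generation-with-past",
    no_post_process=True,
    # Assumption: one "model" entry for a decoder-only export; adjust the key(s)
    # to whatever ONNX files your optimum version actually writes.
    custom_onnx_configs={"model": onnx_config},
)
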
If you're even more curious about where the DUMMY_INPUT_GENERATOR_CLASSES come from, you can check out this part of the repo here (optimum/docs/source/utils/dummy_input_generators.mdx). Even though the file is in .mdx format, we can still read it in plain English without trouble (if not, copy/paste it into a markdown renderer).

In the file we can see it says: "It is very common to have to generate dummy inputs to perform a task (tracing, exporting a model to some backend, testing model outputs, etc). The goal of [~optimum.utils.input_generators.DummyInputGenerator] classes is to make this generation easy and re-usable."

Then, immediately below, it shows that the base class we're looking for is optimum.utils.input_generators.DummyInputGenerator; more specifically, the compatible dummy generator for what we need (for gpt_bigcode) is optimum.utils.input_generators.DummyDecoderTextInputGenerator.

So we should find a class named DummyDecoderTextInputGenerator in optimum/optimum/utils/input_generators.py. Sure enough, it's there, at line 316.

class DummyDecoderTextInputGenerator(DummyTextInputGenerator):
    """
    Generates dummy decoder text inputs.
    """

    SUPPORTED_INPUT_NAMES = (
        "decoder_input_ids",
        "decoder_attention_mask",
    )

Before moving on, I'm going to redirect you to line 272 in the model_configs.py file, which reads: class GPTBigCodeDummyPastKeyValuesGenerator(DummyPastKeyValuesGenerator).

This means that GPTBigCodeDummyPastKeyValuesGenerator is a subclass of DummyPastKeyValuesGenerator and inherits all of its attributes and methods. So let's take a look at DummyPastKeyValuesGenerator in the optimum/optimum/utils/input_generators.py file, since it's also defined there.

That reads as follows:

class DummyPastKeyValuesGenerator(DummyInputGenerator):
    """
    Generates dummy past_key_values inputs.
    """

    SUPPORTED_INPUT_NAMES = ("past_key_values",)

    def __init__(
        self,
        task: str,
        normalized_config: NormalizedTextConfig,
        batch_size: int = DEFAULT_DUMMY_SHAPES["batch_size"],
        sequence_length: int = DEFAULT_DUMMY_SHAPES["sequence_length"],
        random_batch_size_range: Optional[Tuple[int, int]] = None,
        random_sequence_length_range: Optional[Tuple[int, int]] = None,
        **kwargs,
    ):
        self.num_layers = normalized_config.num_layers
        self.num_attention_heads = normalized_config.num_attention_heads
        self.hidden_size = normalized_config.hidden_size
        if random_batch_size_range:
            low, high = random_batch_size_range
            self.batch_size = random.randint(low, high)
        else:
            self.batch_size = batch_size
        if random_sequence_length_range:
            low, high = random_sequence_length_range
            self.sequence_length = random.randint(low, high)
        else:
            self.sequence_length = sequence_length

    def generate(self, input_name: str, framework: str = "pt"):
        shape = (
            self.batch_size,
            self.num_attention_heads,
            self.sequence_length,
            self.hidden_size // self.num_attention_heads,
        )
        return [
            (
                self.random_float_tensor(shape, framework=framework),
                self.random_float_tensor(shape, framework=framework),
            )
            for _ in range(self.num_layers)
        ]

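If you want to see the multi-query shape difference concretely, here's a quick sanity check you could run. It's untested on my end and assumes optimum >= 1.11 (where GPTBigCodeDummyPastKeyValuesGenerator exists) plus the same local checkpoint path as before:

from transformers import AutoConfig
from optimum.exporters.onnx.model_configs import GPTBigCodeDummyPastKeyValuesGenerator
from optimum.utils import NormalizedConfigManager

config = AutoConfig.from_pretrained("../WizardCoder-15B-V1.0")
normalized_config = NormalizedConfigManager.get_normalized_config_class("gpt_bigcode")(config)

# One dummy tensor per layer, shaped (batch, seq, head_dim * 2): key and value are fused
# and there is no per-head dimension, because gpt_bigcode uses multi-query attention
generator = GPTBigCodeDummyPastKeyValuesGenerator(
    task="text-generation",
    normalized_config=normalized_config,
    batch_size=2,
    sequence_length=16,
)
dummy_past = generator.generate("past_key_values", framework="pt")
print(len(dummy_past), dummy_past[0].shape)
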
You likely won't have to fiddle with any of these values at all, but I wanted to give you a comprehensive understanding of:

  1. What the code is looking for in order to successfully export the model.
  2. Finding what it's looking for in greater specificity (with the Whisper example).
  3. Finding relevant code that provides an example with a different architecture (albeit an incorrect one).
  4. Finding the code that provides what you need for gpt_bigcode, since that's the architecture of the model we're using here for WizardCoder.
  5. Explaining the different classes/subclasses in that code and showing where they're defined, so that it all makes sense and/or you know where to go to extract said code and make modifications if you feel they're necessary before exporting your model to ONNX format.

Wishing You Luck - Let Me Know How This Goes

I don't have a WizardCoder model ready to export on my device right now, so I haven't run any of the relevant code. However, I imagine that if you insert the boilerplate given for this specific model's architecture into your export script, it should execute without issue. If there is an issue, I'd be curious to see what it is.

Feel free to reply to this comment if this doesn't work, as I'd be interested in chasing this rabbit down until it's caught; I'll have to export this model to ONNX format too in the not-so-distant future.

Wow, @FoobarProtocol I've never seen a reply as clear as this! You dazzled me.

Hi @FoobarProtocol, thanks for your clear reply.
After reading it, I reviewed the code. I couldn't find the class GPTBigCodeOnnxConfig in model_configs.py. So I went to the optimum GitHub and found that the latest version is 1.11.1, but my optimum version was 1.10.1. Then I updated optimum.
I ran the code below to export the model to ONNX:

from optimum.exporters.onnx import main_export

# model_path is the local WizardCoder checkpoint; onnx_path is the output directory
main_export(
    model_path,
    output=onnx_path,
    task="text-generation-with-past",
    framework="pt",
    trust_remote_code=True,
    no_post_process=True,
)

It prints a lot of messages, but it finally seems to convert successfully, as I can get the ONNX model files in the output path.
Then I want to use the model files following the guide https://huggingface.co/docs/optimum/onnxruntime/usage_guides/models.

pipe = pipeline("question-answering", model=model, tokenizer=tokenizer, device_map="auto", framework="pt", config=generation_config)

It can use the model files, but it looks like it runs the model on CPU, not on GPU, and the inference speed is slower than the original inference framework running on GPU.
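Presumably the fix is to request the CUDA execution provider explicitly when loading the exported model, and to use the task the model was exported for (text-generation) rather than question-answering. A rough, untested sketch, requiring the onnxruntime-gpu package; onnx_path below is a placeholder for the main_export output directory:

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

onnx_path = "./wizardcoder-onnx"  # placeholder: the output directory passed to main_export

# Load the exported model with the CUDA execution provider (needs onnxruntime-gpu)
model = ORTModelForCausalLM.from_pretrained(onnx_path, provider="CUDAExecutionProvider")
tokenizer = AutoTokenizer.from_pretrained(onnx_path)

# Match the pipeline task to the export task instead of question-answering
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("def fibonacci(n):", max_new_tokens=64)[0]["generated_text"])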