microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime

Questions about the `temperature` config in search options and its unexpected/incorrect behavior

jackylu0124 opened this issue

In the Config reference documentation (https://onnxruntime.ai/docs/genai/reference/config.html), regarding the temperature search option, the doc says:

temperature: The temperature value scales the probability of each token so that probable tokens become more likely while less probable ones become less likely. This value can have a range 0 < temperature ≤ 1. When temperature is equal to 1, it has no effect.

but based on my understanding of the concept of temperature in LLMs, shouldn't a higher temperature make the distribution more "flattened" and more even, while a lower temperature makes it sharper? In other words, the temperature value should scale the probability of each token so that a higher temperature makes probable tokens less likely and less probable tokens more likely.
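To make that concrete, here is a small standalone example (plain NumPy on made-up logits, independent of onnxruntime-genai) showing that dividing the scores by the temperature before the softmax sharpens the distribution for T < 1 and flattens it for T > 1:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax: divide logits by T before normalizing."""
    scaled = logits / temperature
    scaled -= scaled.max()           # subtract max for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = np.array([2.0, 1.0, 0.1])   # made-up token scores

for t in (0.25, 1.0, 4.0):
    print(t, softmax_with_temperature(logits, t).round(3))

# Expected pattern:
#   T = 0.25 -> probability mass concentrates on the top token (sharper)
#   T = 1.0  -> plain softmax, temperature has no effect
#   T = 4.0  -> distribution is much closer to uniform (flatter)
```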

I have also done some experiments with different values of this parameter on the Phi-3-mini-4k-instruct-onnx model using the onnxruntime-genai-directml package, and the generated results seem to be the opposite of what I would expect. In my experiments, lower temperature values generated more "creative" and varied results while higher temperature values generated more definitive results, but I think the correct behavior should be the opposite: a higher temperature should generate more "creative" and varied results due to its flatter distribution, while a lower temperature should generate more definitive results due to its sharper distribution.
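For reference, this is roughly the kind of comparison I ran (a simplified sketch: the model path and prompt are placeholders, and `set_search_options` / `input_ids` / `model.generate` reflect my reading of the 0.3.x Python examples, so the exact names may differ):

```python
import onnxruntime_genai as og

# Placeholder path to a local DirectML build of the Phi-3 model folder.
model = og.Model("Phi-3-mini-4k-instruct-onnx/directml")
tokenizer = og.Tokenizer(model)

prompt = "<|user|>\nWrite a short poem about the sea.<|end|>\n<|assistant|>\n"

for temperature in (0.2, 1.0):
    params = og.GeneratorParams(model)
    # Enable sampling so that the temperature search option actually takes effect.
    params.set_search_options(do_sample=True, temperature=temperature, max_length=256)
    params.input_ids = tokenizer.encode(prompt)

    output_tokens = model.generate(params)
    print(f"--- temperature={temperature} ---")
    print(tokenizer.decode(output_tokens[0]))
```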

Package Version: onnxruntime-genai-directml 0.3.0rc1

References:

  1. https://developer.nvidia.com/blog/how-to-get-better-outputs-from-your-large-language-model/
  2. https://ai.stackexchange.com/questions/32477/what-is-the-temperature-in-the-gpt-models

A quick follow-up on this issue: I would really appreciate any help or insights!

I think the definition in the documentation might need to be updated.

The code does what you would expect: it computes Softmax(x_i / T), where T is the temperature. You can find the CPU code here.

I'll update the definition to be:

The temperature value scales the scores of each token so that lower temperature values lead to sharper distributions.
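In other words (restating the Softmax(x_i / T) formula above, not the final doc wording), the sampling probabilities are

$$p_i = \frac{\exp(x_i / T)}{\sum_j \exp(x_j / T)}$$

so as T → 0 the distribution concentrates on the highest-scoring token, and as T grows it approaches uniform; T = 1 leaves the plain softmax unchanged.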

@baijumeswani Got it, thank you for linking the code and updating the documentation, I appreciate it! To confirm, the DirectML implementation has the same logic as the CPU one you linked above, right? Could you link the DirectML code for the corresponding logic? Thanks again!

DML uses the same CPU code linked above to perform search and sampling.

I'll close this now. Please feel free to ping for further inquiries. Thanks for raising the issue and helping us enhance the product.

Got it, understood. You're welcome, and thanks a lot for updating the documentation!