confident-ai / deepeval

The LLM Evaluation Framework

Home Page: https://docs.confident-ai.com/


.generate_raw_response(prompt) gives an attribute error

rtzy7 opened this issue · comments

Hi! I have noticed that G-Eval uses .generate_raw_response for its calculations. I wanted to understand the inner workings of the metric and thus tried to dive deeper into the code.

I have initialized the env variables and have gotten "🙌 Congratulations! You're now using Azure OpenAI for all evals that require an LLM."

from deepeval.models import GPTModel

model = GPTModel()

model.generate_raw_response(prompt)

The prompt: "'Given the evaluation steps, return a JSON with two keys: 1) a `score` key ranging from 0 - 10, with 10 being that it follows the criteria outlined in the steps and 0 being that it does not, and 2) a `reason` key, a reason for the given score, but DO NOT QUOTE THE SCORE in your reason. Please mention specific information from Input and Actual Output in your reason, but be very concise with it!\n\nEvaluation Steps:\n1. Check whether the facts in \'actual output\' contradicts any facts in \'expected output\'\n2. You should also heavily penalize omission of detail\n3. Vague language, or contradicting OPINIONS, are OK\n\n\nInput:\nThe dog chased the cat up the tree, who ran up the tree? \n\nActual Output:\nIt depends, some might consider the cat, while others might argue the dog. \n\n\n\n**\nIMPORTANT: Please make sure to only return in JSON format, with the "score" and "reason" key. No words or explanation is needed.\n\nExample JSON:\n{\n    "score": 0,\n    "reason": "The text does not follow the evaluation steps provided."\n}\n**\n\nJSON:\n'"

Am I giving the method the wrong arguments? Any help would be much appreciated, thank you!

Whoops! I'm assuming this answers my question.

[image attachment]

@rtzy7 Interesting. If you use it within GEval we gracefully take care of the AttributeError, which is why we raise it. But when you call it as a standalone you'll get the error :)
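
For anyone else digging into the internals, the graceful handling described above is roughly a try/except around the raw-response call. A minimal sketch of the pattern (illustrative only, not deepeval's actual source; the function name is made up):

def score_with_fallback(model, prompt: str):
    try:
        # Native OpenAI models expose the raw completion, including logprobs.
        return model.generate_raw_response(prompt)
    except AttributeError:
        # Models without raw-response support (e.g. Azure via LangChain)
        # only implement generate(), so fall back to the plain string output.
        return model.generate(prompt)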

@penguine-ip is there an end-to-end example for how to use GEval with Azure?

In case anyone finds it useful: GEval does not currently work as expected with Azure on API versions 2024-02-01 and later (I don't know about earlier versions).

  1. API version 2024-02-01: does not support logprobs at all, so they are not used in the calculation. The scores come out with one decimal place (e.g. 0.2, 0.7, etc.).
  2. API versions 2024-03-01-preview, 2024-04-01-preview, 2024-05-01-preview: support a maximum of 5 top_logprobs. The value 20 is hardcoded here, so I get an "Invalid value for 'top_logprobs': must be less than or equal to 5." error. Changing 20 to 5 in the source code seems to work, which is the workaround I'm currently using (see the sketch after this list).
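
To make the cap concrete, here is roughly what the underlying Azure chat-completions request looks like when logprobs are requested (the endpoint, deployment name, and API version below are placeholders, not deepeval's actual call):

import os

from openai import AzureOpenAI as AzureOpenAIClient  # openai >= 1.x

client = AzureOpenAIClient(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-05-01-preview",   # preview versions cap top_logprobs at 5
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)

response = client.chat.completions.create(
    model="my-gpt-4-deployment",        # Azure deployment name (placeholder)
    messages=[{"role": "user", "content": "Say hello."}],
    logprobs=True,
    top_logprobs=5,   # the hardcoded 20 is rejected by these API versions
)
print(response.choices[0].logprobs.content[0].top_logprobs)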

@petrgazarov Thanks!
If I am understanding it correctly, the template here grades the response from 0 to 10, which then gets scaled here. I'm guessing this explains the one-decimal-place values!

Also, for the workaround you mentioned, should one also change the template to grade the response on a scale of 1-5 instead of 0-10? That way the weighted summed score could be generated as expected, given that the maximum value for the top_logprobs parameter is 5.

I'm pretty sure that the scale in the template has nothing to do with logprobs. Passing 5 to top_logprobs would return the top 5 log probs instead of top 20 (like in the example here).
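
For context on the "weighted summed score": the G-Eval paper's idea is to take the score tokens that appear in the returned top logprobs, convert each logprob to a probability, and compute a probability-weighted average of the scores. A minimal sketch of that idea (not deepeval's actual implementation); note the 0-10 template scale is independent of how many top_logprobs are requested:

import math

def weighted_summed_score(top_logprobs: dict[str, float]) -> float:
    """top_logprobs maps candidate score tokens (e.g. "7") to their logprobs."""
    weighted_sum, total_prob = 0.0, 0.0
    for token, logprob in top_logprobs.items():
        if not token.strip().isdigit():
            continue  # skip non-numeric tokens that happen to appear in the top-k
        prob = math.exp(logprob)
        weighted_sum += int(token) * prob
        total_prob += prob
    if total_prob == 0.0:
        return 0.0
    # Renormalize over the numeric tokens seen, then scale 0-10 down to 0-1.
    return (weighted_sum / total_prob) / 10

# With top_logprobs=5 fewer score tokens contribute, but the template scale stays 0-10.
print(weighted_summed_score({"7": -0.2, "8": -1.9, ",": -3.0}))  # ~0.72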

I'm getting the following error when initializing my fluency_metric, passing the example Azure model from the docs to GEval as the model parameter. I'm not sure why it is asking for the OpenAI API key. Any ideas here?

Error:

Traceback (most recent call last):
  File "/Users/cthomps3/Documents/git/hmh/genai-platform-core-1/genai_core_component_library/evaluator/genai_evaluator/eval_test.py", line 7, in <module>
    from summarization_eval_strategy import SummarizationStrategy
  File "/Users/cthomps3/Documents/git/hmh/genai-platform-core-1/genai_core_component_library/evaluator/genai_evaluator/summarization_eval_strategy.py", line 10, in <module>
    from fluency_metric import fluency_metric
  File "/Users/cthomps3/Documents/git/hmh/genai-platform-core-1/genai_core_component_library/evaluator/genai_evaluator/fluency_metric.py", line 10, in <module>
    fluency_metric = GEval(
                     ^^^^^^
  File "/Users/cthomps3/Library/Caches/pypoetry/virtualenvs/genai-evaluator-FNF4B49N-py3.12/lib/python3.12/site-packages/deepeval/metrics/g_eval/g_eval.py", line 106, in __init__
    self.model, self.using_native_model = initialize_model(model)
                                          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cthomps3/Library/Caches/pypoetry/virtualenvs/genai-evaluator-FNF4B49N-py3.12/lib/python3.12/site-packages/deepeval/metrics/utils.py", line 86, in initialize_model
    return GPTModel(model=model), True
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cthomps3/Library/Caches/pypoetry/virtualenvs/genai-evaluator-FNF4B49N-py3.12/lib/python3.12/site-packages/deepeval/models/gpt_model.py", line 61, in __init__
    super().__init__(model_name)
  File "/Users/cthomps3/Library/Caches/pypoetry/virtualenvs/genai-evaluator-FNF4B49N-py3.12/lib/python3.12/site-packages/deepeval/models/base_model.py", line 35, in __init__
    self.model = self.load_model(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cthomps3/Library/Caches/pypoetry/virtualenvs/genai-evaluator-FNF4B49N-py3.12/lib/python3.12/site-packages/deepeval/models/gpt_model.py", line 96, in load_model
    return ChatOpenAI(
           ^^^^^^^^^^^
  File "/Users/cthomps3/Library/Caches/pypoetry/virtualenvs/genai-evaluator-FNF4B49N-py3.12/lib/python3.12/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 2 validation errors for ChatOpenAI
model
  none is not an allowed value (type=type_error.none.not_allowed)
__root__
  Did not find openai_api_key, please add an environment variable `OPENAI_API_KEY` which contains it, or pass `openai_api_key` as a named parameter. (type=value_error)

Custom Model:

import os

from deepeval.models import DeepEvalBaseLLM
from langchain_openai import AzureChatOpenAI  # or the LangChain variant installed in your project


class AzureOpenAI(DeepEvalBaseLLM):
    """Custom Azure OpenAI Model for evaluation."""

    def __init__(self, model):
        if model is None:
            raise ValueError("Model cannot be None")
        self.model = model

    def load_model(self):
        """Load the Azure OpenAI model."""
        try:
            return self.model
        except Exception as e:
            print(f"An error occurred while loading the model: {e}")
            return None

    def generate(self, prompt: str) -> str:
        """
        Generate output synchronously using the Azure OpenAI model.

        Parameters
        ----------
        prompt : str
            The prompt to generate output from.

        Returns
        -------
        str
            The generated output.
        """

        if not isinstance(prompt, str):
            raise ValueError("Prompt must be a string")
        try:
            chat_model = self.load_model()
            return chat_model.invoke(prompt).content
        except Exception as e:
            print(f"An error occurred while generating output: {e}")
            return None

    async def a_generate(self, prompt: str) -> str:
        """
        Generate output asynchronously using the Azure OpenAI model.

        Parameters
        ----------
        prompt : str
            The prompt to generate output from.

        Returns
        -------
        str
            The generated output.
        """
        if not isinstance(prompt, str):
            raise ValueError("Prompt must be a string")
        try:
            chat_model = self.load_model()
            res = await chat_model.ainvoke(prompt)
            return res.content
        except Exception as e:
            print(f"An error occurred while generating output asynchronously: {e}")
            return None

    def get_model_name(self):
        """
        Get the name of the Azure OpenAI model deployment.

        Returns
        -------
        str
            The name of the Azure OpenAI model deployment.
        """
        return self.model.deployment_name

custom_model = AzureChatOpenAI(
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    azure_deployment=os.getenv("AZURE_DEPLOYMENT_NAME"),
    azure_endpoint=os.getenv("AZURE_OPENAI_API_BASE"),
    openai_api_key=os.getenv("AZURE_OPENAI_API_KEY"),
)

Custom Metric:

from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

from eval_models import custom_model

fluency_metric = GEval(
    name="Fluency",
    criteria="Fluency measures the quality of individual sentences in the answer, and whether they are well-written and grammatically correct. Consider the quality of individual sentences when evaluating fluency.",
    model=custom_model,
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT]
)
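
In case it helps anyone hitting the same traceback: the snippet above passes the raw AzureChatOpenAI client straight to GEval, which appears to make deepeval fall back to its native GPTModel (hence the OPENAI_API_KEY lookup in the traceback). The docs pattern wraps the client in the DeepEvalBaseLLM subclass first. A hedged sketch of that wiring, assuming the wrapper class lives in the same eval_models module as custom_model:

from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

from eval_models import AzureOpenAI, custom_model  # the wrapper class and client defined above

azure_model = AzureOpenAI(model=custom_model)  # a DeepEvalBaseLLM instance

fluency_metric = GEval(
    name="Fluency",
    criteria="Fluency measures the quality of individual sentences in the answer, and whether they are well-written and grammatically correct. Consider the quality of individual sentences when evaluating fluency.",
    model=azure_model,  # passed as a DeepEvalBaseLLM, so no OPENAI_API_KEY should be required
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)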