princeton-nlp / tree-of-thought-llm

[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Home Page: https://arxiv.org/abs/2305.10601

Requirement check missed in the evaluation of the text task

wangruicn opened this issue · comments

When assessing output quality for the text task, only the coherency score is taken into account.
However, the generated passage does not always satisfy the requirement that each paragraph must end with one of the 4 given sentences.

tree-of-thought-llm/src/tot/tasks/text.py

import re  # needed for the score-parsing regex below

def test_output(self, idx: int, output: str):
    # Extract the generated passage and ask GPT-4 to score it 5 times
    output = output.split('Passage:\n')[-1]
    prompt = score_prompt + output
    score_outputs = gpt(prompt, n=5, model='gpt-4')
    scores = []
    for score_output in score_outputs:
        # Only the coherency score is parsed; the end-sentence
        # requirement of the task is never checked here.
        pattern = r".*coherency score is (\d+).*"
        match = re.match(pattern, score_output, re.DOTALL)
        if match:
            score = int(match.groups()[0])
            scores.append(score)
        else:
            print(f'------------------score no match: {[score_output]}')
    print(scores)
    info = {'rs': scores, 'r': sum(scores) / len(scores) if scores else 0}
    return info
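A format check could verify the missing requirement directly, without an extra model call. The sketch below is hypothetical (not part of the repo): it assumes the passage separates paragraphs with blank lines and that the 4 required end sentences are available in order.

```python
def check_ends(passage: str, end_sentences: list) -> bool:
    """Check that each paragraph ends with its required sentence.

    Hypothetical helper: `end_sentences` is assumed to hold the required
    final sentences, one per paragraph, in order.
    """
    paragraphs = [p.strip() for p in passage.split('\n\n') if p.strip()]
    if len(paragraphs) != len(end_sentences):
        return False
    return all(p.endswith(s.strip()) for p, s in zip(paragraphs, end_sentences))
```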

You are right: the evaluation focuses only on coherency. Empirically, I think the format requirement is satisfied most of the time. It should be easy to add new metrics and adapt ToT to perform such self-evaluation.
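One way such a metric could be folded into the returned `info` dict is sketched below. This is an assumption about how one might combine the scores, not the repo's implementation: it gates the averaged coherency score on a boolean format check.

```python
def combine(scores: list, format_ok: bool) -> dict:
    # Hypothetical combination: keep the raw coherency scores, but zero out
    # the aggregate reward 'r' when the end-sentence requirement fails.
    avg = sum(scores) / len(scores) if scores else 0
    return {'rs': scores, 'r': avg if format_ok else 0, 'format_ok': format_ok}
```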