EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Home Page: https://www.eleuther.ai

[Feature Request] Metrics that require knowledge of input.

ciaranby opened this issue

For generate_until output-type tasks, I would like to be able to register metrics that are computed from the input prompt to the LLM as well as the reference and prediction.

I have a niche use case for this, but I think it would be useful more broadly. For example, LLM-as-Judge style metrics would also require the input prompt.
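
For illustration, here is a rough sketch of the kind of signature I have in mind, building on the existing register_metric decorator from lm_eval.api.registry. The prompt parameter and the judge_score metric name are hypothetical, not current API:

```python
# Sketch only: `register_metric` exists in lm_eval.api.registry, but the
# `prompt` parameter below is the proposed addition, not current behavior.
from lm_eval.api.registry import register_metric


@register_metric(
    metric="judge_score",  # hypothetical metric name
    higher_is_better=True,
    output_type="generate_until",
    aggregation="mean",
)
def judge_score(references, predictions, prompt=None):
    # `prompt` would carry the exact input string sent to the LLM, so an
    # LLM-as-Judge metric could forward (prompt, prediction, reference)
    # to a judge model and return its rating. The placeholder logic below
    # just checks whether the reference appears in the prediction.
    return float(references[0] in predictions[0])
```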

Also happy to submit a PR if you can point me in the right direction.