carlini / yet-another-applied-llm-benchmark

A benchmark to evaluate language models on questions I've previously asked them to solve.


Would dspy help the benchmark

davideuler opened this issue · comments

I came across dspy early this month. It is super interesting. I wonder whether dspy would help evaluate this LLM benchmark?

https://github.com/stanfordnlp/dspy

Maybe we could refactor the benchmark with DSPy.

This seems like a very nice project -- but it's much more general-purpose than what I want here. This benchmark is designed solely for this one evaluation, which makes building a test quite a bit easier than with what they've made.