OpenGenerativeAI / llm-colosseum

Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM

Home Page: https://huggingface.co/spaces/junior-labs/llm-colosseum

Report Different models fight on the street

taozhiyuai opened this issue

Two models fought 50 rounds; the report is below.

[Screenshot of the results table: WX20240331-102114@2x]

Impressive results! Finally a benchmark where Gemma wins, lol.

You should have a file called "results.csv", right? Is it the one you used to compute the win rates?

Yes, the data in the table all come from results.csv.
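
For readers who want to reproduce this kind of table, here is a minimal sketch of computing per-model win rates from a results file. The column names (`player_1`, `player_2`, `winner`) and the sample rows are assumptions made up for illustration; they are not confirmed to match the actual layout of results.csv, so adjust them to whatever your file contains.

```python
import csv
import io
from collections import Counter


def win_rates(csv_text):
    """Return {model: wins / fights} computed from CSV text.

    Assumed (hypothetical) columns: player_1, player_2, winner.
    The real results.csv may use different names or a different layout.
    """
    wins = Counter()
    fights = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each row is one fight; both participants get one fight counted.
        for model in (row["player_1"], row["player_2"]):
            fights[model] += 1
        wins[row["winner"]] += 1
    return {model: wins[model] / fights[model] for model in fights}


# Made-up sample data with placeholder model names, not real results.
sample = """player_1,player_2,winner
model_a,model_b,model_a
model_a,model_b,model_b
model_a,model_b,model_a
model_a,model_b,model_a
"""

rates = win_rates(sample)
print(rates)
```

To run it on a real file, pass the file contents, for example `win_rates(open("results.csv").read())`, after renaming the columns in the function to match. Note that this sketch counts a mirror match (a model fighting itself) as two fights for that model.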

I tried to choose models with the same parameter count, or the same file size at the same quantization (Q) level, to keep token generation speed similar. Big models always fail because their token generation is too slow.

I think Gemma 7B is good enough; it is time to train the model.

You want to do fine-tuning?

Yes, it is interesting.