OpenGenerativeAI / llm-colosseum

Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM

Home Page:https://huggingface.co/spaces/junior-labs/llm-colosseum

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ELO ranking score?

Tokkiu opened this issue · comments

截屏2024-04-10 11 34 05

How to generate this ranking? If I added new model, how to reproduce this benchmark?

我的新模型已经在这个 PR 中实现。https://github.com/OpenGenerativeAI/llm-colosseum/pull/45/files您可以在这里观看我的模型与 Mistral 的视频。 https://github.com/Tokkiu/llm-colosseum?tab=readme-ov-file#1-vs-1-mistral-vs-solar

你好璟琦,我对这个项目也非常感兴趣,可以交流吗?

I just launch 50 rounds for two models. the result shows who is a better models. at the moment, Gemma 7B is the best. v1.1 is worse.