OpenGenerativeAI / llm-colosseum

Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM

Home Page: https://huggingface.co/spaces/junior-labs/llm-colosseum

suggestion: add blood in log

taozhiyuai opened this issue · comments

I suggest adding player 1's and player 2's remaining blood (health) to the logs when the game is over.

I noticed that at the beginning, player 1 (the model is Gemma 7b) may win with only a tiny amount of blood left, but as the games go on, it wins with more blood left. It seems to be learning how to fight better. Logging the remaining blood would make this easy to check.
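
To illustrate the kind of log line I mean, here is a minimal sketch. It is not taken from the project's code: the names `format_game_over` and `log_game_over`, the logger name, and the idea that both health values are available as integers at game end are all assumptions for illustration.

```python
import logging

# Hypothetical logger name, not the project's actual logger.
logger = logging.getLogger("colosseum_sketch")


def format_game_over(p1_name: str, p1_health: int,
                     p2_name: str, p2_health: int) -> str:
    """Build a one-line game-over summary with both players' remaining health."""
    # Assumption: the player with more health left at the end is the winner.
    if p1_health > p2_health:
        winner = p1_name
    elif p2_health > p1_health:
        winner = p2_name
    else:
        winner = "draw"
    return (f"Game over. Winner: {winner}. "
            f"{p1_name} health: {p1_health}, {p2_name} health: {p2_health}")


def log_game_over(p1_name: str, p1_health: int,
                  p2_name: str, p2_health: int) -> str:
    """Log the summary at INFO level and return it for reuse."""
    message = format_game_over(p1_name, p1_health, p2_name, p2_health)
    logger.info(message)
    return message
```

With a line like this in the log for each game, the winner's remaining health could be compared across games to see whether the margin really grows.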