babelcloud / LLM-RGB

LLM Reasoning and Generation Benchmark. Evaluate LLMs in complex scenarios systematically.

babelcloud/LLM-RGB Issues

Error in the mahjong prompt
Closed 8 months ago · 2 comments
Integrate with LiteLLM: Evaluate 100+ LLMs, 92% faster
Closed 10 months ago · 2 comments
Failures: 271
Closed 10 months ago · 7 comments