LMSYS's repositories
arena-hard-auto
Arena-Hard-Auto: An automatic LLM benchmark.
llm-decontaminator
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
vicuna-blog-eval
The code and data for the GPT-4-based benchmark in the Vicuna blog post.