OpenCompass's repositories
opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
CompassJudger
The all-in-one judge models introduced by OpenCompass
MMBench-GUI
Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent in a hierarchical manner across multiple platforms, including Windows, Linux, macOS, iOS, Android, and Web.
CompassVerifier
[EMNLP 2025] CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
CriticEval
[NeurIPS 2024] A comprehensive benchmark for evaluating the critique ability of LLMs
Creation-MMBench
Assessing Context-Aware Creative Intelligence in MLLMs
CompassBench
Demo data of CompassBench
human-eval
Code for the paper "Evaluating Large Language Models Trained on Code"
hinode
A clean documentation and blog theme for your Hugo site based on Bootstrap 5