OpenCompass's repositories
opencompass
OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
MixtralKit
A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI
VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4v, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks
CriticBench
A comprehensive benchmark for evaluating the critique ability of LLMs
code-evaluator
A multi-language code evaluation tool.
human-eval
Code for the paper "Evaluating Large Language Models Trained on Code"
pytorch_sphinx_theme
Sphinx Theme for OpenCompass - Modified from PyTorch