OpenCompass (open-compass)

Location: China

Home Page: opencompass.org.cn

Twitter: @OpenMMLab

OpenCompass's repositories

opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) across 100+ datasets.

Language: Python | License: Apache-2.0 | Stargazers: 2899 | Watchers: 21 | Issues: 369

MixtralKit

A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI

Language: Python | License: Apache-2.0 | Stargazers: 760 | Watchers: 9 | Issues: 16

VLMEvalKit

An open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks

Language: Python | License: Apache-2.0 | Stargazers: 513 | Watchers: 8 | Issues: 69

LawBench

Benchmarking Legal Knowledge of Large Language Models

Language: Python | License: Apache-2.0 | Stargazers: 190 | Watchers: 9 | Issues: 11

T-Eval

[ACL 2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step

Language: Python | License: Apache-2.0 | Stargazers: 168 | Watchers: 2 | Issues: 45

BotChat

Evaluating LLMs' multi-round chat capability by assessing conversations generated between two LLM instances.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 104 | Watchers: 2 | Issues: 1

MMBench

Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"

DevBench

A Comprehensive Benchmark for Software Development.

Language: Python | License: Apache-2.0 | Stargazers: 73 | Watchers: 4 | Issues: 2

MathBench

[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset

Ada-LEval

The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"

CriticBench

A comprehensive benchmark for evaluating critique ability of LLMs

Language: Python | License: Apache-2.0 | Stargazers: 21 | Watchers: 3 | Issues: 0

code-evaluator

A multi-language code evaluation tool.

Language: Python | License: Apache-2.0 | Stargazers: 16 | Watchers: 3 | Issues: 0

human-eval

Code for the paper "Evaluating Large Language Models Trained on Code"

Language: Python | License: MIT | Stargazers: 1 | Watchers: 1 | Issues: 0
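As a sketch of how human-eval consumes model output: the benchmark scores completions supplied as a JSON Lines file, one record per sample (format per the openai/human-eval README; the task id and completion text below are illustrative placeholders, not real model output).

```python
import json

# human-eval expects each record to carry a "task_id" naming the
# benchmark problem and a "completion" with the model-generated code.
samples = [
    {"task_id": "HumanEval/0", "completion": "    return sorted(numbers)\n"},
]

# Write the records as JSON Lines: one JSON object per line.
with open("samples.jsonl", "w") as f:
    for record in samples:
        f.write(json.dumps(record) + "\n")

# Round-trip check: each line parses back into the original record.
with open("samples.jsonl") as f:
    loaded = [json.loads(line) for line in f]
```

The repository's `evaluate_functional_correctness` command then consumes such a file and reports pass@k.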

evalplus

EvalPlus for rigorous evaluation of LLM-synthesized code

Language: Python | License: Apache-2.0 | Stargazers: 0 | Watchers: 1 | Issues: 0

pytorch_sphinx_theme

Sphinx Theme for OpenCompass - Modified from PyTorch

Language: CSS | License: MIT | Stargazers: 0 | Watchers: 1 | Issues: 0