
# awesome-evaluation-lm

A collection of tools for automated language model assessment.

| Project Name | GitHub Link | Description |
| --- | --- | --- |
| PandaLM | PandaLM | A universal evaluation framework designed to assess both foundation models and chat models. |
| MiniCheck | MiniCheck | A toolkit for evaluating the quality of model outputs, with a focus on assessing factual consistency. |
| ChatEval | ChatEval | An evaluation framework designed specifically for chatbots, including both automatic and human evaluation methods. |
| auto-j | auto-j | A tool for automatically evaluating the fluency and grammatical correctness of text generated by language models. |
| LLMBar | LLMBar | An evaluation framework that includes a large collection of benchmark tests for evaluating the performance of large language models on various tasks. |
| JudgeLM | JudgeLM | A platform for evaluating and comparing different language models. |
| LAMM | LAMM | A tool that focuses on evaluating the factual consistency of models, providing test datasets and evaluation metrics. |
| Prometheus | Prometheus | An evaluation framework that includes test datasets and metrics for evaluating the performance of models on question answering tasks. |
| PCRM | PCRM | A prompt-based chat model evaluation method that provides evaluation metrics and code. |
| TIGERScore | TIGERScore | An evaluation framework that focuses on assessing the fluency and coherence of text generated by models. |
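
Most of the frameworks above automate some flavor of LLM-as-a-judge evaluation: a judge model reads an instruction together with candidate outputs and returns a verdict, and verdicts are aggregated into win rates or scores. The sketch below illustrates that general pattern only; it does not reproduce the API of any project listed here, and `toy_judge` is a hypothetical placeholder for a real model call.

```python
# Minimal sketch of pairwise LLM-as-a-judge evaluation (generic pattern, not a
# specific framework). The judge is a plain callable so no model API is assumed.
from dataclasses import dataclass
from typing import Callable

JUDGE_PROMPT = (
    "You are an impartial judge. Given an instruction and two candidate "
    "responses (A and B), answer with a single letter: 'A', 'B', or 'T' for tie.\n\n"
    "Instruction: {instruction}\n\nResponse A: {response_a}\n\n"
    "Response B: {response_b}\n\nVerdict:"
)

@dataclass
class Example:
    instruction: str
    response_a: str  # output of model A
    response_b: str  # output of model B

def judge_pair(example: Example, judge: Callable[[str], str]) -> str:
    """Ask the judge for a verdict and normalize it to 'A', 'B', or 'T'."""
    prompt = JUDGE_PROMPT.format(
        instruction=example.instruction,
        response_a=example.response_a,
        response_b=example.response_b,
    )
    verdict = judge(prompt).strip().upper()[:1]
    return verdict if verdict in {"A", "B", "T"} else "T"

def win_rates(examples: list[Example], judge: Callable[[str], str]) -> dict[str, float]:
    """Aggregate per-example verdicts into win/tie rates for the two models."""
    counts = {"A": 0, "B": 0, "T": 0}
    for ex in examples:
        counts[judge_pair(ex, judge)] += 1
    total = max(len(examples), 1)
    return {k: v / total for k, v in counts.items()}

if __name__ == "__main__":
    # Hypothetical judge: replace with a real API or local-inference call.
    def toy_judge(prompt: str) -> str:
        return "A" if "concise" in prompt else "B"

    data = [
        Example("Summarize photosynthesis.", "A concise summary.", "A long, rambling answer."),
        Example("Translate 'bonjour'.", "hello", "good day"),
    ]
    print(win_rates(data, toy_judge))
```

Keeping the judge as a callable makes it easy to swap in any of the frameworks above, or a human annotator, without changing the aggregation code.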
