There are 14 repositories under the evaluation topic.
:metal: awesome-semantic-segmentation
Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)
Python package for the evaluation of odometry and SLAM
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
SuperCLUE: A Comprehensive Benchmark for General-Purpose Foundation Models in Chinese
A unified evaluation framework for large language models
An open-source visual programming environment for battle-testing prompts to LLMs.
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Multi-class confusion matrix library in Python
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
High-fidelity performance metrics for generative models in PyTorch
SemanticKITTI API for visualizing dataset, processing data, and evaluating results.
Expression evaluation in Go
Python implementation of the IOU Tracker
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
A Simple Math and Pseudo C# Expression Evaluator in One C# File. Can also execute small C#-like scripts.
AutoPrompt: Automatic Prompt Construction for Masked Language Models.
TCExam is a CBA (Computer-Based Assessment) system (e-exam, CBT - Computer Based Testing) for universities, schools and companies, that enables educators and trainers to author, schedule, deliver, and report on surveys, quizzes, tests and exams.
A collection of datasets that pair questions with SQL queries.
Recommender system library for the CLR (.NET)