evals

There are 10 repositories listed under the evals topic.

langfuse / langfuse
🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
analytics llm llmops gpt large-language-models openai self-hosted ycombinator monitoring observability open-source langchain llama-index evaluation prompt-engineering prompt-management playground evals llm-evaluation
Language:TypeScript 4829
AgentOps-AI / agentops
Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen
agent agentops ai anthropic autogen cost-estimation crewai evals evaluation-metrics groq langchain llm mistral ollama openai
Language:Python 1296
nstankov-bg / oaievals-collector
The OAIEvals Collector: A robust, Go-based metric collector for EVALS data. Supports Kafka, Elastic, Loki, InfluxDB, TimescaleDB integrations, and containerized deployment with Docker. Streamlines OAI-Evals data management efficiently with a low barrier of entry!
chatgpt devops docker go openai evals
Language:Go 3
openlayer-ai / templates
Our curated collection of templates. Use these patterns to set up your AI projects for evaluation with Openlayer.
ai evals examples
Language:Python 2
zeus-fyi / mockingbird
Mockingbird Front End Code | Zeus + SciFi = Power of the gods (cloud + ai | Zeus) Meets the power of SciFi (human ingenuity | SfYi) At the intersection of intelligent design (systems engineering excellence) For your intelligence —ZeusFYI.
ai control evals react redux
Language:TypeScript 2
dustalov / evalica
Evalica, your favourite evaluation toolkit
bradley-terry elo evalica evals evaluation leaderboard library llm pagerank pairwise-comparison pyo3 python ranking rating rust statistics winrate
Language:Python 1
gokayfem / dspy-ollama-colab
dspy with ollama and llamacpp on google colab
agents colab-notebook dspy evals evaluation llamacpp llm ollama vlm
Language:Jupyter Notebook 1
modelmetry / modelmetry-sdk-python
The Modelmetry Python SDK allows developers to easily integrate Modelmetry’s advanced guardrails and monitoring capabilities into their LLM-powered applications.
ai-observability evals large-language-models llm llm-evaluation llmops monitoring observability guardrails openai
Language:Python 1
noah-art3mis / crucible
Develop better LLM apps by testing different models and prompts in bulk.
evals llm ai prompt-engineering
Language:Python 1
camronh / ContextLength-Experiment
Gemini 1.5 Million Token Context Experiment
evals gemini-flash llm
Language:Jupyter Notebook
