
FRAG - Framework for Retrieval Augmented Generation Evaluation and Benchmarking

Introducing FRAG, a framework for Retrieval Augmented Generation evaluation and benchmarking.

Common LLM evaluation suites are built around tasks used as a proxy for intelligence, or around style. Task-based examples include Grade School Math (GSM8k) and Massive Multitask Language Understanding (MMLU, k=5); for style, LMSys uses an LLM as a proxy for human evaluation. While these may approximate an LLM's intelligence, they do not evaluate an LLM's capabilities for production use cases.

For production use, we instead want to evaluate LLMs on the capabilities that LLM applications actually depend on: hallucination, context utilization, instruction following, and tool usage.

Benchmarking LLM API Endpoints

We benchmark online serving throughput for LLM API endpoints using httpx.
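
Below is a minimal sketch of such a throughput benchmark: httpx's async client fires concurrent requests at an OpenAI-compatible chat completions endpoint and reports requests per second. The endpoint URL, model name, payload, and concurrency values are illustrative assumptions, not FRAG's actual configuration.

```python
# Minimal sketch of benchmarking an OpenAI-compatible chat endpoint with httpx.
# The endpoint URL, model name, and payload are placeholders for illustration.
import asyncio
import time

import httpx

API_URL = "http://localhost:8000/v1/chat/completions"  # placeholder endpoint
PAYLOAD = {
    "model": "placeholder-model",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 64,
}


async def one_request(client: httpx.AsyncClient) -> float:
    """Send one chat completion request and return its latency in seconds."""
    start = time.perf_counter()
    response = await client.post(API_URL, json=PAYLOAD, timeout=60.0)
    response.raise_for_status()
    return time.perf_counter() - start


async def benchmark(concurrency: int = 8, total_requests: int = 64) -> None:
    """Run total_requests requests with bounded concurrency and report throughput."""
    async with httpx.AsyncClient() as client:
        semaphore = asyncio.Semaphore(concurrency)

        async def bounded() -> float:
            async with semaphore:
                return await one_request(client)

        start = time.perf_counter()
        latencies = await asyncio.gather(*(bounded() for _ in range(total_requests)))
        elapsed = time.perf_counter() - start

    print(f"throughput: {total_requests / elapsed:.2f} req/s")
    print(f"mean latency: {sum(latencies) / len(latencies):.3f} s")


if __name__ == "__main__":
    asyncio.run(benchmark())
```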

Evaluating Factuality

FRAG evaluates the following capabilities; a hypothetical scoring sketch follows the list.

  1. Context utilization
  2. Hallucination
  3. Instruction following
  4. Table understanding
  5. Tool usage
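
As an illustration of how one of these dimensions (context utilization / hallucination) could be scored, the sketch below compares a model's answer against a reference answer grounded in the retrieved context. The EvalCase structure, the naive substring check, and the function names are assumptions made for illustration, not FRAG's actual API.

```python
# Hypothetical sketch: score how often a model's answer matches a reference
# answer that is grounded in the retrieved context. The data model and the
# naive substring check are illustrative assumptions, not FRAG's API.
from dataclasses import dataclass


@dataclass
class EvalCase:
    question: str      # question posed to the model
    context: str       # retrieved passage the model was given
    reference: str     # expected answer grounded in the context
    model_answer: str  # answer produced by the LLM under test


def grounded_in_context(case: EvalCase) -> bool:
    """Naive check: the grounded reference answer appears in the model answer."""
    return case.reference.lower() in case.model_answer.lower()


def score(cases: list[EvalCase]) -> float:
    """Fraction of cases whose answers match the grounded reference."""
    if not cases:
        return 0.0
    return sum(grounded_in_context(c) for c in cases) / len(cases)


if __name__ == "__main__":
    cases = [
        EvalCase(
            question="What is the capital of France?",
            context="Paris is the capital and largest city of France.",
            reference="Paris",
            model_answer="The capital of France is Paris.",
        )
    ]
    print(f"context-utilization score: {score(cases):.2f}")
```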

References

About

License: Apache License 2.0

