
lou-eval

This exists to track the progress of LLM context utilisation

How it works

Preparing the data

  • We generate sentences in the format My name is ${randomName} and I am from ${randomCountry} and I have a pet ${randomAnimal}.
  • For 10 of the sentences, we change ${randomAnimal} from an animal to a fruit
  • We then randomise the order of the sentences (see the sketch below)
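
For illustration, a minimal sketch of this step. The word lists, the name-uniqueness trick, and the `generateData` helper are all assumptions for the example, not the repo's actual code:

```js
// Illustrative word lists; the repo's actual lists will differ.
const names = ['Alice', 'Bob', 'Carol', 'Dave'];
const countries = ['France', 'Japan', 'Brazil', 'Kenya'];
const animals = ['dog', 'cat', 'parrot', 'hamster'];
const fruits = ['mango', 'kiwi', 'banana', 'plum'];

const pick = (list) => list[Math.floor(Math.random() * list.length)];

function generateData(count) {
  // One record per sentence, all with pet animals to start.
  const records = Array.from({ length: count }, (_, i) => ({
    name: `${pick(names)}-${i}`, // suffix keeps names unique for scoring (an assumption)
    country: pick(countries),
    pet: pick(animals),
    isFruit: false,
  }));

  // For 10 of the sentences, swap the pet from an animal to a fruit.
  for (const record of records.slice(0, 10)) {
    record.pet = pick(fruits);
    record.isFruit = true;
  }

  // Randomise the order (Fisher-Yates shuffle).
  for (let i = records.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [records[i], records[j]] = [records[j], records[i]];
  }

  return {
    sentences: records.map(
      (r) => `My name is ${r.name} and I am from ${r.country} and I have a pet ${r.pet}.`
    ),
    expected: records
      .filter((r) => r.isFruit)
      .map((r) => ({ name: r.name, fruit: r.pet })),
  };
}
```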

Querying the model

We append the following query to the end of the data part of the prompt:

Who has a pet fruit instead of a pet animal? Respond with a JSON list all the instances you find in the format: [{"name":"NAME OF PERSON","fruit":"NAME OF FRUIT"},...]. DO NOT INCLUDE ANY ANIMALS.
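A sketch of what this step could look like against the OpenAI chat completions endpoint. The `queryModel` helper, the `OPENAI_API_KEY` variable name, and the default model are assumptions; the repo may call the API differently:

```js
const QUERY =
  'Who has a pet fruit instead of a pet animal? Respond with a JSON list ' +
  'all the instances you find in the format: ' +
  '[{"name":"NAME OF PERSON","fruit":"NAME OF FRUIT"},...]. ' +
  'DO NOT INCLUDE ANY ANIMALS.';

async function queryModel(sentences, model = 'gpt-4') {
  // Append the query to the end of the data part of the prompt.
  const prompt = sentences.join('\n') + '\n\n' + QUERY;
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  const json = await res.json();
  return json.choices[0].message.content;
}
```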

Parsing the data

We then count how many successful matches and false positives were returned.
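A sketch of this scoring step, assuming `expected` is the list of {name, fruit} pairs planted during data preparation (as in the sketch above); the exact matching rules are a guess at the repo's actual parsing:

```js
function scoreResponse(responseText, expected) {
  // Pull the JSON list out of the response; models sometimes wrap it in prose.
  const match = responseText.match(/\[[\s\S]*\]/);
  let found;
  try {
    found = JSON.parse(match ? match[0] : '[]');
  } catch {
    found = []; // Unparseable output scores zero matches.
  }

  const truth = new Set(expected.map((e) => e.name.toLowerCase()));
  let matches = 0;
  let falsePositives = 0;
  for (const item of found) {
    if (item && item.name && truth.has(item.name.toLowerCase())) {
      matches++; // Successful match: a planted fruit owner was found.
    } else {
      falsePositives++; // Anything else counts as a false positive.
    }
  }
  return { matches, falsePositives };
}
```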

Notes

So that the eval does a good job of modelling real-world RAG challenges:

  • Answers come from many places in the doc
  • Answers do not have exact keyword overlap with the query
  • Similar/misleading information is included in the prompt
  • We take 10 samples at each token length; not enough to be statistically significant, but enough to estimate accuracy to within ±1, which is good enough and doesn't break the bank

How to use

  • Add your keys in .env
  • Run npm run start
  • Choose a model in the CLI
  • The model will be automatically evaluated, taking 10 samples at each context length, starting at 1k tokens and doubling up to the maximum context length supported by the model (a sketch of this loop follows the list)
  • Data is saved in the /data directory
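
A sketch of the evaluation loop, composing the helpers from the sketches above; the per-sentence token estimate, the `runSample` helper, and the output file naming are assumptions:

```js
const fs = require('node:fs/promises');

const SAMPLES_PER_LENGTH = 10;

// Rough sizing assumption: ~20 tokens per sentence.
const sentencesForTokens = (tokens) => Math.floor(tokens / 20);

// Hypothetical composition of the three steps sketched earlier.
async function runSample(model, tokens) {
  const { sentences, expected } = generateData(sentencesForTokens(tokens));
  const response = await queryModel(sentences, model);
  return scoreResponse(response, expected);
}

async function evaluate(model, maxContext) {
  // Take 10 samples at each context length, starting at 1k tokens and
  // doubling up to the model's maximum supported context length.
  for (let tokens = 1024; tokens <= maxContext; tokens *= 2) {
    const results = [];
    for (let i = 0; i < SAMPLES_PER_LENGTH; i++) {
      results.push(await runSample(model, tokens));
    }
    // Save results in the /data directory, one file per context length
    // (illustrative naming scheme).
    await fs.writeFile(
      `data/${model}-${tokens}.json`,
      JSON.stringify(results, null, 2)
    );
  }
}
```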

Future plans

  • Evaluate other models (Claude, open source, ...)
