Jeremy D's repositories
NeedleInAHaystack
Simple retrieval from LLMs at various context lengths to measure accuracy
Language: Python | License: NOASSERTION | 0 | 0 | 0
Language: Python | 0 | 0 | 0
Language: Jupyter Notebook | 0 | 0 | 0
examples
Fast and flexible reference benchmarks
Language: Python | License: Apache-2.0 | 0 | 0 | 0
lm-evaluation-harness-composergpt-integration
A framework for few-shot evaluation of autoregressive language models.
Language: Python | License: MIT | 0 | 0 | 0