Scholar Evals (sevals)

This is built on Eleuther AI's LM Evaluation Harness but has:

A simpler command-line interface
A UI to visualize results and view model outputs (view example results)
The ability to reproduce and publish reproduced results to the original model page.

Installation

pip install sevals

API Keys

Go to usescholar.org/api-keys to get an API Key, then enter it into the sevals CLI when prompted.

Usage

sevals <model> <task> [options]

Examples

# Mock/Dummy model
sevals dummy gsm8k

# Local model
sevals ./path/to/model gsm8k

# HuggingFace model
sevals mistralai/Mistral-7B-v0.1 gsm8k

# OpenAI API
sevals gpt-3.5-turbo gsm8k

# Multiple GPUs
accelerate launch --no-python sevals dummy gsm8k

Tasks

Full list of tasks:

sevals --list_tasks

Documentation

% sevals --help
usage: sevals [-h] [--model_args MODEL_ARGS] [--gen_kwargs GEN_KWARGS] [--list_tasks [search string]] [--list_projects] [-p PROJECT] [--num_fewshot NUM_FEWSHOT] [--batch_size BATCH_SIZE]
              [-o [dir/file.jsonl] [DIR]] [--include_path INCLUDE_PATH] [--verbose]
              [model] [tasks]

positional arguments:
  model                 Model name from HuggingFace or OpenAI, or a path to a local model that can be loaded using `transformers.AutoConfig.from_pretrained`.
                        E.g.:
                        - HuggingFace Model: mistralai/Mistral-7B-v0.1
                        - OpenAI Model: gpt-3
                        - Local Model: ./path/to/model
  tasks                 To get full list of tasks, use the command sevals --list_tasks

optional arguments:
  -h, --help            show this help message and exit
  --model_args MODEL_ARGS
                        String arguments for model, e.g. 'dtype=float32'
  --gen_kwargs GEN_KWARGS
                        String arguments for model generation on greedy_until tasks, e.g. `temperature=0,top_k=0,top_p=0`
  --list_tasks [search string]
                        List all available tasks, that optionally match a search string, and exit.
  --list_projects       List all projects you have on Scholar, and exit.
  -p PROJECT, --project PROJECT
                        ID of Scholar project to store runs/results in.
  --num_fewshot NUM_FEWSHOT
                        Number of examples in few-shot context
  --batch_size BATCH_SIZE
  -o [dir/file.jsonl] [DIR], --output_path [dir/file.jsonl] [DIR]
                        The path to the output file where the result metrics will be saved. If the path is a directory, the results will be saved in the directory. Else the parent directory will be used.
  --include_path INCLUDE_PATH
                        Additional path to include if there are external tasks to include.
  --verbose             Whether to print verbose/detailed logs.

manveerxyz / evals

Scholar Evals (sevals)

Installation

API Keys

Usage

Examples

Tasks

Documentation

About

Languages