latent-variable / lm-evaluation-harness-webui-wrapper

Wrapper utilizing oobabooga / text-generation-webui and EleutherAI / lm-evaluation-harness to evaluate GPTQ versions of models

lm-evaluation-harness-webui-wrapper

Have you ever pondered how quantization affects model performance, or what the trade-offs are between quantization methods? Me too. Let's explore together!

This wrapper leverages both the oobabooga text-generation-webui and the EleutherAI lm-evaluation-harness to evaluate GPTQ versions of models on various benchmarks, including ARC, HellaSwag, MMLU, and TruthfulQA, akin to the Open LLM Leaderboard.
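For reference, the kind of run this wrapper automates looks roughly like the sketch below, using the lm-evaluation-harness CLI of the master branch at the time (`--num_fewshot 25` on `arc_challenge` matches the Open LLM Leaderboard's ARC setting). The guard around `main.py` is just so the snippet is safe to paste outside a harness checkout:

```shell
# Illustrative sketch only: the wrapper's run_script.sh automates this;
# exact flags depend on the lm-evaluation-harness version in use.
EVAL_ARGS="--tasks arc_challenge --num_fewshot 25"
if [ -f main.py ]; then
    # Inside an lm-evaluation-harness checkout, this launches the evaluation.
    python main.py $EVAL_ARGS
else
    echo "Not inside lm-evaluation-harness; would run: python main.py $EVAL_ARGS"
fi
```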

Test Results

Original model vs. quantized models on ARC: 25-shot arc_challenge (acc_norm), matching the EleutherAI lm-evaluation-harness. GS is the GPTQ group size; Act Order indicates activation-order quantization.

| Model Name | Bits | GS | ARC | Act Order |
| --- | --- | --- | --- | --- |
| OpenOrca Platypus2 13B | 16-bit | NA | 62.88% | NA |
| OpenOrca Platypus2 13B | 8-bit | None | 62.88% | Yes |
| OpenOrca Platypus2 13B | 4-bit | 32 | 62.28% | Yes |
| OpenOrca Platypus2 13B | 4-bit | 128 | 62.62% | No |

Note that the 4-bit GS 32 model reports a lower acc_norm than the 4-bit GS 128 model, but a higher acc (58.02% vs. 57.59%).

Supported OS

Linux & Windows

Linux Instructions

  1. Follow the installation procedure for [oobabooga text-generation-webui](https://github.com/oobabooga/text-generation-webui).
  2. Download a model and run it in the webui to confirm it works, then shut down the webui.
  3. Update activate_env.sh with the path to the webui.
  4. Open a terminal in this directory and run:

```shell
chmod +x activate_env.sh
source activate_env.sh
```
  5. Install lm-evaluation-harness:

```shell
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
```
  6. Update the model path in model.json, then run:

```shell
chmod +x run_script.sh
./run_script.sh
```
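Step 3 above edits activate_env.sh to point at your webui install. As a rough sketch, such a script activates the conda environment created by the webui's one-click installer; the `WEBUI_DIR` default and the `installer_files` layout below are assumptions, so adjust them to your own install:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of what activate_env.sh does: activate the conda
# environment created by the webui's one-click installer.
# WEBUI_DIR is an assumption; point it at your own webui checkout.
WEBUI_DIR="${WEBUI_DIR:-$HOME/text-generation-webui}"
CONDA_SH="$WEBUI_DIR/installer_files/conda/etc/profile.d/conda.sh"
if [ -f "$CONDA_SH" ]; then
    # Load conda into this shell, then activate the webui's environment.
    source "$CONDA_SH"
    conda activate "$WEBUI_DIR/installer_files/env"
else
    echo "No conda environment found under $WEBUI_DIR; edit WEBUI_DIR" >&2
fi
```

Because the script is meant to be `source`d, the activated environment persists in the calling shell for the harness installation that follows.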

Windows Instructions

  1. Follow the installation procedure for [oobabooga text-generation-webui](https://github.com/oobabooga/text-generation-webui).
  2. Download a model and run it in the webui to confirm it works, then shut down the webui.
  3. Update activate_env.bat with the path to the webui.
  4. Run activate_env.bat.
  5. Install the big-refactor branch of lm-evaluation-harness:

```shell
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
git checkout big-refactor
pip install -e .
```
  6. Update the model path in model.json, then run:

```bat
./run_script.bat
```
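The model.json update mentioned above points the wrapper at the model to evaluate. The real schema is defined by this repo; a hypothetical fragment just to make the path update concrete (the `model_path` key name here is illustrative, not necessarily the repo's actual key):

```json
{
  "model_path": "/path/to/text-generation-webui/models/OpenOrca-Platypus2-13B-GPTQ"
}
```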

TODO

  • Make the process more user-friendly.
  • Add a file for defining variables.
  • Add results for 3-bit models.

About


License: Apache License 2.0

