wakamex / human3090

HumanEval scores evaluated locally on an RTX 3090


Unless otherwise specified:

  • maximum number of layers offloaded to GPU (`-ngl`)
  • parameter changes carry over into the following tests (temperature, penalties, etc.)
  • * denotes a non-local model, included for comparison
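Since sampling parameters like temperature and presence_penalty carry over between runs, it helps to build each request body explicitly. A minimal sketch, assuming a llama.cpp-style OpenAI-compatible server; the helper name and defaults here are illustrative, not this repo's code:

```python
# Sketch: build the JSON body for a completion request to a local
# llama.cpp server (OpenAI-compatible completions endpoint).
# Helper name and default values are assumptions for illustration.
import json


def build_completion_request(prompt, temperature=0.2, presence_penalty=0.0,
                             max_tokens=512):
    """Return a request body dict; explicit parameters avoid silent carry-over."""
    return {
        "prompt": prompt,
        "temperature": temperature,
        "presence_penalty": presence_penalty,
        "max_tokens": max_tokens,
    }


body = build_completion_request("def add(a, b):\n", temperature=0.0)
print(json.dumps(body))
```

Overriding temperature per run, as above, keeps each benchmark entry reproducible regardless of what the previous test used.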
| Model | Configuration | Human Eval |
| --- | --- | --- |
| GPT-4* | Instruction-style, temperature=0.2, presence_penalty=0 | 63.4% |
| GPT-4* | Completion-style | 84.1% |
| Mixtral8x7b | mixtral-8x7b-v0.1.Q5_K_M.gguf | 45.7% |
| Mistral-medium* | | 62.2% |
| Llama2* | HF API, CodeLlama-34b-Instruct-hf | 42.1% |
| Mistral-large* | | 73.2% |
| WizardLM2 | WizardLM-2-8x22B.IQ3_XS-00001-of-00005.gguf | 56.7% |
| Wizardcoder | wizardcoder-33b-v1.1.Q4_K_M.gguf, temperature=0.0 | 73.8% |
| Wizardcoder-Python | Q4_K_M quant, modified prompt | 57.3% |
| CodeFuse-Deepseek | CodeFuse-DeepSeek-33B-Q4_K_M.gguf | 68.7% |
| Deepseek | deepseek-coder-33b-instruct.Q4_K_M.gguf | 79.9% |
| OpenCodeInterpreter | ggml-opencodeinterpreter-ds-33b-q8_0.gguf, -ngl 40 | Failed |
| Deepseek | ggml-deepseek-coder-33b-instruct-q4_k_m.gguf | 78.7% |
| Deepseek | deepseek-coder-33b-instruct.Q5_K_M.gguf, -ngl 60, a bit slow | 79.3% |
| Llama3* | together.ai API, Llama-3-70b-chat-hf | 75.6% |
| DBRX* | together.ai API, dbrx-instruct | 48.8% |
| CodeQwen | codeqwen-1_5-7b-chat-q8_0.gguf | 83.5% |
| Llama3-8B | bartowski/Meta-Llama-3-8B-Instruct-GGUF | 52.4% |
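The percentages in the table are HumanEval pass rates. For reference, the unbiased pass@k estimator from the Codex paper ("Evaluating Large Language Models Trained on Code") can be sketched in Python; this is the standard formula, not code from this repo:

```python
# Unbiased pass@k estimator: given n samples per problem with c correct,
# pass@k = 1 - C(n - c, k) / C(n, k), the probability that a random
# size-k subset of the n samples contains at least one correct solution.
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c passed the tests."""
    if n - c < k:  # every size-k subset must contain a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With one sample per problem (n=k=1), pass@1 reduces to the raw pass rate:
print(pass_at_k(1, 1, 1), pass_at_k(1, 0, 1))  # → 1.0 0.0
```

Averaging `pass_at_k` over all 164 HumanEval problems gives the headline score; with greedy decoding (temperature=0.0) a single sample per problem is the common setup.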


Languages

Python 97.5%, Shell 2.5%