Inference code demo for WizardCoder
ganler opened this issue · comments
Hi, thanks for the amazing work. I am interested in evaluating WizardCoder-Python-34B-V1.0
on HumanEval+. Just curious if there is a minimal Python/HF code snippet demo for me to reference? Thanks!
Thanks for your great eval-plus project. We ran an extra evaluation on HumanEval+ (HE+): the pass@1 is 64.6 (greedy decoding), higher than ChatGPT's 63.4. You can use humaneval_gen_vllm.py to generate the code completions.
pip install vllm # This can accelerate the inference process a lot.
pip install transformers==4.31.0
model="/path/to/your/model"
temp=0.2 # set to 0.0 for greedy decoding
max_len=2048
pred_num=200 # set to 1 for greedy decoding
num_seqs_per_iter=1
output_path=preds/T${temp}_N${pred_num}
mkdir -p ${output_path}
echo 'Output path: '$output_path
echo 'Model to eval: '$model
CUDA_VISIBLE_DEVICES=0,1,2,3 python humaneval_gen_vllm.py --model ${model} \
--start_index 0 --end_index 164 --temperature ${temp} \
--num_seqs_per_iter ${num_seqs_per_iter} --N ${pred_num} --max_len ${max_len} --output_path ${output_path} --num_gpus 4
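For anyone who wants a minimal standalone snippet rather than the full script, a sketch along these lines should work with vLLM's offline API. The prompt template below is an assumption (the Alpaca-style instruction format commonly used for WizardCoder); check humaneval_gen_vllm.py for the exact one used in the evaluation.

```python
def build_prompt(problem: str) -> str:
    # Alpaca-style instruction template; assumed here -- see
    # humaneval_gen_vllm.py in the WizardLM repo for the exact format.
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\nCreate a Python script for this problem:\n"
        f"{problem}\n\n### Response:"
    )

if __name__ == "__main__":
    # vLLM import kept here so build_prompt stays importable without a GPU.
    from vllm import LLM, SamplingParams

    llm = LLM(model="WizardLM/WizardCoder-Python-34B-V1.0",
              tensor_parallel_size=4)
    # temperature=0.0 gives greedy decoding, matching the pass@1 setting above.
    params = SamplingParams(temperature=0.0, max_tokens=2048)
    outputs = llm.generate([build_prompt("def add(a, b):\n    ...")], params)
    print(outputs[0].outputs[0].text)
```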
Great! I was able to obtain the raw output (in a dialog fashion). Could you point me to the post-processing script that turns it into actual code? (I guess it is simply s.split("```python")[-1].split("```")[0]?)
Yes, we use a similar method: https://github.com/nlpxucan/WizardLM/blob/main/WizardCoder/src/process_humaneval.py
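For reference, a minimal sketch of that fence-stripping heuristic (the actual logic lives in process_humaneval.py linked above; this is just the idea from the thread):

```python
def extract_code(completion: str) -> str:
    # Take the text after the last "```python" fence, then cut at the
    # next closing fence. If no fence is present, return the completion as-is.
    if "```python" in completion:
        completion = completion.split("```python")[-1]
    return completion.split("```")[0].strip()
```

A fenced completion like "Here is the solution:\n```python\ndef add(a, b):\n    return a + b\n```" reduces to just the function body, while plain code passes through unchanged.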
Perfect, we now have the results, which look strong; they have been updated at https://evalplus.github.io/leaderboard.html
Thanks for the great work!