Evaluate using MT-Bench and AlpacaEval to compare with Zephyr 7B Beta
alvarobartt opened this issue
Alvaro Bartolome commented
Description
To compare Notus 7B with Zephyr 7B Beta on equal terms, we need to run the same benchmarks that were used for Zephyr, i.e. MT-Bench and AlpacaEval.
MT-Bench at https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge
AlpacaEval at https://github.com/tatsu-lab/alpaca_eval#quick-start
To run them, we can follow the instructions at https://github.com/huggingface/alignment-handbook/blob/main/scripts/README.md#evaluating-chat-models
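
As a rough sketch of the MT-Bench part, the three steps from the FastChat `llm_judge` README (generate answers, generate judgments, show results) could be assembled like this. The Hub ids and the short model ids below are assumptions, since the issue does not name them, and `mt_bench_commands` is a hypothetical helper, not part of FastChat. The commands are only built and printed, not executed.

```python
import shlex

# Assumed model ids -> Hugging Face Hub paths; neither is given in the issue.
MODELS = {
    "notus-7b": "argilla/notus-7b-v1",
    "zephyr-7b-beta": "HuggingFaceH4/zephyr-7b-beta",
}


def mt_bench_commands(models, parallel=2):
    """Build the MT-Bench steps as argv lists (hypothetical helper).

    The scripts are expected to be run from fastchat/llm_judge.
    """
    commands = []
    # Step 1: generate answers to the MT-Bench questions, once per model.
    for model_id, model_path in models.items():
        commands.append([
            "python", "gen_model_answer.py",
            "--model-path", model_path,
            "--model-id", model_id,
        ])
    # Step 2: have the judge model grade every listed model's answers.
    commands.append([
        "python", "gen_judgment.py",
        "--model-list", *models,
        "--parallel", str(parallel),
    ])
    # Step 3: print the resulting scores.
    commands.append(["python", "show_result.py"])
    return commands


if __name__ == "__main__":
    for cmd in mt_bench_commands(MODELS):
        print(shlex.join(cmd))
```

Running both models through the same pipeline with the same judge is what keeps the scores comparable. AlpacaEval is not covered by this sketch; its quick start linked above describes the equivalent steps.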