argilla-io / notus

Notus is a collection of LLMs fine-tuned with SFT, DPO, SFT+DPO, and/or any other RLHF technique, always following a data-first approach.

Evaluate using MT-Bench and AlpacaEval to compare with Zephyr 7B Beta

alvarobartt opened this issue 7 months ago

Alvaro Bartolome commented 7 months ago

Description

To compare Notus 7B with Zephyr 7B Beta fairly, we need to run the same benchmarks used for Zephyr, i.e., MT-Bench and AlpacaEval.

MT-Bench at https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge
AlpacaEval at https://github.com/tatsu-lab/alpaca_eval#quick-start

We will follow the instructions at https://github.com/huggingface/alignment-handbook/blob/main/scripts/README.md#evaluating-chat-models
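A minimal sketch of how both runs could be scripted. It assumes FastChat is installed and the MT-Bench commands are run from `fastchat/llm_judge`, that `alpaca_eval` is installed via pip, and that `OPENAI_API_KEY` is set for the GPT-4 judge. The flags mirror the quick-start commands in the FastChat llm_judge and AlpacaEval READMEs and should be checked against their current versions. The helper functions (`mt_bench_commands`, `alpaca_eval_command`, `run`), the Hugging Face model ID used for Notus, and the `outputs.json` path are hypothetical and for illustration only.

```python
import shlex
import subprocess
from typing import Dict, List

# Hypothetical model-id -> checkpoint mapping; the Notus path is an assumption.
MODELS: Dict[str, str] = {
    "notus-7b": "argilla/notus-7b-v1",
    "zephyr-7b-beta": "HuggingFaceH4/zephyr-7b-beta",
}


def mt_bench_commands(models: Dict[str, str], parallel: int = 2) -> List[List[str]]:
    """Build the FastChat llm_judge steps (run from fastchat/llm_judge)."""
    cmds: List[List[str]] = []
    # Step 1: generate answers to the MT-Bench questions for each model.
    for model_id, model_path in models.items():
        cmds.append(["python", "gen_model_answer.py",
                     "--model-path", model_path, "--model-id", model_id])
    ids = list(models)
    # Step 2: have the judge model grade every model's answers.
    cmds.append(["python", "gen_judgment.py", "--model-list", *ids,
                 "--parallel", str(parallel)])
    # Step 3: print the aggregated MT-Bench scores side by side.
    cmds.append(["python", "show_result.py", "--model-list", *ids])
    return cmds


def alpaca_eval_command(outputs_json: str) -> List[str]:
    """AlpacaEval quick start: score a JSON file of model outputs."""
    return ["alpaca_eval", "--model_outputs", outputs_json]


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed.
    run(mt_bench_commands(MODELS))
    run([alpaca_eval_command("outputs.json")])
```

Keeping `dry_run=True` by default makes it easy to review the exact commands before spending GPT-4 judge calls. Running both models through the same script ensures Notus and Zephyr are scored with identical settings.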