Evaluating LLMs like LLaMA/Alpaca using the evaluate library?
Jeffwan opened this issue · comments
Jiaxin Shan commented
Hi team, thanks for open-sourcing this awesome tool. I am new to it and would like to ask some questions about LLM evaluation:
- It seems evaluate already provides some evaluators (some libraries call them tasks, I think). Can we use these evaluators for LLM evaluation?
- I feel different tasks require different datasets. For LLM evaluation there are popular datasets like MMLU. Is there a tested pairing? For example, for QA, can I use dataset1 and dataset2 with metric1 and metric2, etc.?
- What's the difference between huggingface/evaluate and https://github.com/EleutherAI/lm-evaluation-harness?
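To make the metric/dataset pairing question concrete, here is a minimal plain-Python sketch of what a simple metric such as exact match computes over model predictions and reference answers. This is an illustration of the idea, not the evaluate library's actual API; the function name and data below are made up for the example:

```python
# Sketch of an "exact match" style metric: compare model predictions
# against reference answers and aggregate a score. Illustrative only;
# this is not the evaluate library's API.

def exact_match(predictions, references):
    """Fraction of predictions that exactly match their reference."""
    if not predictions:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(predictions)

# Hypothetical QA predictions and gold answers.
preds = ["Paris", "blue", "42"]
refs = ["Paris", "red", "42"]
print(exact_match(preds, refs))  # 2 of 3 match
```

A library like evaluate packages many such metrics behind a common interface, which is why the dataset/metric pairing (e.g. MMLU with accuracy) is usually decided per task rather than built into the metric itself.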
蒲黎明 commented
m
Phil Wee commented
Same here. Will this be supported?
I'm currently getting this error:
raise ValueError(
ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.
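This error usually means the installed transformers version predates LLaMA support (LlamaTokenizer was added around transformers v4.28, if I recall correctly). Upgrading typically resolves it; a sketch of the setup step, assuming pip:

```shell
# LlamaTokenizer ships with newer transformers releases; upgrading
# (plus sentencepiece, which the tokenizer depends on) usually fixes
# "Tokenizer class LlamaTokenizer does not exist or is not currently imported."
pip install --upgrade transformers sentencepiece
```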