baaivision / JudgeLM

An open-sourced LLM judge for evaluating LLM-generated answers.

baaivision/JudgeLM Issues

KeyError: 'scores' in judgelm_preprocess.py
Updated 2 months ago
Can you share minimal `.jsonl` file examples?
Closed 7 months ago · 1
Is this project still under active development?
Updated 6 months ago · 1
Issue running single judgement with references
Updated 7 months ago
How to prompt for single answer without reference?
Updated 7 months ago · 1
Reproduced training results do not match expectations
Updated 8 months ago
Leaderboard of JudgeLM evaluations
Updated 8 months ago · 2
How to make JudgeLM Generate Chinese Output?
Updated 8 months ago
About automatic evaluation of Chinese question answering
Closed 8 months ago · 2
Hello, how many resources did you use when training the 33B model?
Closed 8 months ago · 2
HuggingFace TGI for fast inference and serving
Closed 8 months ago · 2
Have you run any experiments to verify the model's performance against ground-truth labels rather than GPT-4 labels?
Closed 8 months ago · 2
Hello, how can I score multiple answers at the same time? The current code does not seem to support this.
Closed 9 months ago · 3
preprocess
Closed 9 months ago · 6
Hi, how can we use the judge_pair template to judge a single answer?
Closed 9 months ago · 5
Can you describe the training process and data strategy in more detail? Load given to inference in the paper
Closed 9 months ago · 2