allenai / reward-bench

RewardBench: the first evaluation tool for reward models.

https://huggingface.co/spaces/allenai/reward-bench

allenai/reward-bench Issues

Add new Reward Model and Generative Model.
Closed 12 days ago (2 comments)
New Gemma-7b DPO Model
Closed a month ago (12 comments)
Model Test Application
Closed a month ago (3 comments)
Experiment with human vs gpt4 data
Updated a month ago (1 comment)
Clean up / enhance DPO code
Closed a month ago (1 comment)
Add Nvidia RMs (and Nemo compatibility)
Closed a month ago (1 comment)
Visualization requests
Closed a month ago (1 comment)
Add New reward models
Closed a month ago (2 comments)
possibly a typo in `load_bon_dataset.py`
Closed a month ago
'model_modifier' referenced before assignment in enclosing scope
Closed a month ago (1 comment)
rewardbench.py results are different for different batch size for beaver-7b
Closed 2 months ago (43 comments)
multi gpu inference with run_rm.py
Closed a month ago (3 comments)
Dataset v2 discussion & feedback
Updated 2 months ago (3 comments)
Prompt Repeated in DPO `tokenize_row` (not actually sure if this is an issue)
Closed 2 months ago (3 comments)
Set up OpenRouter for llm-as-a-judge
Closed 2 months ago (1 comment)
Do we need to add system prompt when training/evaluating RM?
Closed 2 months ago (1 comment)
Add generative models to pip install (probably with optional dependencies)
Closed 3 months ago
`pad_token_id` issue
Closed 3 months ago (7 comments)
Add `rewardbench` on pypi + basic release management
Closed 3 months ago
Add PoLL for generative RM
Closed 3 months ago
Clarification Needed on DPO Reward Evaluation
Closed 3 months ago (4 comments)
New LLaMA-3 Seq. Classifier Model
Closed 3 months ago (6 comments)
Output leaderboard scores when running `run_rm.py`
Closed 3 months ago
adding kto as a separate category
Closed 3 months ago (4 comments)
[Core team] Migrate Prior Sets to 50% weight
Closed 4 months ago (1 comment)
Experiment request: DPO with different betas
Closed 4 months ago (1 comment)
Is eval set on huggingface the eval set or train set?
Closed 4 months ago (1 comment)
[Model Request] mightbe/Better-PairRM
Closed 4 months ago (2 comments)
Saving bug (non breaking)
Closed 4 months ago
Multiple styles of computing reward with DPO
Closed 4 months ago (1 comment)
Generative RM
Closed 4 months ago
Check EOS token on FastChat models
Closed 4 months ago (1 comment)
Rename Starling 34B
Closed 4 months ago
adding Archangel models (dpo, kto, sft+dpo, sft+kto)
Closed 4 months ago
stanfordnlp/SteamSHP-flan-t5 performance on SHP and HH-RLHF Helpful
Closed 4 months ago (1 comment)
Check beaver cost model
Closed 4 months ago (1 comment)
Add a new mistral RM model
Closed 4 months ago (1 comment)
Add new model weqweasdas/RM-Mistral-7B
Closed 4 months ago
Add new model Mistral-7B-instruct-Unified-Feedback
Closed 4 months ago
Check Qwen model
Closed 5 months ago (1 comment)
Support Nous Mixtral
Closed 5 months ago (1 comment)
Set default chat template to None
Closed 5 months ago (1 comment)
Pref Sets updates
Closed 5 months ago (1 comment)
Truncation of long sequences
Closed 5 months ago (1 comment)
Fix score saving PairRM and SteamSHP
Closed 5 months ago (2 comments)
Best of N benchmark
Updated 5 months ago (2 comments)
Improve per-token reward tool
Closed 6 months ago
Save reward scores for each prompt
Closed 6 months ago
DATASET TRACKING
Closed 7 months ago (1 comment)