Wenbosi's starred repositories
CriticBench
[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Topical-Chat
A dataset containing human-human knowledge-grounded open-domain conversations.
arena-hard-auto
Arena-Hard-Auto: An automatic LLM benchmark.
autosearch-grammarly-premium-cookie
Use Grammarly Premium for free
Directional-Stimulus-Prompting
[NeurIPS 2023] Codebase for the paper: "Guiding Large Language Models with Directional Stimulus Prompting"
self-refine
LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively.
FollowBench
Code for "FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models (ACL 2024)"
MetaRanking
Official code repo for our work "Meta Ranking: Less Capable Language Models are Capable for Single Response Judgement".
DeepSeek-Math
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
MetaCritique
Evaluate the Quality of Critique
LLM-Safeguard
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"