Wenbosi's starred repositories
CriticBench
[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Topical-Chat
A dataset containing human-human knowledge-grounded open-domain conversations.
arena-hard-auto
Arena-Hard-Auto: An automatic LLM benchmark.
autosearch-grammarly-premium-cookie
Use Grammarly Premium for free
Directional-Stimulus-Prompting
[NeurIPS 2023] Codebase for the paper: "Guiding Large Language Models with Directional Stimulus Prompting"
self-refine
LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively.
FollowBench
Code for "FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models (ACL 2024)"
MetaRanking
Official code repo for our work "Meta Ranking: Less Capable Language Models are Capable for Single Response Judgement".
DeepSeek-Math
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
MetaCritique
Evaluate the Quality of Critique
LLM-Safeguard
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"