Wenbosi

Wenbosi's starred repositories

Language: Python | Stargazers: 169 | Issues: 0

CriticBench

[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning

Language: Python | License: MIT | Stargazers: 20 | Issues: 0

Themis

The official repository for our NLG evaluation LLM Themis and the paper Themis: Towards Flexible and Interpretable NLG Evaluation.

Language: Python | License: Apache-2.0 | Stargazers: 14 | Issues: 0

gpo

The code of paper "Toward Optimal LLM Alignments Using Two-Player Games".

Language: Python | License: Apache-2.0 | Stargazers: 13 | Issues: 0

geval

Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"

Language: Python | License: MIT | Stargazers: 248 | Issues: 0

Topical-Chat

A dataset containing human-human knowledge-grounded open-domain conversations.

Language: Python | Stargazers: 626 | Issues: 0

WildBench

Benchmarking LLMs with Challenging Tasks from Real Users

Language: Python | License: Apache-2.0 | Stargazers: 189 | Issues: 0

arena-hard-auto

Arena-Hard-Auto: An automatic LLM benchmark.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 588 | Issues: 0

FBI

FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists

Language: Python | Stargazers: 20 | Issues: 0
Language: Python | License: MIT | Stargazers: 51 | Issues: 0

sep

Code release for "Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models" https://arxiv.org/abs/2402.03659

Language: Python | Stargazers: 94 | Issues: 0
Language: Python | Stargazers: 98 | Issues: 0

ToMBench

ToMBench: Benchmarking Theory of Mind in Large Language Models, ACL 2024.

Language: Python | License: MIT | Stargazers: 31 | Issues: 0

autosearch-grammarly-premium-cookie

Use Grammarly Premium for free.

Language: Python | License: Apache-2.0 | Stargazers: 75 | Issues: 0

textgrad

TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients.

Language: Python | License: MIT | Stargazers: 1706 | Issues: 0

Directional-Stimulus-Prompting

[NeurIPS 2023] Codebase for the paper: "Guiding Large Language Models with Directional Stimulus Prompting"

Language: Python | License: Apache-2.0 | Stargazers: 98 | Issues: 0

self-refine

LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively.

Language: Python | License: Apache-2.0 | Stargazers: 597 | Issues: 0

SelFee

Official codebase for "SelFee: Iterative Self-Revising LLM Empowered by Self-Feedback Generation"

Language: Python | License: Apache-2.0 | Stargazers: 220 | Issues: 0

Collie

[ICLR 2024] COLLIE: Systematic Construction of Constrained Text Generation Tasks

Language: Jupyter Notebook | License: MIT | Stargazers: 52 | Issues: 0

FollowBench

Code for "FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models (ACL 2024)"

Language: Python | License: Apache-2.0 | Stargazers: 84 | Issues: 0

CELLO

Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?" (AAAI 2024)

Language: Python | Stargazers: 44 | Issues: 0

THU-ACP

This is the repository for my Advanced Computing courses at the Computer Science and Technology department of Tsinghua University.

Language: HTML | License: CC-BY-4.0 | Stargazers: 18 | Issues: 0

Shepherd

This is the repo for the paper Shepherd -- A Critic for Language Model Generation

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 210 | Issues: 0

MetaRanking

Official code repo for our work "Meta Ranking: Less Capable Language Models are Capable for Single Response Judgement".

License: MIT | Stargazers: 2 | Issues: 0

CUT

Source code of "Reasons to Reject? Aligning Language Models with Judgments"

Language: Python | License: Apache-2.0 | Stargazers: 56 | Issues: 0

DeepSeek-Math

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Language: Python | License: MIT | Stargazers: 807 | Issues: 0

MetaCritique

Evaluate the Quality of Critique

Language: Python | License: Apache-2.0 | Stargazers: 35 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 63 | Issues: 0

LLM-Safeguard

Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"

Language: Python | Stargazers: 67 | Issues: 0

ceval

Official GitHub repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]

Language: Python | License: MIT | Stargazers: 1621 | Issues: 0