StigLidu

SmartPlay is a benchmark for Large Language Models (LLMs). It uses a variety of games to test important capabilities of LLMs as agents. SmartPlay is designed to be easy to use and to support future development of LLMs.

Language: Python | License: CC-BY-4.0 | 11600

Grounding_LLMs_with_online_RL

We perform functional grounding of LLMs' knowledge in BabyAI-Text

Language: Python | License: MIT | 21400

llm-reasoners

A library for advanced large language model reasoning

Language: Python | License: Apache-2.0 | 115900

text2reward

[ICLR 2024] Code for the paper "Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning"

Language: Jupyter Notebook | 11400

rl-prompt

Accompanying repo for the RLPrompt paper

Language: Python | License: MIT | 29400

grace

[EMNLP 2023, Findings] GRACE: Discriminator-Guided Chain-of-Thought Reasoning

Language: Python | 4200

T-Eval

[ACL 2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step

Language: Python | License: Apache-2.0 | 21400

XAgent

An Autonomous LLM Agent for Complex Task Solving

Language: Python | License: Apache-2.0 | 805800

BIG-Bench-Hard

Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

License: MIT | 41600

InternLM

Official release of InternLM2.5 base and chat models. 1M context support

Language: Python | License: Apache-2.0 | 628100

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | License: Apache-2.0 | 3655100

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Language: Python | License: Apache-2.0 | 1355500

evalplus

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023

Language: Python | License: Apache-2.0 | 116000

Weihua Du's starred repositories

ScienceWorld

Open-Sora

Agent-FLAN

video-nonlocal-net

mmaction2

kinetics-dataset

VMZ

training_extensions

uoj

Voyager

DeepSeek-Math

lagent

lmdeploy

on-policy

HAZARD

docs

google-research

SmartPlay