Xingyao Wang's repositories
mint-bench
Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Zihan Wang*, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng and Heng Ji.
code4struct
Official repo for ACL 2023 paper Code4Struct: Code Generation for Few-Shot Structured Prediction from Natural Language.
Megatron-LLM
distributed trainer for LLMs
alfworld
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
Awesome-LLMs-Evaluation-Papers
The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.
bigcode-evaluation-harness
A framework for the evaluation of autoregressive code generation language models.
chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
ChatGPT-Next-Web
A well-designed cross-platform ChatGPT UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT app with one click.
EasyDeL
EasyDeL is an open-source library that makes your training faster and more optimized, with options for both training and serving in Python and Mojo🔥
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
human-eval
Code for the paper "Evaluating Large Language Models Trained on Code"
intercode
[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
Megatron-LM
Ongoing research training transformer models at scale
ollama
Get up and running with Llama 2, Mistral, Gemma, and other large language models.
potato
potato: portable text annotation tool
sambanova_toolbench
ToolBench, an evaluation suite for LLM tool manipulation capabilities.
SWE-bench
[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world GitHub Issues?
ToolBench
An open platform for training, serving, and evaluating large language models for tool learning.
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch, TensorFlow, and JAX.
trl
Train transformer language models with reinforcement learning.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
webarena
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"