LLM-Agent-Paper-Digest

To benefit the research community and promote the LLM-powered agent research direction, we organize recent papers on LLM-powered agents published at top conferences. Currently, our repository includes:

  • 2023: NIPS

For a more comprehensive collection of papers and in-depth details, please refer to the survey A Survey on Large Language Model based Autonomous Agents and the associated repository LLM-Agent-Survey.

We are glad to have any misunderstandings pointed out, and contributions to this repository are welcome!

What's new:

  • 2023/9/26 We added papers from NIPS'23.

Contents

Agent Building

Agent Profile

[Agent Profile] CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society. [Paper] [Code]
TLDR: The paper presents CAMEL, a framework that fosters autonomous cooperation between communicative agents. Using a role-playing approach, it employs inception prompting to guide chat agents through tasks while keeping them aligned with human intentions. (Improves agent capability through role-playing.)
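
As a rough picture of the role-playing setup, the sketch below alternates between an AI user that issues instructions and an AI assistant that carries them out, both primed by inception prompts. The `chat` helper, the prompts, and the `<TASK_DONE>` termination token are hypothetical placeholders, not CAMEL's actual implementation.

```python
# Minimal sketch of a CAMEL-style role-playing loop (illustrative only).
# `chat(system_prompt, message)` is a caller-supplied function wrapping any chat-completion API.
def role_play(task: str, chat, max_turns: int = 10) -> list[str]:
    # Inception prompting: each agent is primed with its role and the shared task up front.
    user_sys = f"You are the task planner. Give one instruction at a time for: {task}"
    assistant_sys = f"You are the domain expert. Carry out each instruction for: {task}"
    transcript, reply = [], "Let's begin."
    for _ in range(max_turns):
        instruction = chat(user_sys, reply)           # AI user issues the next instruction
        solution = chat(assistant_sys, instruction)   # AI assistant carries it out
        transcript += [instruction, solution]
        if "<TASK_DONE>" in instruction:              # hypothetical termination token
            break
        reply = solution
    return transcript
```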

Agent Memory

[Agent Memory] Reflexion: Language Agents with Verbal Reinforcement Learning. [Paper] [Code]
TLDR: Reflexion keeps the feedback signal from the task in long-term and short-term memory buffers and reflects on it to make better decisions in subsequent trials. (Uses long- and short-term memory to maintain feedback and perform reflection.)
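
A rough sketch of the idea, assuming hypothetical `run_episode` and `reflect` helpers backed by an LLM: the short-term memory is the current trajectory, while the long-term memory accumulates verbal reflections that are supplied as hints to later trials.

```python
# Sketch of Reflexion-style verbal reinforcement (illustrative, not the official code).
def reflexion_loop(task, run_episode, reflect, max_trials=5):
    long_term_memory = []          # accumulated verbal reflections across trials
    for _ in range(max_trials):
        # Short-term memory: the trajectory of the current trial.
        trajectory, reward = run_episode(task, hints=long_term_memory)
        if reward > 0:             # task solved
            return trajectory
        # Ask the LLM to verbalize what went wrong and how to improve next time.
        long_term_memory.append(reflect(task, trajectory))
    return None
```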

[Agent Memory] SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks. [Paper] [Code]
TLDR: This paper introduces SWIFTSAGE, an agent framework that combines a fast, intuitive thinking module, SWIFT, with a deliberate thinking module, SAGE, to optimize action planning in complex interactive reasoning tasks. SWIFT is a fine-tuned small encoder-decoder LM, while SAGE employs LLMs such as GPT-4 for subgoal planning and grounding. (Combines fast thinking from a small model with deliberate thinking from a large model.)
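
One way to picture the fast/slow split is a dispatcher that defaults to the small SWIFT model and falls back to the SAGE planner when confidence is low; the components and threshold below are hypothetical, offered only as a loose illustration.

```python
# Illustrative fast/slow dispatch in the spirit of SwiftSage (hypothetical interfaces).
def act(observation, swift_model, sage_planner, confidence_threshold=0.7):
    action, confidence = swift_model.propose(observation)  # fast, intuitive module
    if confidence >= confidence_threshold:
        return action
    # Deliberate module: an LLM plans subgoals and grounds them into an executable action.
    subgoals = sage_planner.plan(observation)
    return sage_planner.ground(subgoals, observation)
```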

[Agent Memory] Large Language Model Is Semi-Parametric Reinforcement Learning Agent. [Paper] [Code]
TLDR: By equipping the LLM with a long-term experience memory, REMEMBERER can exploit experiences from past episodes even for different task goals, outperforming LLM-based agents that use fixed exemplars or only a transient working memory. (Equips the LLM with a long-term experience memory to build a semi-parametric reinforcement learning agent.)

Agent Planning

[Agent Planning] Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents. [Paper] [Code]
TLDR: The LLM-powered agent achieves better error correction through feedback, and a goal selector is introduced to rank goals and improve planning based on predicted completion steps. (Introduces a goal selector and realizes a multi-task agent in Minecraft.)

[Agent Planning] Large Language Models as Commonsense Knowledge for Large-Scale Task Planning. [Paper]
TLDR: Uses large language models (LLMs) as commonsense world models and heuristic policies to solve complex task-planning problems. (LLMs as commonsense world models and heuristic policies for task planning.)

[Agent Planning] Tree of Thoughts: Deliberate Problem Solving with Large Language Models. [Paper] [Code]
TLDR: We introduce a new framework for language model inference, "Tree of Thoughts" (ToT), which generalizes the popular "Chain of Thought" approach to prompting language models and enables exploration over tree-structured thoughts, considering multiple reasoning paths toward a solution. (Encourages the LLM to consider multiple different reasoning paths.)
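
The core loop can be sketched as a breadth-limited search over partial "thoughts", where the LLM both proposes expansions and scores them. The `propose` and `evaluate` callables below are hypothetical placeholders for LLM calls, and the search is a simplification of the variants studied in the paper.

```python
# Minimal breadth-limited Tree-of-Thoughts search (illustrative sketch).
def tree_of_thoughts(problem, propose, evaluate, depth=3, breadth=5):
    frontier = [""]                              # each element is a partial chain of thoughts
    for _ in range(depth):
        candidates = []
        for state in frontier:
            # The LLM proposes several candidate next thoughts for each partial solution.
            candidates += [state + "\n" + t for t in propose(problem, state)]
        # The LLM scores candidates; keep only the most promising ones.
        frontier = sorted(candidates, key=lambda s: evaluate(problem, s), reverse=True)[:breadth]
    return frontier[0]
```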

[Agent Planning] Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning. [Paper] [Code]
TLDR: We introduce a novel alternative paradigm that constructs an explicit world (domain) model in the Planning Domain Definition Language (PDDL) and then plans with sound, domain-independent planners. (LLM plus an external planner.)
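
Conceptually, the pipeline is: the LLM writes a PDDL domain and problem from a natural-language description, a classical planner solves it, and errors are fed back to the LLM for repair. The helpers and the `result` interface below are hypothetical, intended only to convey the division of labor.

```python
# Sketch of the LLM + external-planner paradigm (hypothetical helpers throughout).
def plan_with_world_model(nl_task, llm_generate_pddl, classical_planner, max_repairs=3):
    domain, problem = llm_generate_pddl(nl_task)           # LLM constructs the explicit world model
    for _ in range(max_repairs):
        result = classical_planner.solve(domain, problem)  # sound, domain-independent planner
        if result.ok:
            return result.plan
        # Feed planner/validator errors back to the LLM so it can repair the PDDL model.
        domain, problem = llm_generate_pddl(nl_task, feedback=result.error)
    return None
```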

[Agent Planning] Large Language Models can Implement Policy Iteration. [Paper]
TLDR: In this work, we present ICPI, an algorithm that learns to perform RL tasks without expert demonstrations or gradients. Instead, we present a policy-iteration method in which the prompt content is the entire locus of learning: ICPI iteratively updates the contents of the prompt from which it derives its policy through trial-and-error interaction with an RL environment. (The LLM serves as the world model and the policy in model-based reinforcement learning.)
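
The gist is that the prompt itself is the learned object: trajectories gathered from the environment are written into the prompt, and the agent acts greedily with respect to values the LLM estimates from that prompt. The `env` interface and `llm_rollout_value` helper below are hypothetical simplifications of the paper's algorithm.

```python
# Rough sketch of prompt-based policy iteration in the spirit of ICPI (not the exact algorithm).
def icpi(env, llm_rollout_value, episodes=20):
    prompt_buffer = []                               # the prompt is the entire locus of learning
    for _ in range(episodes):
        obs, done, trajectory = env.reset(), False, []
        while not done:
            # Policy improvement: pick the action whose LLM-simulated rollout looks best.
            action = max(env.actions,
                         key=lambda a: llm_rollout_value(prompt_buffer, obs, a))
            next_obs, reward, done = env.step(action)
            trajectory.append((obs, action, reward))
            obs = next_obs
        prompt_buffer.append(trajectory)             # new experience is appended to the prompt
    return prompt_buffer
```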

Agent Action

[Agent Action] GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction. [Paper] [Code]
TLDR: We propose GPT4Tools, based on self-instruct, to enable open-source LLMs such as LLaMA and OPT to use tools. It generates an instruction-following dataset by prompting an advanced teacher model with various multi-modal contexts. (Uses GPT to generate tool-usage records, then fine-tunes open-source models with LoRA.)
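
The data pipeline can be summarized as: prompt a strong teacher model with tool descriptions and multimodal context to produce instruction-following tool-use examples, then fine-tune an open-source LLM on them with LoRA. The helper names below are hypothetical; only the overall shape follows the paper.

```python
import random

# Sketch of a GPT4Tools-style self-instruct data pipeline (hypothetical helpers).
def build_tool_dataset(teacher_llm, tools, image_contexts, n_samples=1000):
    dataset = []
    for _ in range(n_samples):
        context = random.choice(image_contexts)   # multimodal context, e.g. image captions
        tool = random.choice(tools)
        # The teacher generates an instruction plus the tool-invocation trace that answers it.
        dataset.append(teacher_llm.generate_tool_example(tool.description, context))
    return dataset

# The resulting dataset is then used to fine-tune an open-source LLM (e.g. LLaMA) with LoRA;
# that fine-tuning step is omitted here.
```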

[Agent Action] AVIS: Autonomous Visual Information Seeking with Large Language Models. [Paper]
TLDR: AVIS is an autonomous visual information seeking system that leverages a large language model (LLM) to dynamically strategize the use of external tools and to inspect their outputs, thereby acquiring the knowledge needed to answer the posed questions. AVIS achieves state-of-the-art results on knowledge-intensive visual question answering benchmarks such as Infoseek and OK-VQA. (The LLM dynamically plans the use of external tools to acquire the knowledge needed for visual information-seeking questions.)

[Agent Action] HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face. [Paper]
TLDR: HuggingGPT is a system designed to autonomously generate plans from user requests and utilize models from Hugging Face. The workflow consists of task planning, model selection, task execution, and response generation. This design allows HuggingGPT to integrate multimodal perceptual capabilities and manage complex AI tasks. Experiments were conducted with various GPT models to ensure stable outputs, and the system was evaluated across different task types. (The LLM generates a plan and invokes Hugging Face models to complete the task.)
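
The four-stage workflow maps naturally onto a small controller loop: the LLM plans tasks, selects Hugging Face models, the system executes them, and the LLM summarizes the results. The stage names come from the paper, but the `llm` and `model_hub` interfaces below are hypothetical.

```python
# Illustrative HuggingGPT-style controller (stage names from the paper; interfaces are hypothetical).
def hugginggpt(user_request, llm, model_hub):
    tasks = llm.plan_tasks(user_request)                              # 1. task planning
    results = {}
    for task in tasks:
        candidates = model_hub.search(task.type)                      # retrieve candidate models
        model = llm.select_model(task, candidates)                    # 2. model selection
        results[task.id] = model.run(task.args, deps=results)         # 3. task execution
    return llm.generate_response(user_request, results)               # 4. response generation
```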

Agent Application

Agent in Social Science

[Social Science] Using Large Language Model Annotations for Valid Downstream Statistical Inference in Social Science: Design-Based Semi-Supervised Learning. [Paper]
TLDR: We present a new algorithm for using LLM outputs in downstream statistical analyses while guaranteeing statistical properties -- such as asymptotic unbiasedness and proper uncertainty quantification -- that are fundamental to CSS research. (Uses LLM outputs for downstream statistical analysis of document labels in social science.)

Agent in Natural Science

[Natural Science] De novo Drug Design using Reinforcement Learning with Multiple GPT Agents.
TLDR: Awaiting publication.

Agent in Engineering

[Engineering] LLMs for Semi-Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering. [Paper] [Code]
TLDR: This paper introduces CAAFE, a method that harnesses Large Language Models for feature engineering on tabular datasets. CAAFE iteratively generates semantically meaningful features based on dataset descriptions and provides explanations for the created features. This approach improves performance across multiple datasets. (The LLM automates feature engineering.)
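
The iterative loop can be sketched as: prompt the LLM with the dataset description, get candidate feature code, and keep a feature only if cross-validated performance improves. The `llm_propose_feature` and `cv_score` helpers are hypothetical; the loop is a simplification of the released method.

```python
# Sketch of CAAFE-style iterative feature engineering (illustrative, not the released code).
def caafe_loop(df, target, dataset_description, llm_propose_feature, cv_score, rounds=10):
    best = cv_score(df, target)
    for _ in range(rounds):
        # The LLM proposes pandas code for one semantically meaningful feature, plus an explanation.
        feature_code, explanation = llm_propose_feature(dataset_description, list(df.columns))
        candidate = df.copy()
        exec(feature_code, {}, {"df": candidate})      # feature code mutates the candidate frame
        score = cv_score(candidate, target)
        if score > best:                               # keep the feature only if it helps
            df, best = candidate, score
    return df
```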

[Engineering] SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models. [Paper] [Code]
TLDR: The paper presents SheetCopilot, an agent that uses Large Language Models to interact with spreadsheets via natural language. It translates complex requests into actionable steps, outperforming traditional programming methods on various tasks. (An agent that interacts with spreadsheets.)

[Engineering] Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models. [Paper]
TLDR: We propose RECODE, a novel method for zero-shot VRD that solves RElation detection via COmposite DEscription prompts. Specifically, RECODE first decomposes each predicate category into subject, object, and spatial components. It then leverages large language models (LLMs) to generate description-based prompts (visual cues) for each component. (Zero-shot visual relation detection using composite visual cues produced by an LLM.)

[Engineering] 3D-LLM: Injecting the 3D World into Large Language Models. [Paper] [Code]
TLDR: We propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and more. (Injects the 3D world into LLMs.)

[Engineering] What’s Left: Concept Grounding with Large Language Models.
TLDR: Awaiting publication.

Agent Evaluation

[Agent Evaluation] Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples. [Paper]
TLDR: To measure the general deductive reasoning ability of LLMs, we test on a broad set of deduction rules and measure their ability to generalize from simpler demonstrations to more complex proofs along multiple axes: depth, width, and compositional generalization. To facilitate systematic exploration, we construct a new synthetic, programmable reasoning dataset that enables control over deduction rules and proof complexity. (Evaluates LLMs' deductive reasoning ability using OOD examples.)

[Agent Evaluation] Evaluating Cognitive Maps in Large Language Models: No Emergent Planning. [Paper]
TLDR: We propose CogEval, a cognitive-science-inspired protocol for the measurement and evaluation of Large Language Models. We then use CogEval to systematically evaluate hypothesized latent abilities, cognitive maps and planning, across a number of LLMs, using tasks with established construct validity that are absent from LLM training sets. We find that, while LLMs show apparent competence on a few tasks with smaller graphs, the evidence argues against emergent planning capacities, as they lack genuine understanding of latent task structure. (Proposes CogEval, a cognitive-science-inspired protocol for evaluating LLMs.)

[Agent Evaluation] On the Planning Abilities of Large Language Models - A Critical Investigation. [Paper] [Code]
TLDR: By developing a benchmark suite based on the International Planning Competition, the study evaluates the performance of LLMs in three modes: autonomous, heuristic, and human-in-the-loop. (Evaluates the planning abilities of LLMs.)

[Agent Evaluation] Large Language Models of Code Fail at Completing Code with Potential Bugs. [Paper]
TLDR: We introduce and study the buggy-code completion problem, inspired by the realistic scenario of real-time code suggestion where the code context contains potential bugs – anti-patterns that can become bugs in the completed program. To systematically study the task, we introduce two datasets: one with synthetic bugs derived from semantics-altering operator changes (buggy-HumanEval) and one with realistic bugs derived from user submissions to coding problems (buggy-FixEval). We find that the presence of potential bugs significantly degrades the generation performance of high-performing Code-LLMs. (Introduces the buggy-code completion problem for testing Code-LLMs.)

Contributors

Xueyang Feng: NIPS'23

Lei Wang: NIPS'23

Chen Ma: NIPS'23
