Awesome-LLM-related-Papers-Comprehensive-Topics

We provide awesome papers and repos on a comprehensive range of topics, as follows.

CoT / VLM / Quantization / Grounding / Text2IMG&VID / Prompt Engineering / Reasoning / Robot / Agent / Planning / Reinforcement-Learning / Feedback / In-Context-Learning / InstructionTuning / PEFT / RLHF / RAG / Embodied / VQA / Hallucination / Diffusion / Scaling / Context-Window / WorldModel / Memory / Zero-Shot / RoPE / Speech / Perception / Survey / Segmentation / Large Action Model / Foundation / LoRA

We strongly recommend checking our Notion table for an interactive experience.

Number of papers and repos in total: 443

Category	Title	Links	Date
3D, GPT4, VLM	GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation	ArXiv
3D, Open-source, Perception, Robot	3D-LLM: Injecting the 3D World into Large Language Models	ArXiv	2023/07/24
AGI, Agent	OpenAGI: When LLM Meets Domain Experts
AGI, Awesome Repo, Survey	Awesome-LLM-Papers-Toward-AGI	GitHub
AGI, Brain	When Brain-inspired AI Meets AGI
AGI, Brain	Divergences between Language Models and Human Brains
AGI, Survey	Levels of AGI: Operationalizing Progress on the Path to AGI
APIs, Agent, Tool	Gorilla: Large Language Model Connected with Massive APIs	ArXiv
Action-Generation, Generation, Prompting	Prompt a Robot to Walk with Large Language Models
Action-Model, Agent, LAM	LaVague	GitHub
Agent	LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem	ArXiv
Agent	AIOS: LLM Agent Operating System	ArXiv
Agent	Cognitive Architectures for Language Agents	ArXiv
Agent	PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization
Agent	AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Agent	ScreenAgent: A Vision Language Model-driven Computer Control Agent
Agent	swarms	GitHub
Agent	Agents: An Open-source Framework for Autonomous Language Agents
Agent	MindAgent: Emergent Gaming Interaction
Agent	InfiAgent: A Multi-Tool Agent for AI Operating Systems
Agent	Predictive Minds: LLMs As Atypical Active Inference Agents
Agent	XAgent: An Autonomous Agent for Complex Task Solving
Agent	LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination
Agent	AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors	ArXiv
Agent	Agents: An Open-source Framework for Autonomous Language Agents	ArXiv, GitHub
Agent	AutoAgents: A Framework for Automatic Agent Generation	GitHub
Agent	DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines	ArXiv
Agent	AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Agent	CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society
Agent	XAgent: An Autonomous Agent for Complex Task Solving	ArXiv
Agent	Generative Agents: Interactive Simulacra of Human Behavior	ArXiv
Agent	LLM+P: Empowering Large Language Models with Optimal Planning Proficiency	ArXiv	2023/04/22
Agent	AgentSims: An Open-Source Sandbox for Large Language Model Evaluation	ArXiv	2023/08/08
Agent, Awesome Repo	Awesome LLM-Powered Agent	GitHub
Agent, Awesome Repo	LLM Agents Papers	GitHub
Agent, Awesome Repo	Awesome Large Multimodal Agents	GitHub
Agent, Awesome Repo	Awesome-Papers-Autonomous-Agent	GitHub
Agent, Awesome Repo	Autonomous Agents	GitHub
Agent, Awesome Repo	Awesome AI Agents	GitHub
Agent, Awesome Repo, Embodied, Grounding	XLang Paper Reading	GitHub
Agent, Awesome Repo, LLM	CoALA: Awesome Language Agents	GitHub
Agent, Awesome Repo, LLM	Awesome-Embodied-Agent-with-LLMs	GitHub
Agent, Blog	LLM Powered Autonomous Agents	ArXiv
Agent, Code-LLM	TaskWeaver: A Code-First Agent Framework
Agent, Code-LLM, Code-as-Policies, Survey	If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents	ArXiv
Agent, Code-as-Policies	Executable Code Actions Elicit Better LLM Agents	ArXiv	2024/01/24
Agent, Embodied	Embodied Task Planning with Large Language Models
Agent, Embodied	Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Agent, Embodied	Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld	ArXiv
Agent, Embodied	LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
Agent, Embodied	OpenAgents: An Open Platform for Language Agents in the Wild	ArXiv, GitHub
Agent, Embodied, Robot	OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following
Agent, Embodied, Robot	AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents	ArXiv
Agent, Embodied, Survey	Application of Pretrained Large Language Models in Embodied Artificial Intelligence	ArXiv
Agent, End2End, Game, Robot	An Interactive Agent Foundation Model	ArXiv
Agent, Feedback, Reinforcement-Learning	AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback	ArXiv	2023/09/29
Agent, Feedback, Reinforcement-Learning, Robot	Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models	ArXiv	2023/11/04
Agent, GPT4, Web	GPT-4V(ision) is a Generalist Web Agent, if Grounded
Agent, GUI	SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Agent, GUI	ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model	GitHub
Agent, GUI	CogAgent: A Visual Language Model for GUI Agents
Agent, GUI, MobileApp	You Only Look at Screens: Multimodal Chain-of-Action Agents
Agent, GUI, MobileApp	Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Agent, GUI, MobileApp	AppAgent: Multimodal Agents as Smartphone Users
Agent, GUI, Web	"What’s important here?": Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces
Agent, Game	Learning Embodied Vision-Language Programming from Instruction, Exploration, and Environmental Feedback
Agent, Instruction-Tuning	AgentTuning: Enabling Generalized Agent Abilities For LLMs	ArXiv
Agent, LLM, Planning	LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
Agent, Memory, Minecraft	JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models	ArXiv	2023/11/10
Agent, Memory, RAG	RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents	ArXiv	2024/02/06
Agent, Minecraft	Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
Agent, Minecraft	S-Agents: Self-organizing Agents in Open-ended Environment
Agent, Minecraft	Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
Agent, Minecraft	LARP: Language-Agent Role Play for Open-World Games
Agent, Minecraft	Voyager: An Open-Ended Embodied Agent with Large Language Models	ArXiv	2023/05/25
Agent, Minecraft	Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents	ArXiv	2023/02/03
Agent, Minecraft, Reinforcement-Learning	RLAdapter: Bridging Large Language Models to Reinforcement Learning in Open Worlds
Agent, MobileApp	You Only Look at Screens: Multimodal Chain-of-Action Agents	GitHub
Agent, Multi	War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars	ArXiv
Agent, Multimodal, Robot	A Generalist Agent	ArXiv	2022/05/12
Agent, Reasoning	Agent Instructs Large Language Models to be General Zero-Shot Reasoners
Agent, Reasoning	Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
Agent, Reasoning, Zero-shot	Agent Instructs Large Language Models to be General Zero-Shot Reasoners	ArXiv	2023/10/05
Agent, Reinforcement-Learning	STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models
Agent, Reinforcement-Learning	Language Instructed Reinforcement Learning for Human-AI Coordination	ArXiv	2023/04/13
Agent, Reinforcement-Learning	Eureka: Human-Level Reward Design via Coding Large Language Models	ArXiv	2023/10/19
Agent, Reinforcement-Learning	Guiding Pretraining in Reinforcement Learning with Large Language Models	ArXiv	2023/02/13
Agent, Reinforcement-Learning	Language to Rewards for Robotic Skill Synthesis	ArXiv	2023/06/14
Agent, Reinforcement-Learning, Reward	EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL	ArXiv	2022/06/20
Agent, Reinforcement-Learning, Reward	Reward Design with Language Models	ArXiv	2023/02/27
Agent, Reinforcement-Learning, Reward	Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning	ArXiv	2023/09/20
Agent, Soft-Dev	Communicative Agents for Software Development	GitHub
Agent, Soft-Dev	MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
Agent, Survey	Large Multimodal Agents: A Survey
Agent, Survey	Agent AI: Surveying the Horizons of Multimodal Interaction
Agent, Survey	A Survey on LLM-based Autonomous Agents	GitHub
Agent, Survey	The Rise and Potential of Large Language Model Based Agents: A Survey	ArXiv	2023/09/14
Agent, Survey	A Survey on Large Language Model based Autonomous Agents	ArXiv	2023/08/22
Agent, Tool	ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Agent, Tool	Gorilla: Large Language Model Connected with Massive APIs
Agent, Video-for-Agent	Video as the New Language for Real-World Decision Making
Agent, Web	OS-Copilot: Towards Generalist Computer Agents with Self-Improvement	ArXiv
Agent, Web	OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web	ArXiv
Agent, Web	WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Agent, Web	WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Agent-Project, Code-LLM	open-interpreter	GitHub
Anything, CLIP, Perception	SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Anything, Caption, Perception, Segmentation	Segment and Caption Anything	ArXiv
Anything, Depth	Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Anything, Perception, Segmentation	Segment Anything	ArXiv
Audio	Robust Speech Recognition via Large-Scale Weak Supervision
Audio2Video, Diffusion, Generation, Video	EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Automate, Chain-of-Thought, Reasoning	Automatic Chain of Thought Prompting in Large Language Models	ArXiv	2022/10/07
Automate, Prompting	Large Language Models Are Human-Level Prompt Engineers	ArXiv	2022/11/03
Awesome Repo, Chain-of-Thought	Chain-of-ThoughtsPapers	GitHub
Awesome Repo, Chinese	Awesome-Chinese-LLM	GitHub
Awesome Repo, Compress	Awesome LLM Compression	GitHub
Awesome Repo, Diffusion	Awesome-Diffusion-Models	GitHub
Awesome Repo, Embodied	Awesome Embodied Vision	GitHub
Awesome Repo, Hallucination, Survey	A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions	ArXiv, GitHub
Awesome Repo, IROS, Robot	IROS2023PaperList	GitHub
Awesome Repo, In-Context-Learning	Paper List for In-context Learning	GitHub
Awesome Repo, Japanese, LLM	日本語LLMまとめ	GitHub
Awesome Repo, Korean	awesome-korean-llm	GitHub
Awesome Repo, LLM	Awesome-LLM	GitHub
Awesome Repo, LLM, Leaderboard	LLM-Leaderboard	GitHub
Awesome Repo, LLM, Robot	Everything-LLMs-And-Robotics	GitHub
Awesome Repo, LLM, Survey	Awesome-LLM-Survey	GitHub
Awesome Repo, LLM, VLM	Multimodal & Large Language Models	GitHub
Awesome Repo, LLM, Vision	LLM-in-Vision	GitHub
Awesome Repo, Multimodal	Awesome-Multimodal-LLM	GitHub
Awesome Repo, Multimodal	Awesome-Multimodal-Large-Language-Models	GitHub
Awesome Repo, Package	Awesome LLMOps	GitHub
Awesome Repo, Perception, VLM	Awesome Vision-Language Navigation	GitHub
Awesome Repo, RLHF, Reinforcement-Learning	Awesome RLHF (RL with Human Feedback)	GitHub
Awesome Repo, Reasoning	Awesome-Reasoning-Foundation-Models	GitHub
Awesome Repo, Reasoning	Awesome LLM Reasoning	GitHub
Awesome Repo, Robot	Awesome-LLM-Robotics	GitHub
Awesome Repo, Survey	LLMSurvey	GitHub
Benchmark, GPT4	Sparks of Artificial General Intelligence: Early experiments with GPT-4
Benchmark, In-Context-Learning	PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change	ArXiv	2022/06/21
Benchmark, In-Context-Learning	ARB: Advanced Reasoning Benchmark for Large Language Models	ArXiv	2023/07/25
Benchmark, Sora, Text-to-Video	LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models
Brain	A Neuro-Mimetic Realization of the Common Model of Cognition via Hebbian Learning and Free Energy Minimization
Brain	LLM-BRAIn: AI-driven Fast Generation of Robot Behaviour Tree based on Large Language Model
Brain, Conscious	Could a Large Language Model be Conscious?
Brain, Instruction-Tuning	Instruction-tuning Aligns LLMs to the Human Brain
CRAG, RAG	Corrective Retrieval Augmented Generation	ArXiv
Caption, VLM, VQA	Caption Anything: Interactive Image Description with Diverse Multimodal Controls	ArXiv	2023/05/04
Chain-of-Thought, Code-as-Policies	Chain of Code: Reasoning with a Language Model-Augmented Code Emulator	ArXiv
Chain-of-Thought, Code-as-Policies, PersonalCitation, Robot	Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought	ArXiv
Chain-of-Thought, Embodied, PersonalCitation, Robot, Task-Decompose	EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought	ArXiv	2023/05/24
Chain-of-Thought, Embodied, Robot	EgoCOT: Embodied Chain-of-Thought Dataset for Vision Language Pre-training
Chain-of-Thought, GPT4, Reasoning, Robot	Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning	ArXiv	2023/11/29
Chain-of-Thought, In-Context-Learning	Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding	ArXiv
Chain-of-Thought, In-Context-Learning	Reasoning with Language Model is Planning with World Model	ArXiv	2023/05/24
Chain-of-Thought, In-Context-Learning	Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models	ArXiv	2023/05/06
Chain-of-Thought, In-Context-Learning	Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations	ArXiv	2022/05/24
Chain-of-Thought, In-Context-Learning	PAL: Program-aided Language Models	ArXiv	2022/11/18
Chain-of-Thought, In-Context-Learning	Self-Refine: Iterative Refinement with Self-Feedback	ArXiv	2023/03/30
Chain-of-Thought, In-Context-Learning	Complexity-Based Prompting for Multi-Step Reasoning	ArXiv	2022/10/03
Chain-of-Thought, In-Context-Learning	Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models	ArXiv	2023/08/20
Chain-of-Thought, In-Context-Learning	Least-to-Most Prompting Enables Complex Reasoning in Large Language Models	ArXiv	2022/05/21
Chain-of-Thought, In-Context-Learning, Self	Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement	ArXiv	2023/05/23
Chain-of-Thought, In-Context-Learning, Self	Measuring and Narrowing the Compositionality Gap in Language Models	ArXiv	2022/10/07
Chain-of-Thought, Planning, Reasoning	SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning	ArXiv	2023/08/01
Chain-of-Thought, Prompting	Chain-of-Thought Reasoning Without Prompting
Chain-of-Thought, Reasoning	Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation
Chain-of-Thought, Reasoning	Multimodal Chain-of-Thought Reasoning in Language Models	ArXiv	2023/02/02
Chain-of-Thought, Reasoning	Self-Consistency Improves Chain of Thought Reasoning in Language Models	ArXiv	2022/03/21
Chain-of-Thought, Reasoning	Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding	ArXiv	2023/07/28
Chain-of-Thought, Reasoning	Rethinking with Retrieval: Faithful Large Language Model Inference	ArXiv	2022/12/31
Chain-of-Thought, Reasoning	Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance	ArXiv	2023/05/26
Chain-of-Thought, Reasoning	Tree of Thoughts: Deliberate Problem Solving with Large Language Models	ArXiv	2023/05/17
Chain-of-Thought, Reasoning	Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework	ArXiv	2023/05/05
Chain-of-Thought, Reasoning	Chain-of-Thought Prompting Elicits Reasoning in Large Language Models	ArXiv	2022/01/28
Chain-of-Thought, Reasoning, Survey	Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters	ArXiv	2023/12/20
Chain-of-Thought, Reasoning, Survey	A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future	ArXiv	2023/09/27
Chain-of-Thought, Reasoning, Table	Chain-of-table: Evolving tables in the reasoning chain for table understanding
Code-LLM	StarCoder 2 and The Stack v2: The Next Generation
Code-LLM, Front-End	Design2Code: How Far Are We From Automating Front-End Engineering?
Code-as-Policies, Embodied, PersonalCitation, Reasoning, Robot, Task-Decompose	Inner Monologue: Embodied Reasoning through Planning with Language Models	ArXiv
Code-as-Policies, Embodied, PersonalCitation, Robot	Code as Policies: Language Model Programs for Embodied Control	ArXiv	2022/09/16
Code-as-Policies, Multimodal, OpenGVLab, PersonalCitation, Robot	Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model	ArXiv	2023/05/18
Code-as-Policies, PersonalCitation, Robot	ChatGPT for Robotics: Design Principles and Model Abilities
Code-as-Policies, PersonalCitation, Robot	RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks
Code-as-Policies, PersonalCitation, Robot	RoboCodeX: Multi-modal Code Generation for Robotic Behavior Synthesis	ArXiv
Code-as-Policies, PersonalCitation, Robot	ProgPrompt: Generating Situated Robot Task Plans using Large Language Models	ArXiv	2022/09/22
Code-as-Policies, PersonalCitation, Robot, State-Manage	Statler: State-Maintaining Language Models for Embodied Reasoning	ArXiv	2023/06/30
Code-as-Policies, PersonalCitation, Robot, Zero-shot	Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language	ArXiv	2022/04/01
Code-as-Policies, Reasoning	Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Code-as-Policies, Reasoning, VLM, VQA	ViperGPT: Visual Inference via Python Execution for Reasoning	ArXiv	2023/03/14
Code-as-Policies, Reinforcement-Learning, Reward	Code as Reward: Empowering Reinforcement Learning with VLMs	ArXiv
Code-as-Policies, Robot	Creative Robot Tool Use with Large Language Models
Code-as-Policies, Robot	RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation
Code-as-Policies, Robot	Executable Code Actions Elicit Better LLM Agents
Code-as-Policies, Robot	SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models	ArXiv	2023/09/18
Code-as-Policies, VLM, VQA	Visual Programming: Compositional visual reasoning without training	ArXiv	2022/11/18
Compress, Prompting	Learning to Compress Prompts with Gist Tokens	ArXiv
Compress, Quantization, Survey	A Survey on Model Compression for Large Language Models	ArXiv
Compress, Scaling	(Long)LLMLingua: Enhancing Large Language Model Inference via Prompt Compression	ArXiv
Context-Window	RoFormer: Enhanced Transformer with Rotary Position Embedding
Context-Window, Foundation	Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Context-Window, Foundation, Gemini, LLM, Scaling	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Context-Window, LLM, RoPE, Scaling	LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens	ArXiv
Context-Window, Reasoning, RoPE, Scaling	Resonance RoPE: Improving Context Length Generalization of Large Language Models
Context-Window, Scaling	LONGNET: Scaling Transformers to 1,000,000,000 Tokens	ArXiv	2023/07/01
Context-Window, Scaling	Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Data-generation, Robot	RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation	ArXiv	2023/11/02
Data-generation, Robot	GenSim: Generating Robotic Simulation Tasks via Large Language Models	ArXiv	2023/10/02
Dataset, Instruction-Tuning	Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Dataset, Instruction-Tuning	REVO-LION: Evaluating and Refining Vision-Language Instruction Tuning Datasets
Dataset, LLM, Survey	A Survey on Data Selection for Language Models
Demonstration, GPT4, PersonalCitation, Robot	GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration
Diffusion	A latent text-to-image diffusion model
Diffusion, Robot	3D Diffusion Policy	ArXiv
Diffusion, Speech	NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Diffusion, Survey	On the Design Fundamentals of Diffusion Models: A Survey	ArXiv
Diffusion, Text-to-Image	Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Distilling	Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Distilling, Survey	A Survey on Knowledge Distillation of Large Language Models
Drive, Survey	A Survey on Multimodal Large Language Models for Autonomous Driving	ArXiv
Driving, Spatial	GPT-Driver: Learning to Drive with GPT	ArXiv	2023/10/02
Embodied, LLM, Robot, Survey	The Development of LLMs for Embodied Navigation	ArXiv	2023/11/01
Embodied, Reasoning, Robot	Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs	ArXiv, GitHub	2024/03/20
Embodied, Robot	Large Language Models as Generalizable Policies for Embodied Tasks
Embodied, Robot, Task-Decompose	Embodied Task Planning with Large Language Models	ArXiv	2023/07/04
Embodied, World-model	Language Models Meet World Models: Embodied Experiences Enhance Language Models
Embodied	Embodied Question Answering	ArXiv
End2End, Multimodal, Robot	VIMA: General Robot Manipulation with Multimodal Prompts	ArXiv	2022/10/06
End2End, Multimodal, Robot	PaLM-E: An Embodied Multimodal Language Model	ArXiv	2023/03/06
End2End, Multimodal, Robot	Physically Grounded Vision-Language Models for Robotic Manipulation	ArXiv	2023/09/05
Evaluation, LLM, Survey	A Survey on Evaluation of Large Language Models	ArXiv
Feedback, In-Context-Learning, Robot	InCoRo: In-Context Learning for Robotics Control with Feedback Loops
Feedback, Robot	Correcting Robot Plans with Natural Language Feedback	ArXiv
Feedback, Robot	Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Feedback, Robot	REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction	ArXiv	2023/06/27
Foundation, LLM, Open-source	Code Llama: Open Foundation Models for Code
Foundation, LLM, Open-source	LLaMA: Open and Efficient Foundation Language Models	ArXiv	2023/02/27
Foundation, LLaMA, Vision	VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Foundation, Robot, Survey	Foundation Models in Robotics: Applications, Challenges, and the Future	ArXiv	2023/12/13
GPT4, Gemini, LLM	Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases	ArXiv	2023/12/22
GPT4, Instruction-Tuning	Instruction Tuning with GPT-4	ArXiv
GPT4, LLM	GPT-4 Technical Report	ArXiv	2023/03/15
Generation, Robot, Zero-shot	Towards Generalizable Zero-Shot Manipulation via Translating Human Interaction Plans
Generation, Robot, Zero-shot	Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models	ArXiv
Generation, Survey	Advances in 3D Generation: A Survey
Grounding	GLaMM: Pixel Grounding Large Multimodal Model
Grounding	V-IRL: Grounding Virtual Intelligence in Real Life
Grounding, Reasoning	Visually Grounded Reasoning across Languages and Cultures
Grounding, Reinforcement-Learning	Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Gym, PPO, Reinforcement-Learning, Survey	Can Language Agents Approach the Performance of RL? An Empirical Study On OpenAI Gym
Hallucination, Survey	Combating Misinformation in the Age of LLMs: Opportunities and Challenges	ArXiv
Image, LLaMA, Perception	LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
In-Context-Learning	Can large language models explore in-context?
In-Context-Learning	What does CLIP know about a red circle? Visual prompt engineering for VLMs
In-Context-Learning	ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate	ArXiv	2023/08/14
In-Context-Learning	ReAct: Synergizing Reasoning and Acting in Language Models	ArXiv	2023/03/20
In-Context-Learning	Generative Agents: Interactive Simulacra of Human Behavior	ArXiv	2023/04/07
In-Context-Learning	Small Models are Valuable Plug-ins for Large Language Models	ArXiv	2023/05/15
In-Context-Learning	Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models	ArXiv	2022/06/09
In-Context-Learning, Instruction-Tuning	In-Context Instruction Learning
In-Context-Learning, Perception, Vision	Visual In-Context Prompting
In-Context-Learning, Prompt-Tuning	Visual Prompt Tuning
In-Context-Learning, Reinforcement-Learning	AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
In-Context-Learning, Scaling	Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale		2022/03/06
In-Context-Learning, Scaling	Structured Prompting: Scaling In-Context Learning to 1,000 Examples		2020/03/06
In-Context-Learning, Survey	A Survey on In-context Learning	ArXiv
In-Context-Learning, VQA	VisualCOMET: Reasoning about the Dynamic Context of a Still Image	ArXiv	2020/04/22
In-Context-Learning, VQA	SINC: Self-Supervised In-Context Learning for Vision-Language Tasks	ArXiv	2023/07/15
In-Context-Learning, Video	Prompting Visual-Language Models for Efficient Video Understanding
In-Context-Learning, Vision	Visual Prompting via Image Inpainting
In-Context-Learning, Vision	What Makes Good Examples for Visual In-Context Learning?
Instruction-Tuning	Tuna: Instruction Tuning using Feedback from Large Language Models	ArXiv	2023/03/06
Instruction-Tuning	Exploring the Benefits of Training Expert Language Models over Instruction Tuning	ArXiv	2023/02/06
Instruction-Tuning	Exploring Format Consistency for Instruction Tuning
Instruction-Tuning	A Closer Look at the Limitations of Instruction Tuning
Instruction-Tuning, LLM	Training language models to follow instructions with human feedback	ArXiv	2022/03/04
Instruction-Tuning, LLM	Self-Instruct: Aligning Language Models with Self-Generated Instructions	ArXiv	2022/12/20
Instruction-Tuning, LLM	MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models	ArXiv	2023/04/20
Instruction-Tuning, LLM, PEFT	LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention	ArXiv	2023/03/28
Instruction-Tuning, LLM, PEFT	Visual Instruction Tuning	ArXiv	2023/04/17
Instruction-Tuning, LLM, Survey	Instruction Tuning for Large Language Models: A Survey
Instruction-Tuning, LLM, Zero-shot	Finetuned Language Models Are Zero-Shot Learners	ArXiv	2021/09/03
Instruction-Tuning, Self	Self-Instruct: Aligning Language Models with Self-Generated Instructions
Instruction-Tuning, Survey	A Survey on Data Selection for LLM Instruction Tuning
Instruction-Tuning, Survey	A Closer Look at the Limitations of Instruction Tuning	ArXiv
Instruction-Tuning, Survey	Vision-Language Instruction Tuning: A Review and Analysis
Instruction-Tuning, Survey	Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning
Interactive, OpenGVLab, VLM	InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language	ArXiv	2023/05/09
LLM	Language Models are Few-Shot Learners	ArXiv	2020/05/28
LLM, Memory	MemoryBank: Enhancing Large Language Models with Long-Term Memory	ArXiv	2023/05/17
LLM, Open-source	A self-hosted, offline, ChatGPT-like chatbot, powered by Llama 2. 100% private, with no data leaving your device.	GitHub
LLM, Open-source	InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning	ArXiv	2023/05/11
LLM, Open-source	ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst	ArXiv	2023/05/25
LLM, Open-source	OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models	ArXiv	2023/08/02
LLM, Open-source, Perception, Segmentation	Segment Anything	ArXiv	2023/04/05
LLM, PersonalCitation, Robot	Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
LLM, PersonalCitation, Robot, Zero-shot	Language Models as Zero-Shot Trajectory Generators	ArXiv
LLM, Quantization	The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits	ArXiv
LLM, Reasoning, Survey	Towards Reasoning in Large Language Models: A Survey	ArXiv	2022/12/20
LLM, Robot, Survey	Large Language Models for Robotics: A Survey
LLM, Robot, Task-Decompose	Do As I Can, Not As I Say: Grounding Language in Robotic Affordances	ArXiv	2022/04/04
LLM, Scaling	BitNet: Scaling 1-bit Transformers for Large Language Models	ArXiv
LLM, Spatial	Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning	ArXiv	2023/10/05
LLM, Survey	A Survey of Large Language Models	ArXiv	2023/03/31
LLM, Temporal Logics	NL2TL: Transforming Natural Languages to Temporal Logics using Large Language Models	ArXiv	2023/05/12
LLM, Zero-shot	GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?	ArXiv	2023/11/27
LLaMA, Lightweight, Open-source	MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
LLaVA, VLM	TinyLLaVA: A Framework of Small-scale Large Multimodal Models	ArXiv
Lab	Imperial College London - Zeroshot trajectory
Lab	OpenGVLab	GitHub
Lab	Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University - CogVLM
Lab	Rutgers University, AGI Research - OpenAGI
Lab	XLANG NLP Lab - OpenAgents
Lab	OpenBMB - ChatDev, XAgent, AgentVerse
Lab	Reworkd AI - AgentGPT
Lab	DeepWisdom - MetaGPT
Lab	Tencent AI Lab - AppAgent, WebVoyager
LoRA, Scaling	Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements	ArXiv
LoRA, Scaling	LoRA: Low-Rank Adaptation of Large Language Models
Low-level-action, Robot	SayTap: Language to Quadrupedal Locomotion	ArXiv	2023/06/13
Low-level-action, Robot	Prompt a Robot to Walk with Large Language Models	ArXiv	2023/09/18
Math, Reasoning	DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Memory, Reinforcement-Learning	Semantic HELM: A Human-Readable Memory for Reinforcement Learning
Memory, Robot	LLM as A Robotic Brain: Unifying Egocentric Memory and Control	ArXiv	2023/04/19
MoE	Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity	ArXiv
Multimodal, Robot	Flamingo: a Visual Language Model for Few-Shot Learning	ArXiv	2022/04/29
Multimodal, Robot	Open-World Object Manipulation using Pre-trained Vision-Language Models	ArXiv	2023/03/02
Multimodal, Robot	MOMA-Force: Visual-Force Imitation for Real-World Mobile Manipulation	ArXiv	2023/08/07
Natural-Language-as-Policies, Robot	RT-H: Action Hierarchies Using Language	ArXiv
Navigation, Reasoning, Vision	NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
Open-source	Gemma: Introducing new state-of-the-art open models	ArXiv
Open-source, Perception	Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Open-source, VLM	OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models	ArXiv	2023/08/02
PPO, RLHF, Reinforcement-Learning	Secrets of RLHF in Large Language Models Part I: PPO	ArXiv	2024/02/01
Package	Alpaca-LoRA	GitHub
Package	Dify	GitHub
Package	h2oGPT	GitHub
Package	LangChain	GitHub
Package	LlamaIndex	GitHub
Perception	SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Perception	Recognize Anything: A Strong Image Tagging Model	ArXiv
Perception	DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Perception	Grounded Language-Image Pre-training	ArXiv	2021/12/07
Perception	Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection	ArXiv	2023/03/09
Perception	PointCLIP: Point Cloud Understanding by CLIP	ArXiv	2021/12/04
Perception	Simple Open-Vocabulary Object Detection with Vision Transformers	ArXiv	2022/05/12
Perception, Reasoning	Lenna: Language Enhanced Reasoning Detection Assistant	ArXiv
Perception, Reasoning	DetGPT: Detect What You Need via Reasoning	ArXiv
Perception, Reasoning, Robot	Reasoning Grasping via Multimodal Large Language Model	ArXiv
Perception, Robot	Language Segment-Anything
Perception, Robot	LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding	ArXiv	2023/12/21
Perception, Task-Decompose	DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment	ArXiv	2023/07/01
Perception, Video, Vision	CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval	ArXiv
PersonalCitation, Robot	Text2Motion: From Natural Language Instructions to Feasible Plans	ArXiv
Prompting	Contrastive Chain-of-Thought Prompting
Prompting, Survey	A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications
Quantization, Scaling	SliceGPT: Compress Large Language Models by Deleting Rows and Columns
RAG	Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity	ArXiv
RAG	RAFT: Adapting Language Model to Domain Specific RAG	ArXiv
RAG	RAG-Fusion: a New Take on Retrieval-Augmented Generation	ArXiv
RAG	Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
RAG	Training Language Models with Memory Augmentation
RAG, Survey	Retrieval-Augmented Generation for Large Language Models: A Survey
RAG, Survey	Large Language Models for Information Retrieval: A Survey
RAG, Temporal Logics	FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation	ArXiv
RLHF	Secrets of RLHF in Large Language Models Part II: Reward Modeling
RLHF, Reinforcement-Learning, Survey	A Survey of Reinforcement Learning from Human Feedback
Reasoning	The Impact of Reasoning Step Length on Large Language Models
Reasoning	STaR: Bootstrapping Reasoning With Reasoning	ArXiv	2022/05/28
Reasoning	Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
Reasoning	Rephrase and Respond (RaR)
Reasoning	Contrastive Chain-of-Thought Prompting
Reasoning	Chain-of-Thought Reasoning Without Prompting	ArXiv
Reasoning	Self-Discover: Large Language Models Self-Compose Reasoning Structures
Reasoning	Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning	ArXiv
Reasoning	ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs
Reasoning, Reinforcement-Learning	ReFT: Reasoning with Reinforced Fine-Tuning
Reasoning, Robot	AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation
Reasoning, Survey	Reasoning with Language Model Prompting: A Survey	ArXiv
Reasoning, Symbolic	Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning
Reasoning, Table	Large Language Models are few(1)-shot Table Reasoners	ArXiv
Reasoning, VLM, VQA	MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action	ArXiv	2023/03/20
Reasoning, Zero-shot	Large Language Models are Zero-Shot Reasoners
Reinforcement-Learning	Large Language Models Are Semi-Parametric Reinforcement Learning Agents
Reinforcement-Learning	RLang: A Declarative Language for Describing Partial World Knowledge to Reinforcement Learning Agents
Resource	[Resource] arxiv-sanity	ArXiv
Resource	[Resource] AlphaSignal	ArXiv
Resource	[Resource] Semanticscholar	ArXiv
Resource	[Resource] Connectedpapers	ArXiv
Resource	[Resource] dailyarxiv	ArXiv
Resource	[Resource] huggingface	ArXiv
Resource	[Resource] Paperswithcode	ArXiv
RoPE	RoFormer: Enhanced Transformer with Rotary Position Embedding	ArXiv
Robot	DeliGrasp: Inferring Object Mass, Friction, and Compliance with LLMs for Adaptive and Minimally Deforming Grasp Policies	ArXiv
Robot	OCI-Robotics: Object-Centric Instruction Augmentation for Robotic Manipulation	ArXiv
Robot	PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs	ArXiv
Robot	Introspective Tips: Large Language Model for In-Context Decision Making	ArXiv
Robot	RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
Robot	Generative Expressive Robot Behaviors using Large Language Models
Robot	OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics
Robot	RoCo: Dialectic Multi-Robot Collaboration with Large Language Models	ArXiv
Robot	Interactive Language: Talking to Robots in Real Time
Robot	Reflexion: Language Agents with Verbal Reinforcement Learning	ArXiv	2023/03/20
Robot, Survey	Real-World Robot Applications of Foundation Models: A Review
Robot, Survey	Language-conditioned Learning for Robotic Manipulation: A Survey	ArXiv	2023/12/17
Robot, Survey	Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis	ArXiv	2023/12/14
Robot, Survey	Robot Learning in the Era of Foundation Models: A Survey	ArXiv	2023/11/24
Robot, Task-Decompose	SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning	ArXiv	2023/07/12
Robot, Task-Decompose, Zero-shot	Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents	ArXiv	2022/01/18
Robot, Zero-shot	Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
Robot, Zero-shot	Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting
Robot, Zero-shot	Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots
Robot, Zero-shot	BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning	ArXiv
Sora, Text-to-Video	Mora: Enabling Generalist Video Generation via A Multi-Agent Framework	ArXiv
Sora, Text-to-Video	Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Survey	Efficient Large Language Models: A Survey	ArXiv, GitHub
Survey, TimeSeries	Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Survey, Training	Understanding LLMs: A Comprehensive Overview from Training to Inference
Survey, VLM	MM-LLMs: Recent Advances in MultiModal Large Language Models
Survey, Video	Video Understanding with Large Language Models: A Survey
Temporal	Explorative Inbetweening of Time and Space	ArXiv
Text-to-Image	Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation	ArXiv
Text-to-Image, World-model	World Model on Million-Length Video And Language With RingAttention
VLM	ScreenAI: A Vision-Language Model for UI and Infographics Understanding
VLM	PaLM: Scaling Language Modeling with Pathways	ArXiv	2022/04/05
VLM, VQA	DeepSeek-VL: Towards Real-World Vision-Language Understanding
VLM, VQA	CogVLM: Visual Expert for Pretrained Language Models	ArXiv	2023/11/06
VLM, VQA	Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models	ArXiv	2023/04/19
VLM, World-model	Large World Model	ArXiv
ViFM, Video	InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding	ArXiv, GitHub
World-model	Learning to Model the World with Language	ArXiv
World-model	Diffusion World Model	ArXiv
World-model	Language Models Meet World Models	ArXiv
World-model	Learning and Leveraging World Models in Visual Representation Learning
World-model	Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning
Zero-shot	Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?

shure-dev / Awesome-LLM-related-Papers-Comprehensive-Topics