wzcai99 / Awesom-Embodied-Navigation

Paper & Project lists of cutting-edge research on visual navigation and embodied AI.

Survey_EmbodiedAI

Paper List

  • Simple but Effective: CLIP Embeddings for Embodied AI. 2022 CVPR (see the CLIP goal-scoring sketch after this list)
  • ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings. 2022 NeurIPS
  • CLIP on Wheels: Zero-shot Object Navigation as Object Localization and Exploration. 2022
  • ViNG: Learning Open-World Navigation with Visual Goals. 2021 ICRA
  • Pre-Trained Language Models for Interactive Decision-Making. 2022 NeurIPS
  • R3M: A Universal Visual Representation for Robot Manipulation. 2022 CoRL
  • BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning. 2021 CoRL
  • Grounding Language with Visual Affordances over Unstructured Data. 2022
  • What Matters in Language Conditioned Robotic Imitation Learning over Unstructured Data. 2022
  • LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision and Action. 2022 CoRL
  • Visual Language Maps for Robot Navigation. 2023 ICRA
  • Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. 2022 CoRL
  • Open-vocabulary Queryable Scene Representations for Real World Planning. 2023 ICRA
  • Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. 2022 ICML
  • REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments. 2020 CVPR
  • ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks. 2020 CVPR
  • SQA3D: Situated Question Answering in 3D Scenes. 2023 ICLR
  • Episodic Transformer for Vision-and-Language Navigation. 2021 ICCV
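
Several entries above (Simple but Effective, ZSON, CLIP on Wheels) score observations against a language goal with CLIP embeddings. Below is a minimal sketch of that idea using the openai/CLIP package; using the score greedily to steer exploration is an illustrative assumption, not any single paper's method.

```python
# Minimal sketch: CLIP-based zero-shot goal scoring.
# Requires: pip install git+https://github.com/openai/CLIP.git
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def goal_similarity(frame: Image.Image, goal: str) -> float:
    """Cosine similarity between the current camera view and a text goal."""
    image = preprocess(frame).unsqueeze(0).to(device)
    text = clip.tokenize([goal]).to(device)
    with torch.no_grad():
        img_emb = model.encode_image(image)
        txt_emb = model.encode_text(text)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return (img_emb @ txt_emb.T).item()

# e.g. bias exploration toward views that score high for "a chair"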

Pre-Train for Cross-Modal Representation

  • LXMERT: Learning Cross-Modality Encoder Representations from Transformers. 2019 EMNLP
  • ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. 2019 NeurIPS (a co-attention sketch follows this list)
  • Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks. 2020 CVPR
  • Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training. 2020 CVPR
  • Cross-modal Map Learning for Vision and Language Navigation. 2022 CVPR
  • Airbert: In-domain Pretraining for Vision-and-Language Navigation. 2021 ICCV
  • Instruction-Following Agents with Jointly Pre-Trained Vision-Language Models. 2022
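
ViLBERT's two-stream design exchanges information through co-attention: each modality's tokens attend to the other's. A minimal PyTorch sketch of such a block is below; the dimensions and names are illustrative, not the paper's exact architecture.

```python
# Minimal sketch of a ViLBERT-style co-attention block.
import torch
import torch.nn as nn

class CoAttentionBlock(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 12):
        super().__init__()
        self.vis_attends_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt_attends_vis = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vis: torch.Tensor, txt: torch.Tensor):
        # queries come from one stream, keys/values from the other
        vis_out, _ = self.vis_attends_txt(vis, txt, txt)
        txt_out, _ = self.txt_attends_vis(txt, vis, vis)
        return vis + vis_out, txt + txt_out  # residual connections

block = CoAttentionBlock()
vis_tokens = torch.randn(2, 36, 768)   # e.g. image region features
txt_tokens = torch.randn(2, 20, 768)   # e.g. word embeddings
vis_tokens, txt_tokens = block(vis_tokens, txt_tokens)
```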

LLM for Embodied AI

  • LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models. 2023 ICCV (a few-shot planning sketch follows this list)
  • LEBP -- Language Expectation & Binding Policy: A Two-Stream Framework for Embodied Vision-and-Language Interaction Task Learning Agents. 2022
  • Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. 2022 ICML
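
LLM-Planner-style systems prompt a language model with a few instruction-to-plan exemplars and parse the generated subgoals. A minimal sketch, assuming a generic `call_llm(prompt) -> str` client (a hypothetical placeholder, as are the exemplars):

```python
# Minimal sketch of few-shot plan generation with an LLM.
import re

# Illustrative ALFRED-style exemplars; not taken from any paper's prompt.
FEW_SHOT = """\
Instruction: throw away the soap
Plan: 1. find soap 2. pick up soap 3. find trash can 4. put soap in trash can

Instruction: put a washed apple on the table
Plan: 1. find apple 2. pick up apple 3. wash apple 4. put apple on table
"""

def plan(instruction: str, call_llm) -> list[str]:
    """Decompose an instruction into subgoals via few-shot prompting."""
    prompt = f"{FEW_SHOT}\nInstruction: {instruction}\nPlan:"
    completion = call_llm(prompt)  # hypothetical: returns generated text
    # split "1. find cup 2. pick up cup ..." into a subgoal list
    return [s.strip() for s in re.split(r"\d+\.", completion) if s.strip()]
```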

ImageGoal Navigation

  • Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning. 2017 ICRA
  • Semi-Parametric Topological Memory for Navigation. 2018 ICLR (a topological-memory sketch follows this list)
  • Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks. 2019 CVPR
  • Neural Topological SLAM for Visual Navigation. 2020 CVPR
  • Visual Graph Memory with Unsupervised Representation for Visual Navigation. 2021 ICCV
  • No RL, No Simulation: Learning to Navigate without Navigating. 2021 NeurIPS
  • Topological Semantic Graph Memory for Image-Goal Navigation. 2022 CoRL
  • Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation. 2022 CVPR
  • Memory-Augmented Reinforcement Learning for Image-Goal Navigation. 2022 IROS
  • Last-Mile Embodied Visual Navigation. 2022 CoRL
  • ViNG: Learning Open-World Navigation with Visual Goals. 2021 ICRA
  • Lifelong Topological Visual Navigation. 2022 RA-L
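
Papers such as Semi-Parametric Topological Memory and Topological Semantic Graph Memory build a graph over past observations and plan routes over it. A minimal sketch with networkx is below; the embedding model feeding `add_observation` and the 0.95 shortcut threshold are illustrative assumptions.

```python
# Minimal sketch of a topological memory for image-goal navigation:
# nodes hold observation embeddings, edges connect views judged reachable,
# and planning is a shortest-path query.
import networkx as nx
import numpy as np

graph = nx.Graph()

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def add_observation(step: int, embedding: np.ndarray):
    graph.add_node(step, emb=embedding)
    if step > 0:
        graph.add_edge(step - 1, step)          # temporal adjacency
    for other, data in graph.nodes(data=True):  # visual shortcut edges
        if other < step - 1 and cosine(embedding, data["emb"]) > 0.95:
            graph.add_edge(other, step)

def plan_to_goal(current: int, goal: int) -> list[int]:
    return nx.shortest_path(graph, current, goal)  # waypoint node ids
```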

Multi-Modal Manipulation

  • Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models. 2022 NeurIPS Workshop
  • Scaling Robot Learning with Semantically Imagined Experience. 2023
  • Learning Universal Policies via Text-Guided Video Generation. 2023
  • Policy Adaptation from Foundation Model Feedback. 2023 CVPR
  • CLIPort: What and Where Pathways for Robotic Manipulation. 2021 CoRL
  • RT-1: Robotics Transformer for Real-World Control at Scale. 2022
  • Open-World Object Manipulation using Pre-trained Vision-Language Models. 2023
  • R3M: A Universal Visual Representation for Robot Manipulation. 2022 CoRL (a frozen-encoder policy sketch follows this list)
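
R3M's recipe is a frozen, pre-trained visual encoder feeding a small policy head trained by imitation. The sketch below substitutes an ImageNet ResNet-18 for the actual R3M weights; the action dimensionality and MLP sizes are illustrative.

```python
# Minimal sketch: frozen visual representation + small policy head.
import torch
import torch.nn as nn
from torchvision.models import resnet18

encoder = resnet18(weights="IMAGENET1K_V1")  # stand-in for R3M weights
encoder.fc = nn.Identity()          # expose the 512-d feature vector
encoder.eval()                      # freeze batch-norm statistics too
for p in encoder.parameters():      # representation stays frozen
    p.requires_grad = False

policy = nn.Sequential(             # trained with behavior cloning
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 7),              # e.g. 7-DoF end-effector action
)

obs = torch.randn(8, 3, 224, 224)   # a batch of camera frames
with torch.no_grad():
    feats = encoder(obs)
actions = policy(feats)             # shape (8, 7)
```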
