choiszt / SG_VLM

Official implementation of "Scene Graph Enhanced Embodied Task Planning with Large Language Models".

SGPlanner

Scene Graph Enhanced Embodied Task Planning

Try out the web demo 🤗 of SG_VLM: Hugging Face Spaces

The repository contains:

  • The dataset used for fine-tuning the model.
  • Code for generating the dataset.
  • Scripts for fine-tuning the model on high-performance GPUs.
  • Inference scripts for real-time task execution.

News

  • [2024.04.12] Training code for SG_VLM has been released. 📌

Overview

SG_VLM uses scene graphs to understand dynamic environments and plan tasks in them. The model first constructs a scene graph from multi-angle images, identifying the objects in the scene along with their attributes and relationships. This structured representation then guides the generation of action plans for robotics tasks, with the aim of improving accuracy and context-awareness.
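To make the idea concrete, here is a minimal sketch of how a scene graph could be held in memory and flattened into text for a language model. It is not the repository's implementation: the class names, field names, and prompt layout are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """One node in the scene graph (hypothetical structure)."""
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class SceneGraph:
    """Objects plus (subject, relation, object) edges between them."""
    objects: dict[str, SceneObject] = field(default_factory=dict)
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add_object(self, name: str, attributes=()) -> None:
        self.objects[name] = SceneObject(name, list(attributes))

    def add_relation(self, subject: str, relation: str, obj: str) -> None:
        # Reject edges that point at objects the graph does not contain.
        for endpoint in (subject, obj):
            if endpoint not in self.objects:
                raise KeyError(f"unknown object: {endpoint}")
        self.relations.append((subject, relation, obj))

    def to_prompt(self) -> str:
        """Serialize the graph as plain text, one object or relation per line."""
        lines = ["Objects:"]
        for o in self.objects.values():
            attrs = ", ".join(o.attributes) if o.attributes else "no attributes"
            lines.append(f"- {o.name} ({attrs})")
        lines.append("Relations:")
        for subject, relation, obj in self.relations:
            lines.append(f"- {subject} {relation} {obj}")
        return "\n".join(lines)


def build_planning_prompt(graph: SceneGraph, task: str) -> str:
    """Combine the serialized graph with a task instruction (assumed layout)."""
    return f"{graph.to_prompt()}\nTask: {task}\nPlan:"
```

For example, adding a red `mug` and a `table`, linking them with `add_relation("mug", "on", "table")`, and calling `build_planning_prompt(graph, "pick up the mug")` yields a short text block that lists the objects and relations before the task line.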

Setup

Here's a script to set up SG_VLM from scratch.

# Install dependencies
conda create -n sgvlm python=3.10
conda activate sgvlm
git clone https://github.com/choiszt/SG_VLM.git
cd SG_VLM
pip install -r requirements.txt

# Optional: setup for multi-GPU
pip install deepspeed

Troubleshooting installation issues

  1. Ensure your Python version is compatible (the environment above uses Python 3.10).
  2. Check network settings if dependencies fail to download.
  3. Verify GPU compatibility and drivers.

Additional libraries might be required depending on your specific hardware and software setup.
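
The Python version and GPU driver checks can be scripted. The sketch below is not part of the repository. It assumes Python 3.10 is the minimum (taken from the conda command above) and treats a missing `nvidia-smi` executable as a hint that the NVIDIA driver is not installed; it does not verify CUDA or any specific library.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # assumption: matches the conda environment created above


def check_environment(min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {found} found, but {wanted} or newer is expected")
    # nvidia-smi ships with the NVIDIA driver, so its absence suggests a driver problem.
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH; the NVIDIA driver may be missing")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```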

Data Release

dataset.json contains the task planning data used for model training. Each entry provides object attributes, spatial relationships, and task-specific action sequences.
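
The exact schema is not documented here, so the following is only a sketch of how one might load the file and sanity-check its entries. The key names `objects`, `relations`, and `actions` are hypothetical stand-ins for the three kinds of information listed above, and the top-level JSON value is assumed to be a list; adjust both to match the real file.

```python
import json

# Hypothetical key names; replace them with the keys actually used in dataset.json.
REQUIRED_KEYS = ("objects", "relations", "actions")


def load_records(path="dataset.json"):
    """Load the dataset, assuming the top-level JSON value is a list of records."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of records")
    return data


def missing_keys(record, required=REQUIRED_KEYS):
    """Return the required keys that a single record lacks."""
    return [key for key in required if key not in record]


def summarize(records):
    """Count the records that contain every required key and those that do not."""
    complete = sum(1 for record in records if not missing_keys(record))
    return {"total": len(records), "complete": complete,
            "incomplete": len(records) - complete}
```

Calling `summarize(load_records())` from the repository root gives a quick count of entries that do not match the assumed schema.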

Data Generation Process

Dataset creation is automated by three scripts, run in order:

cd create_dataset
python create_scene_graphs.py
python create_task_instructions.py
python compile_dataset.py
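
Because each step presumably depends on the output of the previous one, it can help to stop at the first failure. The wrapper below is a sketch rather than a script shipped with the repository: it runs the three scripts named above in order and assumes they take no arguments and are launched from inside `create_dataset`.

```python
import subprocess
import sys

# Script names are taken from the commands above; the order is assumed to matter.
STEPS = (
    "create_scene_graphs.py",
    "create_task_instructions.py",
    "compile_dataset.py",
)


def build_commands(steps=STEPS, python=sys.executable):
    """Return one argument list per pipeline step."""
    return [[python, step] for step in steps]


def run_pipeline(workdir="create_dataset", steps=STEPS):
    """Run each step in order; check=True raises CalledProcessError on failure."""
    for command in build_commands(steps):
        print("Running:", " ".join(command))
        subprocess.run(command, cwd=workdir, check=True)
```

Call `run_pipeline()` from the repository root. If a script exits with a non-zero status, `subprocess.CalledProcessError` is raised and the later steps are not run.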

Fine-tuning

To fine-tune the model, run:

# Assuming CUDA and appropriate GPUs are available
cd finetune
python finetune_model.py

Inference

To run task planning in real time:

python run_inference.py --input "path_to_input_image"

Validation and Testing

Run the validation script to check model performance:

cd validate
python validate_tasks.py

Contributing

Contributions to SG_VLM are welcome! Please refer to CONTRIBUTING.md for guidelines on how to contribute effectively.

License

SG_VLM is released under the MIT License. See LICENSE for more information.

