abhi1092 / find_fallen_objects

Official implementation of CVPR 2022 paper "Finding Fallen Objects Via Asynchronous Audio-Visual Integration".

Finding Fallen Objects

Usage

Data

Download the dataset from here, and extract it in the project root.

The dataset sub-directory contains the information needed to load each case into our environment. The .wav files within it are the recorded audio of the object falling in each case.

The perception sub-directory contains some information helpful for utilizing our environment. Each .json file contains several fields for the case.

  • position ($x, y, z$): the position of the fallen object relative to the initial state of the agent. The $y$-axis represents the vertical direction, and $(0, 0, 1)$ is the agent's initial facing direction.
  • name: the name of the fallen object. Two cases with the same name use the exact same object model.
  • category: the category of the fallen object. Each category may contain multiple different object models.
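As a rough illustration of how the position field can be used, the sketch below computes the horizontal distance and bearing of the fallen object from the agent's initial pose. The record values are invented, and storing position as a three-element [x, y, z] list is an assumption; check an actual .json file for the real layout.

```python
import json
import math

# Hypothetical record: field names follow the list above, values are invented,
# and `position` is assumed to be stored as a three-element [x, y, z] list.
sample = json.loads(
    '{"position": [1.0, 0.0, 1.0], "name": "example_model", "category": "example_category"}'
)


def horizontal_offset(position):
    """Return (distance, bearing_deg) of the object in the ground plane.

    The bearing is measured from the agent's initial facing direction (0, 0, 1);
    positive values point toward +x. The vertical y component is ignored.
    """
    x, _y, z = position
    distance = math.hypot(x, z)
    bearing_deg = math.degrees(math.atan2(x, z))
    return distance, bearing_deg


distance, bearing_deg = horizontal_offset(sample["position"])
print(f"{sample['name']}: distance {distance:.2f}, bearing {bearing_deg:.1f} degrees")
```

Ignoring $y$ keeps the result in the plane the agent moves in, which is usually what a navigation heuristic needs.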

Prerequisite

The environment is based on TDW. We tested it on version 1.8.29; you can download TDW_Linux.tar.gz for that version from here.

Follow this to install the NVIDIA drivers and X on your Linux server. If you need to run this environment in Docker, you also need to install nvidia-docker following this.

After downloading TDW_Linux.tar.gz, extract it into the docker directory. The executable TDW should be located at docker/TDW/TDW.x86_64.

tdw environment setup:

conda create -n tdw
conda activate tdw
pip install gym pyastar magnebot==1.3.2 tdw==1.8.29

planner environment setup:

conda create -n planner
conda activate planner
pip install librosa scikit-image pyastar2d docker-compose tdw
pip install 'git+https://github.com/facebookresearch/detectron2.git'
cd env/openai_baselines
pip install -e .

Launch the environment

Launch

You can then launch the environment via

conda activate tdw
python interface.py --display=<display> --split=<split> --port=<port>

Validate

You can use the docker/test.py script to validate the installation in either case. Use port 2590 when launching, or edit the port in the test script to match.

The environment will output some information in env_log/ after each case.

obs contains the following entries:

  • rgb, depth: the RGB or depth image captured by the agent in the current frame
  • camera_matrix: the camera matrix of the captured RGB and depth image
  • agent ($x, y, z, fx, fy, fz$): $(x, y, z)$ denotes the current location of the agent, $(fx, fy, fz)$ denotes the current facing direction of the agent
  • FOV: field-of-view
  • audio: the audio recorded when the object falls. It is a byte array obtained by padding the bytes of the .wav file with 1s on the right.
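Because the audio entry is right-padded, the padding has to be dropped before decoding. The sketch below is one possible way to do that, not the repository's own code: it reads the size field from the RIFF header instead of guessing which trailing bytes are padding. It assumes the buffer starts with a standard RIFF/WAVE header whose size field is accurate; the function name and the in-memory test file are hypothetical.

```python
import io
import struct
import wave


def strip_wav_padding(padded):
    """Return only the leading .wav bytes of a right-padded buffer.

    Uses the size field in the RIFF header (bytes 4..8, little-endian) rather
    than guessing which trailing bytes are padding.
    """
    data = bytes(padded)
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("buffer does not start with a RIFF/WAVE header")
    (riff_size,) = struct.unpack("<I", data[4:8])
    return data[: riff_size + 8]  # the size field excludes the 8-byte RIFF preamble


# Build a tiny in-memory .wav file and pad it, to stand in for obs["audio"].
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b"\x00\x01" * 100)
original = buf.getvalue()
padded = original + b"\x01" * 64  # assumed padding byte value

recovered = strip_wav_padding(padded)
with wave.open(io.BytesIO(recovered), "rb") as r:
    print(r.getnframes(), "frames at", r.getframerate(), "Hz")
```

The recovered bytes can then be passed to any .wav reader through an in-memory file object.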

info contains the following entries:

  • scene_info: a dict representing the name of the case
  • status: (of type magnebot.ActionStatus) the result of the last action, e.g. success or collide
  • finish: whether the task has succeeded

Use the following numbers for actions:

  • 0: move forward
  • 1: turn left
  • 2: turn right
  • 3: move camera up
  • 4: move camera down
  • 5: claim that the target is in view within the threshold distance
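To avoid scattering bare integers through agent code, the action numbers can be wrapped in an enum. This is only a convenience sketch: the member names and the scripted sequence are hypothetical, and only the numeric values come from the list above.

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical names for the action numbers listed above."""

    MOVE_FORWARD = 0
    TURN_LEFT = 1
    TURN_RIGHT = 2
    CAMERA_UP = 3
    CAMERA_DOWN = 4
    CLAIM = 5


def scripted_actions():
    """A fixed, purely illustrative action sequence ending in a claim."""
    return [Action.TURN_LEFT, Action.MOVE_FORWARD, Action.MOVE_FORWARD, Action.CLAIM]


# IntEnum members are ints, so int(a) gives the number the environment expects.
print([int(a) for a in scripted_actions()])
```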

If you want to run multiple environments in parallel, e.g. for training, we include a slightly modified copy of the code from openai/baselines so that you can run:

from env.envs import make_vec_envs
envs = make_vec_envs('find_fallen-v0', num_processes, log_dir, device, True, spaces=(observation_space, action_space), port=<port>, displays=<displays>, split='train')
obs, info = envs.reset()
obs, reward, done, info = envs.step([5 for _ in range(num_processes)])

Note: in this vectorized setting, when a case is done, the obs and info returned by step are the initial status of the next case.

The environments use the port numbers in [port, port + num_processes) and the X displays listed in displays (a list of strings such as [":4", ":5"]). A single X display can serve multiple instances simultaneously, so displays can be shorter than num_processes.
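The helper below sketches how ports and displays might be paired with process indices, which can be useful when checking that the port range is free. The half-open port range comes from the description above; the round-robin display assignment and the function name are assumptions, and the vectorized environment may distribute displays differently.

```python
def assign_ports_and_displays(port, num_processes, displays):
    """Pair each process index with a port and an X display.

    Ports follow the half-open range [port, port + num_processes) from the text.
    Round-robin display assignment is an assumption made for this sketch.
    """
    if not displays:
        raise ValueError("at least one X display is required")
    return [(port + i, displays[i % len(displays)]) for i in range(num_processes)]


for p, d in assign_ports_and_displays(2590, 3, [":4", ":5"]):
    print(p, d)
```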

Baseline

We provide the code of our modular planner in baseline/planner. Download the pretrained modular models here and place them in <project root>/pretrained, then run the planner (replace :4 :5 with your available X displays):

conda activate planner
python baseline/planner/main_planner.py --displays :4 :5

Evaluation

You can evaluate the results (SR, SPL, SNA) by placing the evaluation script in the env_log folder and running:

python eval.py

You can replace "non_distractor" with "distractor".
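For orientation, the sketch below computes SR and SPL from a list of in-memory episode records using the commonly used definitions: success rate, and success weighted by the ratio of shortest path length to the path actually taken. The record keys are hypothetical, the env_log file format is not parsed, and SNA is left out; eval.py remains the reference for the reported numbers.

```python
def success_rate(episodes):
    """Fraction of episodes that ended in success."""
    return sum(1 for e in episodes if e["success"]) / len(episodes)


def spl(episodes):
    """Success weighted by path length, averaged over all episodes.

    Each successful episode contributes shortest / max(taken, shortest);
    failed episodes contribute 0.
    """
    total = 0.0
    for e in episodes:
        if e["success"]:
            denom = max(e["taken"], e["shortest"])
            total += e["shortest"] / denom if denom > 0 else 1.0
    return total / len(episodes)


# Invented records; the keys "success", "shortest", and "taken" are hypothetical.
episodes = [
    {"success": True, "shortest": 5.0, "taken": 10.0},
    {"success": True, "shortest": 4.0, "taken": 4.0},
    {"success": False, "shortest": 3.0, "taken": 6.0},
]
print(f"SR={success_rate(episodes):.3f} SPL={spl(episodes):.3f}")
```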

License: MIT License

