SuperCLEVR Physics

A dynamical 3D scene understanding dataset for Video Question Answering. The scenes are annotated with objects' (1) static properties (shape, color) and (2) 3D dynamical properties (3D position, velocities, external forces), and (3) physical properties (mass, frictions, restitution); and Collision Event (objects involved, frame).

(Note: the color space is compressed for visualization)

Related works

SuperCLEVR. Visual question answering (VQA) dataset for domain robustness in four factors: visual complexity, question redundancy, concept distribution, concept compositionality.
SuperCLEVR-3D. A VQA dataset for 3D awareness scene understanding the objects from images including 3D poses, parts, and occlusions.

Video Question Answering

We design questions about the dynamical properties under 4D space of objects and their collision events.

There are types of questions: factual question, predictive question and counterfactual question from the generated scenes.

How to generate your own data

1. Environment

Setup Environment

Python version

We use python version 3.10. The python version will affect the compatibility of bpy packages.

Install Dependencies

Please use the following steps to install packages. Our project is built upon Kubric. We modified the original package to control more dynamical properties.

pip install -r requirements.txt

Install bpy

This is the python package for blender software, which is able to be installed from pip now. (PyPI, official site)

pip install bpy==3.5

If 3.5 is not applicable, 3.4 should also be compatible to this repo.

2. Video rendering

Run bash run.sh directly for new scene creation and video rendering.

Example of generating 100 videos.

time="$(date +%Y-%m-%d_%H-%M-%S)"
for num in {0..100}
do 
    CUDA_VISIBLE_DEVICES=xx python sim_render_color_defined_load_scene.py \
        --data_dir=assets \
        --job-dir=output/superclevr-physics \
        --scratch_dir=output/tmp/tmp-$time \
        --camera=fixed \
        --height=realistic \
        --iteration=$num \
        --scene_size 5 
done

The output folder will be like

output/superclevr-physics
└───super_clevr_0
│   └───events.json
|   └───metadata.json
|   └───rgba_00000.png
|   └───rgba_00001.png
|   └───...
|   └───rgba_00120.png
└───super_clevr_1
│   └───events.json
|   └───metadata.json
|   └───rgba_00000.png
|   └───rgba_00001.png
|   └───...
|   └───rgba_00120.png

Citation

@article{wang2024compositional,
  title={Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering},
  author={Wang, Xingrui and Ma, Wufei and Wang, Angtian and Chen, Shuo and Kortylewski, Adam and Yuille, Alan},
  journal={arXiv preprint arXiv:2406.00622},
  year={2024}
}

XingruiWang / SuperCLEVR-Physics