SpatialTracker: Tracking Any 2D Pixels in 3D Space
Yuxi Xiao*, Qianqian Wang*, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou
CVPR 2024, Highlight. Paper on arXiv.
- Release SpatialTracker-v2 (coming soon).
- Release HuggingFace/Gradio demo.
- Release SpatialTracker inference code and checkpoints (expected in late April).
- 05.04.2024: SpatialTracker is selected as a Highlight Paper!
- 26.02.2024: SpatialTracker is accepted at CVPR 2024!
The inference code was tested on:
- Ubuntu 20.04
- Python 3.10
- PyTorch 2.1.1
- PyTorch Lightning 2.2.1
- 1 NVIDIA GPU (RTX A6000) with CUDA 11.8. (Other GPUs also work; 22 GB of GPU memory is sufficient for dense tracking (~10k points) with our code.)
conda create -n SpaTrack python==3.10
conda activate SpaTrack
pip install flash-attn --no-build-isolation
Alternatively, install flash-attn from the FlashAttention source code.
pip install -r requirements.txt
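After installation, it can be useful to compare the active environment against the versions listed as tested above. The following is a minimal sketch, not part of the SpatialTracker repository: the function names `collect_versions` and `report` are hypothetical, and the pip distribution names (`torch`, `pytorch-lightning`, `flash-attn`) are assumptions rather than entries read from requirements.txt.

```python
import platform
from importlib import metadata

# Versions reported as tested in this README. The distribution names
# (as pip sees them) are assumptions, not taken from requirements.txt.
TESTED = {"torch": "2.1.1", "pytorch-lightning": "2.2.1"}
OPTIONAL = ("flash-attn",)


def collect_versions(names):
    """Return {distribution name: installed version, or None if absent}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def report():
    """Build one human-readable line per component."""
    lines = [f"python {platform.python_version()} (tested: 3.10)"]
    installed = collect_versions(list(TESTED) + list(OPTIONAL))
    for name, version in installed.items():
        tested = TESTED.get(name)
        note = f" (tested: {tested})" if tested else ""
        lines.append(f"{name} {version or 'not installed'}{note}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

The script only reports versions and does not fail on a mismatch, since other versions may work as well.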
In our default setting, a monocular depth estimator is needed to obtain metric depth from the video input. Several models are available as options (ZoeDepth, Metric3D, UniDepth and DepthAnything).
We use ZoeDepth as the default model. Download dpt_beit_large_384.pt, ZoeD_M12_K.pt, and ZoeD_M12_NK.pt into models/monoD/zoeDepth/ckpts.
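A quick way to confirm that the three ZoeDepth checkpoints landed in the expected directory is sketched below. It is an illustrative helper, not a script shipped with the code: the function name `missing_checkpoints` is hypothetical, and it assumes the script is run from the repository root so that the relative path resolves.

```python
from pathlib import Path

# Directory and file names as given in this README; the path is relative,
# so this assumes the current directory is the repository root.
CKPT_DIR = Path("models/monoD/zoeDepth/ckpts")
EXPECTED = ("dpt_beit_large_384.pt", "ZoeD_M12_K.pt", "ZoeD_M12_NK.pt")


def missing_checkpoints(ckpt_dir=CKPT_DIR, expected=EXPECTED):
    """Return the expected checkpoint names that are not files in ckpt_dir."""
    ckpt_dir = Path(ckpt_dir)
    return [name for name in expected if not (ckpt_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print(f"Missing in {CKPT_DIR}: {', '.join(missing)}")
    else:
        print("All ZoeDepth checkpoints found.")
```

The check only looks at file names; it does not verify file sizes or contents.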
Our method supports RGB or RGBD video input.
First, make sure that Blender is installed. We provide visualization code for Blender:
/Applications/Blender.app/Contents/MacOS/Blender -P create.py -- --input ${OUTPUT}.npy
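The command above uses the macOS location of the Blender executable. On other systems the executable is usually somewhere else, so the sketch below builds the same command with whichever Blender is found. The helper names `find_blender` and `blender_command` are hypothetical, and looking up `blender` on PATH is an assumption about how Blender was installed; only the `-P create.py -- --input <file>.npy` argument layout comes from the command shown above.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# macOS path taken from the command in this README.
MACOS_BLENDER = "/Applications/Blender.app/Contents/MacOS/Blender"


def find_blender():
    """Return a Blender executable path, or None if none is found."""
    found = shutil.which("blender")  # assumption: Blender may be on PATH
    if found:
        return found
    if Path(MACOS_BLENDER).is_file():
        return MACOS_BLENDER
    return None


def blender_command(blender, npy_path, script="create.py"):
    """Build the argument list: <blender> -P <script> -- --input <npy_path>."""
    return [str(blender), "-P", str(script), "--", "--input", str(npy_path)]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python run_blender_vis.py <output>.npy")
    blender = find_blender()
    if blender is None:
        sys.exit("Blender executable not found.")
    subprocess.run(blender_command(blender, sys.argv[1]), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the .npy path contains spaces.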
For example, the butterfly looks like: