- Follow the installation requirements of the [BLIP repository](https://github.com/salesforce/BLIP).
- Download the checkpoints and put them into the `ckpts` path (a download sketch follows this list).
- Copy the code in this repository into the cloned BLIP path (a quick load check is sketched below).
- Follow the [ScanQA repository](https://github.com/ATR-DBI/ScanQA) to download and preprocess the data.
- Replace the ScanQA data path in the code with yours (see the data-loading sketch below).
- Replace the ScanNet data path in `render_scenes.py` with yours (see the path sketch below).
- Run `render_scenes.py`.
- Run `eval_scene_best_views.py` to zero-shot evaluate BLIP on ScanQA.
- A result JSON will be generated, indicating the matched views w.r.t. the questions (an inspection sketch follows this list).
- Run `train_scene_view_vqa.py`.
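
A minimal sketch for fetching a checkpoint into `ckpts/`, assuming the public BLIP base VQA checkpoint from the BLIP model zoo is the one needed; swap in whichever checkpoint this project actually expects:

```python
import os
import urllib.request

os.makedirs('ckpts', exist_ok=True)

# Public BLIP base VQA checkpoint; replace with the checkpoint you need.
url = ('https://storage.googleapis.com/sfr-vision-language-research/'
       'BLIP/models/model_base_vqa_capfilt_large.pth')
urllib.request.urlretrieve(url, 'ckpts/model_base_vqa_capfilt_large.pth')
```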
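
To confirm the BLIP installation and checkpoint placement, a quick sanity check with BLIP's own `blip_vqa` loader can help; the arguments follow the BLIP demo defaults, and it must run from inside the cloned BLIP directory so the `models` package resolves:

```python
import torch
from models.blip_vqa import blip_vqa  # resolvable once this sits in the cloned BLIP repo

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# image_size=480 and vit='base' mirror BLIP's own VQA demo.
model = blip_vqa(pretrained='ckpts/model_base_vqa_capfilt_large.pth',
                 image_size=480, vit='base')
model = model.eval().to(device)
print('BLIP VQA checkpoint loaded')
```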
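
For the ScanQA path replacement, the question files are plain JSON; below is a hedged sketch of loading one, assuming the file layout and field names of the released ScanQA format (`SCANQA_ROOT` is an illustrative name, not an identifier from the code):

```python
import json

SCANQA_ROOT = '/path/to/ScanQA/data'  # replace with your ScanQA data path

# ScanQA releases its question files as plain JSON under qa/.
with open(f'{SCANQA_ROOT}/qa/ScanQA_v1.0_val.json') as f:
    questions = json.load(f)

# Each entry ties a question to the ScanNet scene it asks about.
sample = questions[0]
print(sample['scene_id'], '|', sample['question'])
```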
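
The ScanNet path in `render_scenes.py` is edited the same way; the constant below is illustrative, not the script's actual variable name:

```python
import os

# Illustrative name only; locate the corresponding constant in render_scenes.py.
SCANNET_ROOT = '/path/to/scannet/scans'  # directory holding the scene*/ folders

assert os.path.isdir(SCANNET_ROOT), 'point SCANNET_ROOT at your ScanNet scans'
```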
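
Once `eval_scene_best_views.py` has produced the result JSON, it can be inspected as below; the file name and the question-to-view schema are assumptions here, so check the script for the actual output path and structure:

```python
import json

# File name and schema are assumptions; check eval_scene_best_views.py
# for the actual output path and structure.
with open('best_views_result.json') as f:
    best_views = json.load(f)

# Assumed mapping: question id -> best-matched rendered view.
for qid, view in list(best_views.items())[:5]:
    print(qid, '->', view)
```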