
Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery

CVPR 2024

Yuqi Zhang · Guanying Chen · Jiaxing Chen · Shuguang Cui

Project Page


We present a neural radiance field method for urban-scale semantic and building-level instance segmentation from aerial images by lifting noisy 2D labels to 3D.

Overview

This repository contains the following components to train Aerial Lifting:

  1. Dataset processing scripts, including:
    1. far-view semantic label fusion;
    2. cross-view instance label grouping.
  2. Training and evaluation scripts.

Note: This is a preliminary release and there may still be some bugs.

Installation

Create a new conda env (CUDA)

  1. Clone this repo:

    git clone https://github.com/zyqz97/Aerial_lifting.git
    
  2. Create a conda environment (installation via Anaconda is recommended):

    conda create -n aeriallift python=3.9
    conda activate aeriallift
    
  3. Install PyTorch:

    conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
    
  4. Install tiny-cuda-nn and the remaining dependencies:

    pip install -r requirements.txt
    pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
    
  5. Install the extensions of torch-ngp:

    cd ./gp_nerf/torch_ngp/gridencoder
    python setup.py install
    cd ../raymarching
    python setup.py install
    cd ../shencoder
    python setup.py install
    
  6. Follow the official neuralsim repository to install nr3d_lib.

  7. Install SAM and download its checkpoint (a quick sanity check follows this list):

    git clone https://github.com/facebookresearch/segment-anything.git
    cd segment-anything
    pip install -e .
    # Run the next two lines from the Aerial_lifting root (not inside segment-anything)
    cd tools/segment_anything
    wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
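
After installation, you can optionally sanity-check the setup (a quick check; adjust the checkpoint path to wherever you ran the wget above):

    # Should print "1.10.1 True" when the CUDA build of PyTorch is active
    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
    # The SAM checkpoint is roughly 2.4 GB on disk
    ls -lh tools/segment_anything/sam_vit_h_4b8939.pth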
    

Tested environments

Ubuntu 20.04 with PyTorch 1.10.1 and CUDA 11.3 on an A100 GPU.

Data Processing & Training Steps

  • We take the Yingrenshi dataset as an example; you need to set 'dataset_path=$YOURPATH/Aerial_lifting_data/Yingrenshi' and 'config_file=configs/yingrenshi.yaml' (see the sketch after this list).

  • We also provide the processed data in the section below; if you download it, the training scripts (Steps 1.1, 2.4, and 3.3) can be run directly.
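
A minimal sketch of the two settings above (depending on how each bash script is written, either export these variables before running or edit the corresponding lines inside the scripts):

    # Point dataset_path at your local copy of the processed data
    export dataset_path=$YOURPATH/Aerial_lifting_data/Yingrenshi
    export config_file=configs/yingrenshi.yaml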

Step 1. Training Geometry

  • 1.1 Train the geometry field.

    sh bash/train_geo.sh
    
    Note: $exp_name denotes the log-saving path (e.g., exp_name=logs/train_geo_yingrenshi); an example invocation is sketched below.
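
    A possible invocation (an assumption: this sketch passes exp_name through the environment; if the script hard-codes the path, edit bash/train_geo.sh instead):

    exp_name=logs/train_geo_yingrenshi sh bash/train_geo.sh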

Step 2. Training Semantic Field

  • 2.1 Get Mask2Former semantic labels

    To generate Mask2Former semantic labels, please use our modified version of Mask2Former from here. You need to create a separate conda environment for it. That code is largely based on MaskFormer and a modified version of Panoptic-Lifting.

    After setting up the Mask2Former environment, run:

    sh bash/2_1_m2f_labels.sh
    
  • 2.2 Render far-view RGB images from the checkpoint of Step 1.

    sh bash/2_2_get_far_view_images.sh
    

    Note: you need to specify $M2F_path, $exp_name, and $ckpt_path (a sketch of plausible values follows).
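
    For example (all three values below are hypothetical; match them to your local clones and to the actual checkpoint file written in Step 1.1):

    # Placeholder paths, not fixed by this repo
    M2F_path=$YOURPATH/Mask2Former
    exp_name=logs/train_geo_yingrenshi
    ckpt_path=$YOURPATH/path/to/step1_checkpoint.pth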

  • 2.3 Get the fused semantic labels.

    sh bash/2_3_fusion.sh
    
  • 2.4 Train the semantic field.

    After processing or downloading the data, you can use the script below to train the semantic field.

    sh bash/train_semantic.sh
    

Step 3. Training Instance Field

  • 3.1 Generate the SAM instance masks with the geo-filter

    sh bash/3_1_get_sam_mask_depth_filter.sh
    
  • 3.2 Generate the cross-view guidance map

    sh bash/3_2_cross_view_process.sh
    
  • 3.3 Train the instance field.

    After processing or downloading the data, you can use the script below to train the instance field.

    sh bash/train_instance.sh
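
For reference, the full pipeline above runs in this order (a summary of the steps; it assumes the variables used by each script are already set):

    sh bash/train_geo.sh                      # Step 1.1: geometry field
    sh bash/2_1_m2f_labels.sh                 # Step 2.1: Mask2Former labels (separate env)
    sh bash/2_2_get_far_view_images.sh        # Step 2.2: far-view RGB renders
    sh bash/2_3_fusion.sh                     # Step 2.3: fused semantic labels
    sh bash/train_semantic.sh                 # Step 2.4: semantic field
    sh bash/3_1_get_sam_mask_depth_filter.sh  # Step 3.1: geo-filtered SAM masks
    sh bash/3_2_cross_view_process.sh         # Step 3.2: cross-view guidance maps
    sh bash/train_instance.sh                 # Step 3.3: instance field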
    

Processed Dataset & Trained Models

Download the processed data and trained checkpoints.

We thank the authors for providing the datasets. If you find the datasets useful in your research, please cite the papers that provided the original aerial images:

@inproceedings{UrbanBIS,
  title={UrbanBIS: a Large-scale Benchmark for Fine-grained Urban Building Instance Segmentation},
  author={Guoqing Yang and Fuyou Xue and Qi Zhang and Ke Xie and Chi-Wing Fu and Hui Huang},
  booktitle={SIGGRAPH},
  year={2023}
}

@inproceedings{UrbanScene3D,
  title={Capturing, Reconstructing, and Simulating: the UrbanScene3D Dataset},
  author={Liqiang Lin and Yilin Liu and Yue Hu and Xingguang Yan and Ke Xie and Hui Huang},
  booktitle={ECCV},
  year={2022}
}

Citation

If you find this work useful for your research and applications, please cite our paper:

@inproceedings{zhang2024aerial,
  title={Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery},
  author={Zhang, Yuqi and Chen, Guanying and Chen, Jiaxing and Cui, Shuguang},
  booktitle={CVPR},
  year={2024}
}

Acknowledgements

Large parts of this codebase are based on Mega-NeRF, torch-ngp, neuralsim, Panoptic-Lifting, Contrastive-Lift, SAM, and Mask2Former. We thank the authors for releasing their code.


License

This project is released under the MIT License.

