BM-NAS: Bilevel Multimodal Neural Architecture Search (AAAI 2022 Oral)

Yihang Yin, Siyu Huang, Xiang Zhang

Paper, Poster and Presentation

(Framework overview figure)

Please check our arXiv version here for the full paper with supplementary material. We also provide our poster in this repo. Our oral presentation video at AAAI-2022 can be found on YouTube, both as a brief introduction and as the full presentation.

Requirements

The latest tested versions are:

pytorch==1.10.1
opencv-python==4.5.5.62
sklearn==1.10.1
tqdm
IPython
graphviz (you need the executables, not only the Python API)

Pre-trained Backbones and Pre-processed Datasets

The backbones (checkpoints) and pre-processed datasets (BM-NAS_dataset) are available here; you can download them and put them in the root directory. The ego checkpoints are blocked by AliCloud, so there is an alternative Google Drive link for the checkpoints here.

MM-IMDB Experiments

You can simply use our pre-processed dataset, but please cite the original MM-IMDB dataset.

Dataset Pre-processing

If you want to use the original one, you can follow these steps.

Download multimodal_imdb.hdf5 from the original MM-IMDB repo, then use our pre-processing script to split the dataset.

$ python datasets/prepare_mmimdb.py
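The split step above can be sketched as follows. This is an illustrative helper, not the actual logic of datasets/prepare_mmimdb.py, and the split fractions here are assumptions, not the paper's settings:

```python
import numpy as np

def split_indices(n, train_frac=0.6, dev_frac=0.1, seed=0):
    """Shuffle n sample indices and split them into train/dev/test.
    Fractions and seed are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(n * train_frac)
    n_dev = int(n * dev_frac)
    return idx[:n_train], idx[n_train:n_train + n_dev], idx[n_train + n_dev:]

train, dev, test = split_indices(1000)
print(len(train), len(dev), len(test))  # 600 100 300
```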

Run Experiments

$ python main_darts_searchable_mmimdb.py
$ python main_darts_found_mmimdb.py --search_exp_dir=<dir of search exp>

NTU RGB-D Experiments

You can simply use our pre-processed dataset, but please cite the original NTU RGB-D dataset.

Dataset Pre-processing

If you want to use the original one, you can follow these steps.

First, request and download the NTU RGB+D dataset (not NTU RGB+D 120) from the official site. We only use the 3D skeleton (body joints) and RGB video modalities.

Then, run the following script to reshape all RGB videos to 256x256 with 30 fps:

$ python datasets/prepare_ntu.py --dir=<dir of RGB videos>

Run Experiments

First, search the hypernets. You can use --parallel for data parallelism. The default setting requires about 128GB of GPU memory; you may adjust --batchsize according to your budget.

$ python main_darts_searchable_ntu.py --parallel
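What --parallel implies can be sketched as wrapping the model in torch.nn.DataParallel, which splits each batch across the visible GPUs. The model below is a stand-in, not the repo's hypernet:

```python
import torch
import torch.nn as nn

# Stand-in model; the real hypernet is defined in this repo.
model = nn.Linear(32, 8)

# Only wrap when more than one GPU is visible; on CPU this is a no-op.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

x = torch.randn(4, 32)
print(model(x).shape)  # torch.Size([4, 8])
```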

Then, train the searched fusion network. Specify the search experiment with --search_exp_dir.

$ python main_darts_found_ntu.py --search_exp_dir=<dir of search exp>

If you only want to run the test process (no training of the fusion network), you can also use this script; specify both the search and evaluation experiment directories.

$ python main_darts_found_ntu.py --search_exp_dir=<dir of search exp> --eval_exp_dir=<dir of eval exp>

EgoGesture Experiments

Dataset Pre-processing

Download the EgoGesture dataset from the official site. You only need the image data. Unzip it into the following structure:

├── EgoGesture
│   ├── Subject01
│   ├── Subject02
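A quick way to sanity-check that layout is to list the subject directories under the dataset root. This helper is a hypothetical sketch, not part of the repo:

```python
import os
import tempfile

def list_subjects(root):
    """Return the sorted SubjectXX directories under the EgoGesture root."""
    return sorted(d for d in os.listdir(root)
                  if d.startswith("Subject") and os.path.isdir(os.path.join(root, d)))

# Demo on a temporary mock of the expected layout.
root = tempfile.mkdtemp()
for name in ("Subject01", "Subject02"):
    os.makedirs(os.path.join(root, "EgoGesture", name))
print(list_subjects(os.path.join(root, "EgoGesture")))  # ['Subject01', 'Subject02']
```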

Run Experiments

First, search the hypernets. You can use --parallel for data parallelism. You may adjust --batchsize according to your GPU memory budget.

$ python main_darts_searchable_ego.py --parallel

Then, train the searched fusion network. Specify the search experiment with --search_exp_dir.

$ python main_darts_found_ego.py --search_exp_dir=<dir of search exp>

If you only want to run the test process (no training of the fusion network), you can also use this script; specify both the search and evaluation experiment directories.

$ python main_darts_found_ego.py --search_exp_dir=<dir of search exp> --eval_exp_dir=<dir of eval exp>

Visualization

You can use structure_vis.ipynb to visualize the searched genotypes.
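The idea behind that notebook can be sketched as emitting Graphviz DOT text for a genotype's fusion graph. The genotype format below is a toy illustration, not the repo's actual data structure:

```python
def genotype_to_dot(edges):
    """Render a fusion genotype as Graphviz DOT text.
    `edges` maps each fusion node to its two input nodes (toy format)."""
    lines = ["digraph genotype {", "  rankdir=LR;"]
    for node, (a, b) in edges.items():
        lines.append(f'  "{a}" -> "{node}";')
        lines.append(f'  "{b}" -> "{node}";')
    lines.append("}")
    return "\n".join(lines)

# Toy genotype: two fusion nodes combining video and skeleton features.
toy = {"fusion_0": ("video", "skeleton"), "fusion_1": ("fusion_0", "video")}
print(genotype_to_dot(toy))
```

The resulting DOT text can be rendered with the graphviz executables (e.g. `dot -Tpng`), which is why the requirements above ask for the executables and not only the Python API.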

(Example visualization of a searched structure)

Citation

If you find this work helpful, please kindly cite our paper.

@article{yin2021bm,
  title={BM-NAS: Bilevel Multimodal Neural Architecture Search},
  author={Yin, Yihang and Huang, Siyu and Zhang, Xiang},
  journal={arXiv preprint arXiv:2104.09379},
  year={2021}
}
