Progressive Region Enhancement Network (PRENet)

Code release for the paper "Large Scale Visual Food Recognition".

Introduction

Our Progressive Region Enhancement Network (PRENet) consists of two main components: progressive local feature learning and region feature enhancement. The former uses a progressive training strategy to learn complementary, multi-scale, finer local features, such as information related to different ingredients. The latter uses self-attention to incorporate richer multi-scale context into the local features, which enhances the local feature representation. Finally, the enhanced local features and the global features from global feature learning are fused into a unified representation via a concatenation layer.
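
The sketch below illustrates the "enhance regions with self-attention, then concatenate with the global feature" idea on plain Python lists. It is a simplified, hypothetical stand-in, not PRENet's actual module: each region is reduced to one feature vector, the self-attention is single-head with no learned query/key/value projections, a residual connection adds the attended context back to each region, and the names `enhance_regions` and `fuse` as well as the toy feature values are illustrative only.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def enhance_regions(regions: List[Vector]) -> List[Vector]:
    """Simplified self-attention: queries = keys = values = region features
    (no learned projections), scaled by 1/sqrt(d), plus a residual add."""
    d = len(regions[0])
    scale = 1.0 / math.sqrt(d)
    enhanced = []
    for q in regions:
        # Attention weights of this region over all regions (they sum to 1).
        weights = softmax([dot(q, k) * scale for k in regions])
        # Context = attention-weighted sum of all region features.
        context = [sum(w * v[i] for w, v in zip(weights, regions)) for i in range(d)]
        enhanced.append([qi + ci for qi, ci in zip(q, context)])
    return enhanced


def fuse(enhanced: List[Vector], global_feat: Vector) -> Vector:
    """Concatenate all enhanced local features followed by the global feature."""
    fused: Vector = []
    for region in enhanced:
        fused.extend(region)
    fused.extend(global_feat)
    return fused


if __name__ == "__main__":
    # Three hypothetical 4-d region features (e.g. from different scales).
    regions = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
    global_feat = [0.2, 0.4, 0.6, 0.8]
    fused = fuse(enhance_regions(regions), global_feat)
    print(len(fused))  # 3 regions * 4 dims + 4 global dims = 16
```

In a real network the regions would be spatial feature maps and the projections would be learned layers; the list version only makes the data flow explicit.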

During training, the network is first trained progressively, stage by stage; the whole network, including the concatenation part, is then trained jointly. A KL-divergence term is further introduced to increase the difference between stages so that they capture more detailed features. At inference time, because the outputs of the individual stages and of the concatenated features are complementary, their predictions are combined for the final food classification.
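
A minimal sketch of the two quantities mentioned above, assuming each branch (every stage plus the concatenated features) outputs one logit per class. It computes KL(p || q) between two stages' softmax outputs and combines all branches by summing their softmax probabilities before taking the argmax. How the KL term is weighted and signed in PRENet's actual training loss, and whether the released code combines probabilities or raw logits at inference, are not stated here; the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes.
    Terms with p_i == 0 contribute nothing; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def combined_prediction(branch_logits: List[Sequence[float]]) -> int:
    """Sum class probabilities over all branches and return the top class index."""
    n_classes = len(branch_logits[0])
    total = [0.0] * n_classes
    for logits in branch_logits:
        for c, p in enumerate(softmax(logits)):
            total[c] += p
    return max(range(n_classes), key=total.__getitem__)


if __name__ == "__main__":
    # Hypothetical logits for three stages and the concatenated branch (3 classes).
    stage_logits = [[2.0, 0.5, 0.1], [1.5, 1.0, 0.2], [0.3, 2.2, 0.1]]
    concat_logits = [2.5, 1.8, 0.0]
    p1, p2 = softmax(stage_logits[0]), softmax(stage_logits[1])
    print("KL(stage1 || stage2) =", round(kl_divergence(p1, p2), 4))
    print("predicted class:", combined_prediction(stage_logits + [concat_logits]))
```

Summing probabilities keeps every branch on the same scale; summing logits is an equally simple alternative if the branches are trained to comparable magnitudes.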

Requirements

Python 3.6
PyTorch >= 1.3.1
torchvision >= 0.4.2
PIL
NumPy
dropblock
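
Before running anything, a quick way to confirm that the packages above are importable is sketched below. It only checks that each module can be found; it does not verify the version numbers listed. It assumes the import names `torch`, `torchvision`, `PIL` (provided by the Pillow package), `numpy` and `dropblock`.

```python
import importlib.util
import sys

# Assumed import names for the requirements listed above.
REQUIRED_MODULES = ("torch", "torchvision", "PIL", "numpy", "dropblock")


def check_requirements(modules=REQUIRED_MODULES):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, ok in check_requirements().items():
        print("{:12s} {}".format(name, "found" if ok else "MISSING"))
```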

Data preparation

Download the food datasets. The file structure should look like:

dataset
├── class_001
│   ├── 1.jpg
│   ├── 2.jpg
│   └── ...
├── class_002
│   ├── 1.jpg
│   ├── 2.jpg
│   └── ...
└── ...
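
In the layout above, each class folder corresponds to one label. A minimal sketch of indexing it, similar in spirit to torchvision's `ImageFolder` (class folders sorted by name and numbered in that order); the function name `index_dataset` and the accepted image extensions are assumptions, and the repo's own loader may rely on the list files instead.

```python
import os
from typing import Dict, List, Tuple

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumed; extend as needed


def index_dataset(root: str) -> Tuple[Dict[str, int], List[Tuple[str, int]]]:
    """Map each class folder under `root` to an integer label and return
    (class_to_idx, [(image_path, label), ...]) in sorted order."""
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            # Skip non-image files such as notes or hidden metadata.
            if fname.lower().endswith(IMAGE_EXTENSIONS):
                samples.append((os.path.join(class_dir, fname), class_to_idx[name]))
    return class_to_idx, samples


if __name__ == "__main__":
    root = "dataset"  # path to the folder laid out as above
    if os.path.isdir(root):
        class_to_idx, samples = index_dataset(root)
        print(len(class_to_idx), "classes,", len(samples), "images")
```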

Download the training and testing list files, e.g., train_full.txt and test_full.txt.
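
The format of the list files is not described here. Assuming, hypothetically, that each line holds a relative image path followed by an integer class label, separated by whitespace, a split file could be parsed as sketched below; check an actual train_full.txt before relying on this. The function name `read_split_file` is illustrative.

```python
import os
from typing import List, Tuple


def read_split_file(list_path: str, image_root: str) -> List[Tuple[str, int]]:
    """Parse a split file ASSUMED to contain one '<relative_path> <label>' per line."""
    samples = []
    with open(list_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the last whitespace so paths containing spaces still work.
            parts = line.rsplit(maxsplit=1)
            if len(parts) != 2:
                raise ValueError(
                    "{}:{}: expected '<path> <label>', got {!r}".format(list_path, line_no, line)
                )
            rel_path, label = parts
            # Strip a leading '/' so os.path.join does not discard image_root.
            samples.append((os.path.join(image_root, rel_path.lstrip("/")), int(label)))
    return samples


if __name__ == "__main__":
    list_path, image_root = "train_full.txt", "dataset"  # placeholders
    if os.path.isfile(list_path):
        samples = read_split_file(list_path, image_root)
        print(len(samples), "samples; first:", samples[0] if samples else None)
```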

Training

To train a PRENet on food datasets from scratch, run:

python main.py --dataset <food_dataset> --image_path <data_path> --train_path <train_path> --test_path <test_path> --weight_path <pretrained_model>

Inference

Download the PRENet model pretrained on Food2K from google/baidu (Code: o0nj).
To evaluate a pre-trained PRENet on food datasets, run:

python main.py --dataset <food_dataset> --image_path <data_path> --train_path <train_path> --test_path <test_path> --weight_path <pretrained_model> --test --use_checkpoint --checkpoint <checkpoint_path>
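
If you launch runs from a script, the two commands above can be assembled as an argument list, which avoids shell-quoting problems with paths that contain spaces. This sketch uses only the flags shown in those commands; `build_command` and the placeholder argument values are hypothetical.

```python
import shlex
import sys
from typing import List, Optional


def build_command(dataset: str, image_path: str, train_path: str, test_path: str,
                  weight_path: str, checkpoint: Optional[str] = None) -> List[str]:
    """Assemble the main.py invocation. Passing `checkpoint` switches to the
    evaluation form (--test --use_checkpoint --checkpoint <path>)."""
    cmd = [sys.executable, "main.py",
           "--dataset", dataset,
           "--image_path", image_path,
           "--train_path", train_path,
           "--test_path", test_path,
           "--weight_path", weight_path]
    if checkpoint is not None:
        cmd += ["--test", "--use_checkpoint", "--checkpoint", checkpoint]
    return cmd


if __name__ == "__main__":
    # Placeholder values; substitute your own dataset name and paths.
    cmd = build_command("<food_dataset>", "/path/to/dataset", "train_full.txt",
                        "test_full.txt", "/path/to/pretrained_model.pth",
                        checkpoint="/path/to/checkpoint.pth")
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To actually launch it: subprocess.run(cmd, check=True)
```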

Other pretrained models on Food2K

Backbone	Download link
vgg16	google/baidu (Code: puuy)
resnet50	google/baidu (Code: 5eay)
resnet101	google/baidu (Code: yv1o)
resnet152	google/baidu (Code: 22zw)
densenet161	google/baidu (Code: bew5)
inception_resnet_v2	google/baidu (Code: xa8r)
senet154	google/baidu (Code: kwzf)

Citation

If you find this repo useful in your project, please consider citing it with the following BibTeX:

@article{min2021large,
  title={Large scale visual food recognition},
  author={Min, Weiqing and Wang, Zhiling and Liu, Yuxin and Luo, Mengjiang and Kang, Liping and Wei, Xiaoming and Wei, Xiaolin and Jiang, Shuqiang},
  journal={arXiv preprint arXiv:2103.16107},
  year={2021}
}
