wangyu-ustc/LM4CV

This is the official implementation of our ICCV paper Learning Concise and Descriptive Attributes for Visual Recognition.

Requirements

torch == 2.0.1
python 3.9.13
torchvision == 0.15.2
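
To confirm that an environment matches the versions pinned above, a small check like the following may help. It is a minimal sketch using only the standard library; the helper names (`find_mismatches`, `installed_versions`, `base_version`) are illustrative and not part of this repository.

```python
import platform
from importlib import metadata
from typing import Dict, List, Optional

# Versions pinned in the Requirements list above.
EXPECTED = {"python": "3.9.13", "torch": "2.0.1", "torchvision": "0.15.2"}


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu118' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(installed: Dict[str, Optional[str]]) -> List[str]:
    """Compare installed versions with EXPECTED and describe each difference."""
    problems = []
    for name, wanted in EXPECTED.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name} is not installed (expected {wanted})")
        elif base_version(found) != wanted:
            problems.append(f"{name} {found} found (expected {wanted})")
    return problems


def installed_versions() -> Dict[str, Optional[str]]:
    """Collect the interpreter version and the installed package versions."""
    versions: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in ("torch", "torchvision"):
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for line in find_mismatches(installed_versions()) or ["All pinned versions match."]:
        print(line)
```

A mismatch does not necessarily mean the code will fail; the check only reports differences from the versions listed here.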

Datasets

CUB: Download the dataset from here and organize the downloaded files as shown below.
Stanford_Cars: Download the dataset from here and organize the downloaded files as shown below.
CIFAR10: Run python main.py --config configs/cifar10.yaml; the dataset is downloaded automatically into the folder ./data/cifar-10-batches-py.
CIFAR100: Run python main.py --config configs/cifar100_bn.yaml; the dataset is downloaded automatically into the folder ./data/cifar-100-python.
Flowers102: Run python main.py --config configs/flower.yaml; the dataset is downloaded automatically into the folder ./data/flowers-102.
Food101: Run python main.py --config configs/food_bn.yaml; the dataset is downloaded automatically into the folder ./data/food-101.
Oxford-Pets: Run python main.py --config configs/oxford_pets_bn.yaml; the dataset is downloaded automatically into the folder ./data/oxford-iiit-pet.
Imagenet-Animals: Download the dataset from here and organize the downloaded files as shown below.

- data
    - CUB_200_2011
        - cub_attributes_gpt3.txt # generated by us
        - image_class_labels.txt # generated by us
        - train_test_split.txt
        - images.txt
        - attributes
        - images
        - parts
        - README.md
        - ...
    - stanford_cars
        - cars_attributes.txt # generated by us
        - image_class_labels.txt # generated by us
        - cars_train
            - *.jpg
        - cars_test
            - *.jpg
        - devkit
        - cars_train.tgz
        - cars_test.tgz
        - cars_test_annos_withlabels.mat
        # The URL that torchvision uses for this dataset is no longer valid,
        # so download the files manually and place the .tgz files in this
        # folder; the dataset class then treats the dataset as already
        # downloaded.
    - cifar-10-batches-py
        - cifar10_attributes.txt # generated by us
        - image_class_labels.txt # generated by us
    - cifar-100-python
        - cifar100_attributes.txt # generated by us
        - image_class_labels.txt # generated by us
    - flowers-102
        - flower_attributes.txt # generated by us
        - image_class_labels.txt # generated by us
    - food-101
        - food_attributes.txt # generated by us
        - image_class_labels.txt # generated by us
    - oxford-iiit-pet
        - oxford_pets_attributes.txt # generated by us
        - image_class_labels.txt # generated by us
    - imagenet
        - imagenet_animal_attributes.txt # generated by us
        - imagenet_attributes.txt # generated by us
        - image_class_labels.txt # generated by us
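
Before training, it can be useful to see which dataset folders from the layout above are still missing. The sketch below is not part of the repository: `report_missing` is a hypothetical helper, and it only checks that each top-level folder exists under the data root, not that its contents are complete. The config paths and folder names are copied from the Datasets list.

```python
from pathlib import Path

# Config file and download folder for each automatically downloaded dataset,
# copied from the Datasets list above.
AUTO_DOWNLOADED = {
    "CIFAR10": ("configs/cifar10.yaml", "cifar-10-batches-py"),
    "CIFAR100": ("configs/cifar100_bn.yaml", "cifar-100-python"),
    "Flowers102": ("configs/flower.yaml", "flowers-102"),
    "Food101": ("configs/food_bn.yaml", "food-101"),
    "Oxford-Pets": ("configs/oxford_pets_bn.yaml", "oxford-iiit-pet"),
}

# Datasets that have to be downloaded and arranged by hand.
MANUAL = {
    "CUB": "CUB_200_2011",
    "Stanford_Cars": "stanford_cars",
    "Imagenet-Animals": "imagenet",
}


def report_missing(data_root="./data"):
    """Return one hint per dataset whose folder is absent under data_root."""
    root = Path(data_root)
    hints = []
    for name, (config, folder) in AUTO_DOWNLOADED.items():
        if not (root / folder).is_dir():
            hints.append(f"{name}: run `python main.py --config {config}`")
    for name, folder in MANUAL.items():
        if not (root / folder).is_dir():
            hints.append(f"{name}: download manually into the {folder} folder")
    return hints


if __name__ == "__main__":
    for hint in report_missing() or ["All dataset folders are present."]:
        print(hint)
```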

Attributes queried for each class

The attributes queried from GPT-3 for each class are stored in the folder cls2attributes.

Parameters

The following key parameters are available for customization:

cluster_feature_method: Choose one from [kmeans, random, linear]. "Linear" refers to our method.
model_size: Set the size of the CLIP model.
mahalanobis: Enable or disable Mahalanobis distance regularization.
division_power: Control the strength of Mahalanobis constraints.
reinit: Decide whether to initialize the model with weights from image training features.
num_attributes: Specify the number of attributes selected for classification.

Please make sure to adjust these parameters according to your requirements.
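
When editing these values, a quick sanity check can catch simple mistakes before a run starts. The sketch below is an illustration only: the parameter names and the three clustering choices come from the list above, while the Python type assumed for each parameter and the `validate_params` helper itself are assumptions, not part of the codebase.

```python
# Allowed values for cluster_feature_method, as listed above.
CLUSTER_METHODS = ("kmeans", "random", "linear")

# Parameter names come from the list above; the type assumed for each one
# is a guess for illustration and may differ from what the code accepts.
EXPECTED_TYPES = {
    "cluster_feature_method": str,
    "model_size": str,
    "mahalanobis": bool,
    "division_power": (int, float),
    "reinit": bool,
    "num_attributes": int,
}


def validate_params(params):
    """Return a list of problems found in a dict of parameter overrides."""
    errors = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in params:
            continue  # only check the parameters that were actually set
        value = params[key]
        # bool is a subclass of int, so reject it explicitly for numeric keys.
        bool_misuse = isinstance(value, bool) and expected is not bool
        if bool_misuse or not isinstance(value, expected):
            errors.append(f"{key}: unexpected type {type(value).__name__}")

    method = params.get("cluster_feature_method")
    if isinstance(method, str) and method not in CLUSTER_METHODS:
        errors.append(
            f"cluster_feature_method: must be one of {list(CLUSTER_METHODS)}"
        )

    count = params.get("num_attributes")
    if type(count) is int and count < 1:
        errors.append("num_attributes: must be a positive integer")
    return errors
```

Keys that are not in the list above are ignored rather than rejected, since the configuration files may contain other settings.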

Citation

If you find our codebase useful for your research, please consider citing our paper:

@article{DBLP:journals/corr/abs-2308-03685,
  author       = {An Yan and
                  Yu Wang and
                  Yiwu Zhong and
                  Chengyu Dong and
                  Zexue He and
                  Yujie Lu and
                  William Wang and
                  Jingbo Shang and
                  Julian J. McAuley},
  title        = {Learning Concise and Descriptive Attributes for Visual Recognition},
  journal      = {CoRR},
  volume       = {abs/2308.03685},
  year         = {2023}
}
