deep-learning ai classification computer-vision cvt imagenet cifar100

CvT Tensorflow Implementation

Dear friends: we implemented Convolutional vision Transformers (CvT) in TensorFlow (version > 2.5). The goal was to better understand the concept and architecture. Please feel free to use and improve the model.
CvT original code GitHub: [https://github.com/microsoft/CvT]
Paper: CvT: Introducing Convolutions to Vision Transformers

Our Implementation Schema

Testing implementation

Trained on CIFAR100
Data set: 60000 images and 100 object categories
Training set: Contains 50000 images (500 images per category)
Validation set: Contains 10000 images (100 images per category)

Results

The model was trained on CIFAR100 from scratch.
We use data augmentation together with image resizing (to 72x72) and obtained the results below.

Model	Resolution	Param	Top-1	Hardware
CvT-1	72x72	3.5M	59.0	2x RTX 2080
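
For readers unfamiliar with the Top-1 column: it is the percentage of images for which the highest-scoring class equals the true label. The following is a minimal, framework-free sketch of that metric; the function name `top1_accuracy` is only illustrative and is not taken from this repository.

```python
def top1_accuracy(scores, labels):
    """Percentage of samples whose highest-scoring class equals the label.

    scores: list of per-class score lists, one list per image.
    labels: list of integer class indices, one per image.
    """
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)
```

For example, calling it with four images of which three are classified correctly yields 75.0.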

Please see option details in config/config.py

Options	Stage 1	Stage 2	Stage 3	Remark
Model
NUM_STAGES	-	-	-	3
CLS_TOKEN	-	-	-	TRUE
EMBEDDING
PATCH_SIZE	6	3	3
PATCH_STRIDE	3	2	2
DIM_EMBED	32	64	128
STAGE
DEPTH	1	2	6	No dropout
ATTENTION
NUM_HEADS	1	3	6
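
To get a feeling for how these options shape the network, the sketch below computes the feature-map size after each stage's convolutional token embedding for the 72x72 input used above. It assumes "same"-style padding, so that the output side length is ceil(input / stride); the padding actually used in this repository may differ. The names `STAGES` and `stage_resolutions` are hypothetical and only serve the illustration.

```python
import math

# Values copied from the options table above; the dict layout is hypothetical
# and not the format used in config/config.py.
STAGES = [
    {"PATCH_SIZE": 6, "PATCH_STRIDE": 3, "DIM_EMBED": 32, "DEPTH": 1, "NUM_HEADS": 1},
    {"PATCH_SIZE": 3, "PATCH_STRIDE": 2, "DIM_EMBED": 64, "DEPTH": 2, "NUM_HEADS": 3},
    {"PATCH_SIZE": 3, "PATCH_STRIDE": 2, "DIM_EMBED": 128, "DEPTH": 6, "NUM_HEADS": 6},
]


def stage_resolutions(input_size, stages):
    """Return (side length, token count, embedding dim) for every stage.

    Assumes "same"-style padding: output side = ceil(input side / stride).
    """
    results = []
    side = input_size
    for stage in stages:
        side = math.ceil(side / stage["PATCH_STRIDE"])
        results.append((side, side * side, stage["DIM_EMBED"]))
    return results


if __name__ == "__main__":
    for i, (side, tokens, dim) in enumerate(stage_resolutions(72, STAGES), start=1):
        print(f"Stage {i}: {side}x{side} feature map, {tokens} tokens, dim {dim}")
```

Under this padding assumption the token count shrinks from stage to stage while the embedding dimension grows, which is the hierarchical design CvT borrows from convolutional networks.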

Usage

Installation

Before installing the dependencies you should consider using a virtual environment. It can be created by:

python3 -m venv <folder name>
# Then activate the environment by running the generated activate script
# in <folder name> for your OS, e.g. Scripts\activate.bat on Windows
# or `source bin/activate` on Linux/macOS.

The necessary packages are listed in requirements.txt. They can be installed using:

pip install -r requirements.txt

For the installation of the optional CUDA drivers, please refer to the TensorFlow documentation.

Configuration

The model can be configured with the hyperparameters in config/config.py.

Training

To start the training without changing the dataset, the learning rate or the learning rate schedule, just run main.py:

python main.py

If you want to change these values, open main.py with an editor and change the parameters of the train function at the bottom of the file.

model, figure = train(cifar_loader,
                      epochs=300,
                      batch_size=512,
                      start_weights="",
                      learning_rate=1e-3,
                      learning_rate_schedule=schedule)

Training Parameters:

cifar_loader

The loader of the dataset. Consult dataloader/DataLoader.py for more information.

epochs

The number of epochs to train for.

batch_size

The number of images per batch.

start_weights

The name of the file in the weights folder that contains pre-trained weights to load before starting the training.

learning_rate

The learning rate.

learning_rate_schedule

The learning rate schedule (e.g. a cosine decay).
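
As an illustration of the cosine decay mentioned for learning_rate_schedule, here is a minimal sketch written as a plain Python function of the epoch. Whether train expects a callable like this or a Keras schedule object (such as tf.keras.optimizers.schedules.CosineDecay) is not stated here, so treat the signature as an assumption; the name `cosine_decay` and its default values (taken from the train call above) are illustrative only.

```python
import math


def cosine_decay(epoch, total_epochs=300, base_lr=1e-3, min_lr=0.0):
    """Cosine-annealed learning rate for a given 0-based epoch.

    The rate starts at base_lr and falls to min_lr at total_epochs.
    """
    # Clamp so that epochs outside [0, total_epochs] stay at the end points.
    progress = min(max(epoch, 0), total_epochs) / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Halfway through training (epoch 150 of 300) this yields roughly half of the base learning rate.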

Note that the training can be stopped at any time by focusing on the plot and holding the 'q' key.

Pressing 'h' or 'r' while focusing on the plot will resize it to fit the data.

Testing

To test your model, call the test function found in main.py:

figure = test(model, cifar_loader, number_of_images=5000, split="test", seed=None)

Test Parameters

model

Your trained model.

cifar_loader

The dataset loader, same as in train.

number_of_images

Determines how many images to use for the test.

split

The dataset split to take images from, either "test" or "train" (usually "test").

seed

The random seed by which to choose images. If the value is None, os.urandom is used instead.
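
The interplay of number_of_images and seed can be sketched as follows. This is an assumption about how such a selection might work, not the code used by test; the helper name `pick_image_indices` is hypothetical. It relies on Python's standard random module, which seeds itself from operating-system entropy when the seed is None.

```python
import random


def pick_image_indices(dataset_size, number_of_images, seed=None):
    """Choose distinct image indices; reproducible when a seed is given."""
    # random.Random(None) seeds itself from OS entropy (os.urandom) if available,
    # so every call without a seed selects a different subset.
    rng = random.Random(seed)
    return rng.sample(range(dataset_size), number_of_images)
```

Passing the same integer seed twice returns the same subset, which makes test runs comparable across models.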
