Fine Grained Visual Classification

This repository aims to improve future object detection methods by making use of scene depth maps.
The model uses a ResNet to extract object features from the RGB and depth channels, and an
FCNHead to obtain a segmentation map from these features.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7.1. To install, run:

$ pip install -r requirements.txt

Training

Use BlenderRenderer to generate the dataset.

$ train.py [-h] [--dataset_path DATASET_PATH] [--epochs EPOCHS] [--batch_size BATCH_SIZE]
                [--learning_rate LEARNING_RATE] [--use_gpu] [--output_path OUTPUT_PATH] [--checkpoint CHECKPOINT]
                [--resnet] [--pretrained]
$ test.py [-h] [--dataset_path DATASET_PATH] [--batch_size BATCH_SIZE] [--use_gpu] [--checkpoint CHECKPOINT]
               [--resnet]
$ show.py [-h] [--dataset_path DATASET_PATH] [--batch_size BATCH_SIZE] [--use_gpu] [--checkpoint CHECKPOINT]
               [--resnet]

--dataset_path DATASET_PATH: Path to the dataset generated by BlenderRenderer, or to a dataset with the same structure.
--epochs EPOCHS: Epoch at which training stops.
--batch_size BATCH_SIZE: Number of samples per batch.
--learning_rate LEARNING_RATE: Learning rate used for training.
--use_gpu: Use this flag to run on the GPU.
--output_path OUTPUT_PATH: Output path for the checkpoint.
--checkpoint CHECKPOINT: Checkpoint file to resume training from.
--resnet: Use this flag to train without depth maps.
--pretrained: Use this flag to start from a pretrained ResNet model.
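
The flags listed above map naturally onto argparse. The sketch below shows one way the train.py command line could be declared. The function name build_train_parser is hypothetical, the argument types are inferred from the usage string, and no defaults are set because the README does not state them.

```python
import argparse


def build_train_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the train.py usage string."""
    parser = argparse.ArgumentParser(prog="train.py")
    # Valued options; types are assumptions inferred from the flag names.
    parser.add_argument("--dataset_path", type=str, help="dataset directory")
    parser.add_argument("--epochs", type=int, help="epoch at which training stops")
    parser.add_argument("--batch_size", type=int, help="samples per batch")
    parser.add_argument("--learning_rate", type=float, help="learning rate")
    parser.add_argument("--output_path", type=str, help="where to write the checkpoint")
    parser.add_argument("--checkpoint", type=str, help="checkpoint file to resume from")
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--use_gpu", action="store_true", help="run on the GPU")
    parser.add_argument("--resnet", action="store_true", help="train without depth maps")
    parser.add_argument("--pretrained", action="store_true", help="use a pretrained ResNet")
    return parser


if __name__ == "__main__":
    args = build_train_parser().parse_args()
    print(vars(args))
```

Because the switches use store_true, omitting --use_gpu, --resnet, or --pretrained leaves the corresponding value at False.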

Results

Training accuracy after one epoch: 81%

alexjercan / object-detection
